Scherzer Lab wiki

[bioinformatics section]

Table of Contents:

  1. Bioinformatics lab rules
  2. Lab resources
  3. Linux/Unix basic
  4. Bioinformatics Tools
  5. Bioinformatics resources
  6. High-throughput computing resources
  7. Partners HPCC overview
  8. NGS analysis pipeline
  9. NGS resources

  1. Bioinformatics lab rules

  1. Use compressed format (*.gz) whenever possible.
  2. Use binary formats (*.bam, *.bw) instead of raw text formats such as *.sam and *.bedGraph.
  3. Move raw sequencing files (*.fastq) to the backup disk after they are processed.
  4. Remove intermediate or redundant files.
  5. Use soft links (e.g. ln -s) instead of hard copies (e.g. cp).
  6. Xianjun’s top 15 practical tips:
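
As a rough illustration of rules 1 and 5 above, the following shell sketch compresses a text file with gzip and then links to it rather than copying it. The helper names and the file names (sample.bedGraph, linked.bedGraph.gz) are made up for the example; converting *.sam to *.bam (rule 2) would additionally need samtools, which is not assumed here.

```shell
#!/usr/bin/env bash
# Sketch only: helper and file names below are hypothetical.

# Compress a text file in place (rule 1); gzip replaces FILE with FILE.gz.
compress_file() {
    gzip -f "$1"
}

# Create a soft link to an existing file instead of copying it (rule 5).
link_file() {
    local target="$1" link_name="$2"
    ln -sf "$target" "$link_name"
}

# Dry run in a scratch directory.
demo_dir="$(mktemp -d)"
printf 'chr1\t0\t100\t1.5\n' > "$demo_dir/sample.bedGraph"
compress_file "$demo_dir/sample.bedGraph"
link_file "$demo_dir/sample.bedGraph.gz" "$demo_dir/linked.bedGraph.gz"
ls -l "$demo_dir"
```

The link takes almost no disk space, whereas a copy would double the footprint of the compressed file.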
  2. Lab resources

  1. Web Server:
     1. For new users: email your hpcc user ID to Xianjun for access
     2. ssh
     3. mkdir public_html
     4. from your local machine or eris, type: scp my.file
     5. chmod 644 my.file
     6. You should see your file at the address:
  2. hpcc cluster: (get account first, see below)
  3. Track hub: (ask Xianjun for the password)
  4. Scherzer lab track session on UCSC:
     1. Enhancer:
     2. S_otherUserName=sterding&hgS_otherUserSessionName=hg19_PD_enhancerall RNAseq:
     3. GTEx:
  5. Group reference:
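
The web server steps above amount to creating a public_html directory, copying a file into it, and making the file world-readable. A minimal local sketch follows. The server host name is not given on this page, so REMOTE in the comment is only a placeholder, and publish_file is a helper name invented for this example.

```shell
#!/usr/bin/env bash
# Sketch only: publish_file is a hypothetical helper, not a lab script.
# On the real server the same steps are run after ssh-ing in, and the copy
# is done with scp, e.g.:  scp my.file REMOTE:public_html/
# (REMOTE is a placeholder for your user ID and the server host).

publish_file() {
    local src="$1" web_root="$2"
    mkdir -p "$web_root"                      # step 3: mkdir public_html
    cp "$src" "$web_root/"                    # stands in for step 4 (scp)
    chmod 644 "$web_root/$(basename "$src")"  # step 5: readable by everyone
}

# Local dry run in a scratch directory.
scratch="$(mktemp -d)"
echo "hello" > "$scratch/my.file"
publish_file "$scratch/my.file" "$scratch/public_html"
ls -l "$scratch/public_html"
```

Mode 644 gives the owner read/write access and everyone else read-only access, which is what a web server needs to serve the file.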
  3. Linux/Unix basic

  2. (advanced one)
  4. Bioinformatics Tools

  1. bedtools:
  2. samtools:
  3. UCSC Jim Kent utility:
     1. all commands:
     2. source:
     3. installation instructions: src/userApps/README (in the source tree)
  4. ENCODE tools:
  5. RNAseq analysis: Bowtie/Tophat/Cufflinks etc.:
  6. GATK:
  7. ggplot2:
  5. Bioinformatics resources

  1. UCSC Genome Browser
     1. About:
     2. Training tutorial 1:
     3. Training tutorial 2:
  2. Rdocumentation:
  3. BioStars forum:
  4. R news and tutorials RSS:
  6. High-throughput computing resources

  1. Eris (Partners):
  2. Orchestra (HMS):
  3. Odyssey (Harvard):
  4. XSEDE’s Blacklight server (Pittsburgh Supercomputing Center):
  5. The Data Intensive Academic Grid (DIAG):
  7. Partners HPCC overview

  1. First-time users guide:
  2. Register an account here:
  3. Log in:
     1. ssh
     2. ssh
     3. ssh
  4. Partners HPCC is based on LSF:
  5. Knowledge base:
  6. How to submit jobs:
  7. How to change queues/resources:
  8. How to log in to a specific node:
  9. How to mount the Eris folder:
  10. Need help? Send an email or open a ticket
  11. VPN: use in your Cisco AnyConnect
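
Since the cluster is based on LSF, a job is typically described by a script with #BSUB directives and submitted with bsub. The sketch below only writes such a script and prints the submit command. The queue name (normal), the job name, and the slot count are placeholders, and write_lsf_job is a helper invented for this example; check the knowledge base for the queues actually available.

```shell
#!/usr/bin/env bash
# Sketch only: queue name, job name and resource values are placeholders.

# Write a minimal LSF job script to the given path.
write_lsf_job() {
    local script_path="$1" queue="${2:-normal}"
    cat > "$script_path" <<EOF
#!/bin/bash
# Job name, queue, number of slots, then stdout/stderr files.
# %J in a file name is replaced by the job ID.
#BSUB -J demo_job
#BSUB -q $queue
#BSUB -n 1
#BSUB -o demo_job.%J.out
#BSUB -e demo_job.%J.err
echo "running on \$(hostname)"
EOF
}

job_script="$(mktemp)"
write_lsf_job "$job_script" normal

# Nothing is submitted here; on the cluster you would run the printed command.
echo "To submit: bsub < $job_script"
```

Once a job is submitted, bjobs lists your jobs and bkill JOBID cancels one.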
  8. NGS analysis pipeline

  1. RNA-seq:
     1. on github:
     2. on hpcc: ~/neurogen/pipeline/RNAseq
     3. NOTE: If you edit the pipeline there directly, please use git to track your changes; otherwise, clone a copy into your home directory and push to github every time you make a change. This applies to all projects.
  2. smallRNA-seq
  3. genotyping
  4. DNA-seq
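
The NOTE above describes a clone, edit, commit, push cycle. Here is a local sketch of that cycle, using a throwaway bare repository in place of github. The repository paths, the file name CHANGES.txt, the commit identity, and the helper name clone_edit_push are all invented for the example.

```shell
#!/usr/bin/env bash
# Sketch only: a throwaway bare repository stands in for the github remote.

# clone_edit_push REMOTE CLONE_DIR MESSAGE
clone_edit_push() {
    local remote="$1" clone_dir="$2" message="$3"
    # Clone a private working copy (an empty remote only triggers a warning).
    git clone --quiet "$remote" "$clone_dir" 2>/dev/null || return 1
    (
        cd "$clone_dir" || exit 1
        echo "example edit" >> CHANGES.txt &&      # make a change
        git add CHANGES.txt &&
        git -c user.name="Demo User" -c user.email="demo@example.com" \
            commit --quiet -m "$message" &&
        git push --quiet origin HEAD               # push after every change
    )
}

if command -v git >/dev/null 2>&1; then
    work="$(mktemp -d)"
    git init --quiet --bare "$work/remote.git"
    clone_edit_push "$work/remote.git" "$work/clone" "Describe the change"
    git --git-dir="$work/remote.git" log --all --oneline
fi
```

In real use the remote would be the github repository URL and the commit identity would come from your own git configuration.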
  9. NGS resources

  1. ENCODE:
  2. modENCODE:
  3. mouseENCODE:
  4. Roadmap Epigenomics:
  5. FANTOM:
  6. GTEx:
  10. How to manage different R versions on Mac and the Eris cluster

We want to use the same version of R on the Mac and on the Linux-based ERIS cluster. Here is how:

On the Mac, simply download and install R. Its default install location can be seen from the symlink at /usr/local/bin/R:

[xdong@macbook ~]$ ls -l /usr/local/bin/R
lrwxr-xr-x 1 root admin 47 Nov  4 19:27 /usr/local/bin/R -> /Library/Frameworks/R.framework/Resources/bin/R

The different versions of R are managed as follows:

[xdong@macbook ~]$ ll /Library/Frameworks/R.framework/Versions/
total 4.0K
drwxrwxr-x 6 root 204 Nov  8  2013 3.0
drwxrwxr-x 3 root 102 Jun  9 20:08 3.1
drwxrwxr-x 6 root 204 Nov  4 19:26 3.3
lrwxr-xr-x 1 root   3 Nov  4 19:26 Current -> 3.3

The default path for R libraries installed via the R console or RStudio can be seen in R via:

> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library"

If you don’t want to install there, you can change it by adding the following line to your ~/.Rprofile file:

.libPaths( "/My/path/for/Rlib" )

This can also be overridden by setting the R_LIBS_USER environment variable (not tested yet).
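
As a sketch of the R_LIBS_USER route mentioned above (equally untested on the cluster): R reads this environment variable at startup and adds the directory to .libPaths() only if the directory already exists, so the helper below creates it first. The helper name use_r_lib and the scratch directory are invented for the example; in practice the export line would go in your shell startup file with a path of your choice.

```shell
#!/usr/bin/env bash
# Sketch only: use_r_lib and the directory used below are hypothetical.

# Create a personal R library directory and point R_LIBS_USER at it.
use_r_lib() {
    local lib_dir="$1"
    mkdir -p "$lib_dir"   # R skips R_LIBS_USER entries that do not exist
    export R_LIBS_USER="$lib_dir"
}

# Dry run with a scratch directory instead of a real home-directory path.
use_r_lib "$(mktemp -d)/Rlib"
echo "R_LIBS_USER=$R_LIBS_USER"

# If R is installed, show the resulting library search path.
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e '.libPaths()' || true
fi
```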

On the Linux server, it is similar. The ERIS cluster manager has also installed different R versions for you. All you need to do is load the R module:

$ module avail

$ module load R/3.3.0

Then you can use R v3.3.0 directly, and the libraries installed by the manager are already accessible under $R_HOME. If you install your own library from the R console, R will ask whether you want to create a personal library folder (if one does not exist yet).
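
To confirm that the Mac and the cluster really run the same R, one option is to load the module and compare the version that R reports. The sketch below assumes the first line of R --version has the form R version X.Y.Z (DATE) -- "NICKNAME". The helper names parse_r_version and load_and_check_r are invented for this example, and the module step is skipped on machines without the module command.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical.

# Read `R --version` output on stdin and print just the version number.
parse_r_version() {
    sed -n '1s/^R version \([0-9][0-9.]*\).*/\1/p'
}

# Load the requested R module when the module command exists, then report
# which R version is first on the PATH.
load_and_check_r() {
    local wanted="$1" found
    if command -v module >/dev/null 2>&1; then
        module load "R/$wanted"
    fi
    if command -v R >/dev/null 2>&1; then
        found="$(R --version | parse_r_version)"
        echo "wanted R $wanted, found R ${found:-unknown}"
    else
        echo "R not found on PATH"
    fi
}

load_and_check_r 3.3.0
```

Running the same function on the Mac (where the module step is simply skipped) gives a quick side-by-side check.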