Scherzer Lab wiki
[bioinformatics section]
Table of Contents:
- Bioinformatics lab rules
- Lab resources
- Linux/Unix basic
- Bioinformatics Tools
- Bioinformatics resources
- High-throughput computing resources
- Partners HPCC overview
- NGS analysis pipeline
- NGS resources
Bioinformatics lab rules
- use compressed format (*.gz) whenever possible
- use binary formats (*.bam, *.bw) instead of text formats such as *.sam or *.bedGraph
- move raw sequencing files (*.fastq) to the backup disk after they are processed
- remove intermediate or redundant files
- use soft links (e.g. ln -s) instead of hard copies (e.g. cp)
- Xianjun’s top 15 practical tips: http://onetipperday.sterding.com/2016/02/my-15-practical-tips-for.html
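The rules above can be sketched on a toy file. This is only an illustration: the scratch directory and the file names (sample.bedGraph, project/) are made up, and the commented samtools and bedGraphToBigWig lines assume those tools are installed and on your PATH.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# Toy bedGraph file standing in for real data.
printf 'chr1\t0\t100\t1.5\n' > sample.bedGraph

# Compress in place: creates sample.bedGraph.gz and removes the original.
gzip sample.bedGraph

# Link to the file from a project folder instead of copying it.
mkdir project
ln -s "$work/sample.bedGraph.gz" project/sample.bedGraph.gz

# Read the data through the link without writing an uncompressed copy.
gzip -dc project/sample.bedGraph.gz

# Binary conversions, if the tools are available:
#   samtools view -b -o sample.bam sample.sam
#   bedGraphToBigWig sample.bedGraph chrom.sizes sample.bw
```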
Lab resources
- Web Server: panda.dipr.partners.org
- For new users: email your hpcc user ID to Xianjun (xdong@rics.bwh.harvard.edu) to request access
- ssh YOUR_PARTNER_ID@panda.dipr.partners.org
- mkdir public_html
- from your local machine or eris, type: scp my.file YOUR_PARTNER_ID@panda.dipr.partners.org:~/public_html
- on panda, in ~/public_html: chmod 644 my.file
- You should see your file at the address: http://panda.partners.org/~YOUR_PARTNER_ID
- hpcc cluster: eris1n2.research.partners.org (get account first, see below)
- Track hub: http://panda.partners.org/~xd010/myHub/hub.txt (ask Xianjun for pass)
- Scherzer lab track session on UCSC:
- Enhancer: https://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=sterding&hgS_otherUserSessionName=hg19_PD_enhancerall
- RNAseq: https://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=sterding&hgS_otherUserSessionName=hg19_PD
- GTEx: https://genome.ucsc.edu/cgi-bin/hgTracks?hgS_doOtherUser=submit&hgS_otherUserName=sterding&hgS_otherUserSessionName=hg19_GTEx
- Group reference: http://www.mendeley.com/groups/4633051/neurogenomics/
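The web server steps above (create public_html on panda, upload a file, make it readable) could be wrapped in a small helper. This is a sketch, not an official lab script: the function names publish_file and run and the DRY_RUN switch are hypothetical, and the final URL assumes files appear directly under the ~YOUR_PARTNER_ID address given above.

```shell
# Print the command when DRY_RUN=1 (the default here); run it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: publish_file FILE PARTNER_ID
publish_file() {
  file=$1
  user=$2
  host="panda.dipr.partners.org"
  name=$(basename "$file")

  # Create the web folder (no error if it already exists).
  run ssh "$user@$host" "mkdir -p ~/public_html"
  # Upload the file.
  run scp "$file" "$user@$host:~/public_html/"
  # Make it world-readable so the web server can serve it.
  run ssh "$user@$host" "chmod 644 ~/public_html/$name"

  # Assumed public address of the uploaded file.
  echo "http://panda.partners.org/~$user/$name"
}

publish_file my.file YOUR_PARTNER_ID
```

Set DRY_RUN=0 to actually run the ssh and scp commands once you have access.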
Linux/Unix basic
- http://www.ee.surrey.ac.uk/Teaching/Unix/
- http://www.tldp.org/LDP/gs/node5.html (advanced one)
- https://www.bits.vib.be/index.php/training/124-linux-for-bioinformatics
Bioinformatics Tools
- bedtools: http://bedtools.readthedocs.org/
- samtools: http://samtools.sourceforge.net/
- UCSC Jim Kent utilities:
- all commands: http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64
- source: https://github.com/ENCODE-DCC/kentUtils
- installation instructions: http://genome-source.cse.ucsc.edu/gitweb/?p=kent.git;a=blob;f=src/userApps/README
- ENCODE tools: https://github.com/ENCODE-DCC/
- RNAseq analysis: Bowtie/Tophat/Cufflinks etc.: http://ccb.jhu.edu/software.shtml
- GATK: https://www.broadinstitute.org/gatk/
- ggplot2: http://ggplot2.org/
Bioinformatics resources
- UCSC Genome Browser
- About: http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html
- Training tutorial 1: http://www.openhelix.com/ucsc
- Training tutorial 2: http://bit.ly/genomebrowserYoutube
- Rdocumentation: http://www.rdocumentation.org/
- BioStars forum: https://www.biostars.org/
- R news and tutorials RSS: http://www.r-bloggers.com/
High-throughput computing resources
- Eris (Partners): http://rc.partners.org/hpc
- Orchestra (HMS): https://wiki.med.harvard.edu/Orchestra/
- Odyssey (Harvard): https://rc.fas.harvard.edu/odyssey-quickstart-guide/
- XSEDE’s Blacklight server (Pittsburgh Supercomputing Center): https://www.xsede.org/high-performance-computing
- The Data Intensive Academic Grid (DIAG): http://diagcomputing.org/
Partners HPCC overview
- First-time users guide: http://rc.partners.org/node/126
- Register an account here: https://rc.partners.org/eris_cluster
- Log in:
- ssh YOUR_PARTNER_ID@eris1n2.research.partners.org
- ssh YOUR_PARTNER_ID@eris1n3.research.partners.org
- ssh YOUR_PARTNER_ID@erisone.partners.org
- The Partners hpcc is based on LSF: http://en.wikipedia.org/wiki/Platform_LSF
- Knowledge base: http://rc.partners.org/kbase/High_Performance_Computing
- How to submit jobs: http://rc.partners.org/node/227
- How to change queues/resources: http://rc.partners.org/kbase?cat_id=45&art_id=425
- How to log in to a specific node: http://rc.partners.org/kbase?cat_id=45&art_id=405
- How to mount the Eris folder: http://rc.partners.org/kbase?cat_id=47&art_id=312
- Need help? Send an email (hpcsupport@partners.org) or open a ticket (https://tickets.partners.org/)
- VPN: use pvc.partners.org/legacy in Cisco AnyConnect
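Since the cluster is based on LSF, jobs are submitted with bsub. The sketch below writes a small job script and submits it only if bsub is available. The queue name, core count and memory request are placeholders, not the cluster's real settings; check the knowledge base pages above for the actual queues and limits.

```shell
# Write a minimal LSF job script. Lines starting with #BSUB are read by bsub.
cat > example_job.lsf <<'EOF'
#!/bin/bash
# Job name
#BSUB -J example_job
# Queue (hypothetical name; pick one from the queue documentation)
#BSUB -q normal
# Number of cores
#BSUB -n 4
# Memory reservation (units depend on the site configuration)
#BSUB -R "rusage[mem=4000]"
# Output and error files; %J expands to the job ID
#BSUB -o example_job.%J.out
#BSUB -e example_job.%J.err

echo "Running on $(hostname)"
EOF

if command -v bsub >/dev/null 2>&1; then
  bsub < example_job.lsf   # submit the job
  bjobs                    # list your pending and running jobs
else
  echo "bsub not found; on the cluster you would run: bsub < example_job.lsf"
fi
```

A job can be cancelled with bkill followed by the job ID that bsub reports.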
NGS analysis pipeline
- RNA-seq:
- on github: https://github.com/sterding/RNAseq
- on hpcc: ~/neurogen/pipeline/RNAseq
- NOTE: Please use git to track your changes if you edit the pipeline there directly; otherwise, I prefer that you clone a copy into your home directory and push to github every time you make a change. This applies to all projects.
- smallRNA-seq
- genotyping
- DNA-seq
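The clone, edit, push cycle from the NOTE above can be sketched as follows. So that it runs anywhere, the example uses a throwaway local bare repository in place of GitHub; in practice you would clone https://github.com/sterding/RNAseq instead. The file name NOTES.md, the commit message and the user details are made up.

```shell
work=$(mktemp -d)

# Stand-in for the GitHub repository (assumption, for offline use only).
git init --quiet --bare "$work/remote.git"

# 1. Clone a copy into your own space.
git clone --quiet "$work/remote.git" "$work/RNAseq"
cd "$work/RNAseq"

# 2. Make a change and commit it.
echo "# pipeline notes" > NOTES.md
git add NOTES.md
git -c user.name="Your Name" -c user.email="you@example.org" \
    commit --quiet -m "Add pipeline notes"

# 3. Push the current branch back to the remote.
git push --quiet origin HEAD

# Show what was recorded locally.
git log --oneline
```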
NGS resources
- ENCODE: https://www.encodeproject.org/
- modENCODE: http://www.modencode.org
- mouseENCODE: http://mouseencode.org
- Roadmap Epigenomics: http://www.roadmapepigenomics.org/
- FANTOM: http://fantom.gsc.riken.jp/5
- GTEx: http://commonfund.nih.gov/GTEx/index
How to manage different R versions on Mac and the Eris cluster
We want to use the same version of R on Mac and on the Linux-based ERIS cluster. Here is how:
On Mac, simply download and install R. Its default install location is
/Library/Frameworks/R.framework/Resources/bin/R
as you can tell from the symbolic link:
[xdong@macbook ~]$ ls -l /usr/local/bin/R
lrwxr-xr-x 1 root admin 47 Nov 4 19:27 /usr/local/bin/R -> /Library/Frameworks/R.framework/Resources/bin/R
The different versions of R are kept side by side, with Current pointing to the active one:
[xdong@macbook ~]$ ll /Library/Frameworks/R.framework/Versions/
total 4.0K
drwxrwxr-x 6 root 204 Nov 8 2013 3.0
drwxrwxr-x 3 root 102 Jun 9 20:08 3.1
drwxrwxr-x 6 root 204 Nov 4 19:26 3.3
lrwxr-xr-x 1 root 3 Nov 4 19:26 Current -> 3.3
By default, R libraries installed from the R console or RStudio go to
/Library/Frameworks/R.framework/Resources/library/
This can be seen in R via:
> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library"
If you would rather install them elsewhere, add the following line to your ~/.Rprofile file:
.libPaths( "/My/path/for/Rlib" )
See: http://stackoverflow.com/questions/2615128/where-does-r-store-packages
This can also be overridden by setting the R_LIBS_USER environment variable (not tested yet).
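A minimal sketch of the R_LIBS_USER route, untested on the lab machines: the folder $HOME/Rlib/3.3 is a hypothetical location. R skips library paths that do not exist, so the folder is created first.

```shell
# Hypothetical personal library folder, one per R minor version.
export R_LIBS_USER="$HOME/Rlib/3.3"

# R ignores library paths that do not exist, so create the folder first.
mkdir -p "$R_LIBS_USER"

# If R is installed, show the library search path it now uses.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e '.libPaths()'
else
  echo "Rscript not found; R_LIBS_USER is set to $R_LIBS_USER"
fi
```

To make the setting permanent, the export line can go in ~/.bash_profile, or the line R_LIBS_USER=~/Rlib/3.3 can go in ~/.Renviron.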
On the Linux server it is similar. The ERIS cluster managers have already installed several R versions, so you only need to list the available modules and load the R module you want:
$ module avail
$ module load R/3.3.0
Then you can use R v3.3.0 directly, and the libraries installed by the cluster managers are already accessible under $R_HOME. If you install your own library from the R console, R will ask whether to create a personal library folder (if you do not have one yet).
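To confirm that the Mac and the cluster run the same R, a small check like the one below could be run on both. It is a sketch: the helper name r_version is hypothetical, and the module load step is attempted only where the module command exists.

```shell
# Extract the version number from the first line of `R --version` output (stdin).
r_version() {
  sed -n '1s/^R version \([0-9][0-9.]*\).*/\1/p'
}

expected="3.3.0"   # the version loaded in the text above

# On the cluster, load the matching module if the module command is available.
if type module >/dev/null 2>&1; then
  module load "R/$expected" || echo "module R/$expected not available"
fi

# Compare the R found on PATH with the expected version.
if command -v R >/dev/null 2>&1; then
  found=$(R --version | r_version)
  if [ "$found" = "$expected" ]; then
    echo "R $found matches the expected version"
  else
    echo "R $found found, expected $expected"
  fi
else
  echo "R is not on PATH"
fi
```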