Applied Genomics
BIOL-GA.1130
This course provides a comprehensive introduction to the analysis of next-generation DNA sequencing (NGS) data. Through a combination of lectures, hands-on computational training, discussions of scientific papers, and assignments using real data, students will learn the foundations of analytical methods, the computational skills to implement those methods, and the reasoning skills to critically assess the primary literature in genomics. The course will cover all commonly used NGS methods, including genome sequence analysis, gene expression analysis, and the analysis of protein-nucleic acid interactions. To gain practical expertise in executing bioinformatic analyses, students will undertake a series of assignments using real data. Students will also complete an individual project that integrates skills and concepts covered during the class and that is tailored to their background and training.
The course is designed for students with a background in biology who have some experience with statistical analyses using the R programming language. The course is also appropriate for computer science students with some biology background who wish to improve their skills in translating biological problems into computational approaches. The course is based on the premise that biological and computational research are now inextricably intertwined.
The goal of the course is to provide students with the necessary skills to move seamlessly from acquisition of genomic data to its analysis using UNIX commands and programming in R. Students will read primary research from the genomics and bioinformatics literature and work with real large-scale datasets. The course will teach students to synthesize data from the literature and devise novel computational experiments to test new ideas or hypotheses.
For the final project, students will be required to read the primary literature, identify a biological problem and the available datasets to work with, and undertake a computational analysis to tackle the problem. Students will generate a written report of their study and present their projects to instructors and fellow students.
Prerequisites: Statistics in Biology, or equivalent background in statistics and R programming with instructor permission.
Grading Scheme
Syllabus
Instructors: David Gresham and Manpreet Katari
Teaching Assistant: Dayanne Castro
Course Number: BIOL-GA.1130
Credits: 4
Lecture/Computational lab: Wednesday 12:30pm-3:15pm
Recitation: Monday 12:30pm-1:30pm
Location of Lecture and Recitation: 12 Waverly Place (CGSB) Room L111
Each lecture will be led by either David Gresham (DG) or Manpreet Katari (MK).
1/27/2016 Week 1: Unix and Accessing NYU’s HPC facility (Shenglong)
2/3/2016 Week 2: Introduction to next-generation sequencing (MK, DG)
Reading Assignment 1: None
Lecture 1: What is next-generation sequencing, file formats, R/Bioconductor, Unix, HPC
Computer Lab 1: Access HPC, execute Unix and R commands, PBS scripts, visualization, getting data from databases
Assignment 1: Getting started with HPC, R and NGS file types.
2/10/2016 Week 3: Genome Alignment (MK)
Reading Assignment 2:
Lecture 2: How to align short-read sequences to a reference genome
Computer Lab 2: FastQC, Bowtie and BWA examples
Assignment 2: Align a sequenced genome using Bowtie and BWA
2/17/2016 Week 4: Programming with R and reproducible research using R Markdown (Brian Parker and Dayanne Castro)
Assignment 3: Perform some R analyses and generate an HTML file from an .Rmd file
2/24/2016 Week 5: Detecting variants with next-generation sequencing (DG)
Reading Assignment 3: “An integrated map of genetic variation from 1,092 human genomes”, Nature (2012)
Lecture 3: Identifying SNPs, CNVs, translocations, and low-abundance mutations in NGS data
Computer Lab 3: samtools, bcftools, GATK
Assignment 4: Paired-end alignment, VCF generation and CNV detection
3/2/2016 Week 6: RNA-seq I: Alignment and Quantification (MK)
Reading Assignment 4: Mortazavi et al.,
Lecture 4: Mapping RNA-seq reads, quantification, splice variants, RNA editing
Computer Lab 4: TopHat, Cufflinks
Assignment 5: Gene expression analysis using RNA-seq with TopHat
3/9/2016 Week 7: RNA-seq II: Analysis (DG)
Reading Assignment 5: Rapaport et al.,
Lecture 5: Differential gene expression analysis, hierarchical clustering, GO term enrichment
Computer Lab 5: edgeR, cluster
Assignment 6: Differential gene expression analysis, clustering, and PCA
3/23/2016 Week 8: ChIP-seq (DG)
Reading Assignment 6: Boyle et al.,
Lecture 6: Design and analysis of ChIP-seq experiments
Computer Lab 6: MACS, detecting peaks, PWM
Assignment 7:
3/30/2016 Week 9: De novo genome assembly (MK)
Reading Assignment 8: TBD
Lecture 8: How to assemble a genome sequence without a reference
Computer Lab 8: SOAPdenovo, Velvet
Assignment 8:
4/6/2016 Week 10: Project proposals
4/13/2016 Week 11: Metagenomics (MK)
Reading Assignment 9: TBD
Lecture 9: Amplicon-based, reference-based, and de novo approaches
Computer Lab 9:
4/20/2016 Week 12: Network analysis (DG)
Reading Assignment 11: TBD
Lecture 12: Generating and analyzing networks of interactions
Computer Lab 12: Cytoscape and igraph
4/27/2016 Week 13: Project Presentations I
5/4/2016 Week 14: Project Presentations II