SORT | ORGANISM | DATA TYPE | DATA LEVEL | PROCESSING STEP | DATA FORMATS | DATA LOCATION | NOTES | DATA/PROCESSING CONTACT | DATA CONTACT PI | DATE
1 | Worm (WS220) | Genome Sequence | 0 | Genome Sequence | FASTA | http://hgdownload.cse.ucsc.edu/goldenPath/ce10/chromosomes/ | Per chromosome FASTA file
2 | Worm (WS220) | ChIP-seq | 1 | Raw Alignments | SAM | http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/alignments/bam/ | Uniformly mapped data using BWA (only unique mapping reads) | Carlos Araya (claraya@gmail.com); Philip Cayting (pcayting@stanford.edu) | Mike Snyder
| Worm (WS220) | ChIP-seq | | Filtered Alignments | TAGALIGN/BED | http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/alignments/distinctTagAlign/reps/ | Unique mapping reads, duplicates filtered | Carlos Araya (claraya@gmail.com); Anshul Kundaje (anshul@kundaje.net)
3 | Worm (WS220) | ChIP-seq | 2 | MetaData and Data Quality | EXCEL/TAB tables | https://docs.google.com/spreadsheet/ccc?key=0Algk3BSZDYzgdDlYNU00d2p3azJyZWlrZ09OQXNXTGc#gid=0 | Measures of enrichment, signal-to-noise ratios, library complexity and peak calling statistics. | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
19 | Worm (WS220) | Blacklist | 4 | Blacklists | BED | http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/blacklist/ce10-blacklist.bed.gz | Empirical blacklist of regions with artifactual unstructured signal | Alan Boyle (aboyle@stanford.edu) | Mike Snyder
4 | Worm (WS220) | ChIP-seq | 3 | IDR Peak Calls | NARROWPEAK | http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/peakCalls/idr/ | The SPP peak caller was used along with the IDR framework for calling peaks and thresholding based on reproducibility. IDR threshold of 0.05 was used. chrM peaks were removed as these were unreliable in most cases. See https://sites.google.com/site/anshulkundaje/projects/idr for details | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
4 | Worm (WS220) | ChIP-seq | 3 | Blacklist filtered IDR peak Calls (Use these) | NARROWPEAK | http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/peakCalls/uniformPk/ | IDR Peak calls are filtered against blacklists | Alan Boyle (aboyle@stanford.edu) | Mike Snyder
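The blacklist filtering described in this row can be sketched as follows. This is a minimal illustration, not the pipeline that produced the released files: it assumes tab-separated BED-like lines (chrom, start, end in the first three columns) and assumes that a peak is dropped on any overlap with a blacklisted interval, a criterion the sheet does not state. The function names and toy intervals are hypothetical.

```python
def parse_bed_line(line):
    """Return (chrom, start, end, stripped_line) for a BED-like line."""
    text = line.rstrip("\n")
    fields = text.split("\t")
    return fields[0], int(fields[1]), int(fields[2]), text


def filter_against_blacklist(peak_lines, blacklist_lines):
    """Keep peaks that do not overlap any blacklisted interval.

    Assumption: any overlap at all removes the peak.
    """
    blacklist = {}
    for line in blacklist_lines:
        chrom, start, end, _ = parse_bed_line(line)
        blacklist.setdefault(chrom, []).append((start, end))

    kept = []
    for line in peak_lines:
        chrom, start, end, text = parse_bed_line(line)
        # Half-open intervals overlap when each starts before the other ends.
        hit = any(start < b_end and b_start < end
                  for b_start, b_end in blacklist.get(chrom, []))
        if not hit:
            kept.append(text)
    return kept


# Toy intervals (made up for illustration).
peaks = ["chrI\t100\t200\tpeak1", "chrI\t500\t600\tpeak2", "chrII\t100\t200\tpeak3"]
blacklisted = ["chrI\t150\t300"]
for kept_peak in filter_against_blacklist(peaks, blacklisted):
    print(kept_peak)
```

The linear scan per peak is fine for a short blacklist; a sorted or indexed structure would be the natural choice for larger ones.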
5 | Worm (WS220) | ChIP-seq | 5 | Relaxed peak calls (unthresholded) | NARROWPEAK | http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/peakCalls/unthresholdPk/ | This is a large set of unthresholded peak calls (up to ~100K peaks) from SPP. Useful for analyses of low-signal peaks. | Carlos Araya (claraya@gmail.com); Philip Cayting (pcayting@stanford.edu) | Mike Snyder
6 | Worm (WS220) | Mappability | 7 | Unique Mappability track (Read Lengths 20 to 54) | BINARY | http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/mappability/globalmap_k20tok54.tgz
A position 'i' on a particular genomic strand 's' is considered uniquely mappable for a read-length 'k' if the k-mer starting at 'i' on strand 's' maps uniquely i.e. only to position 'i' on strand 's' (no mismatches allowed). There are other ways to define mappability e.g. allowing for mismatches but this is basically an "optimistic" idealized mappability mask not accounting for mismatches.

A whole-genome index was created and the Bowtie mapper was used to try to map each k-mer against both strands of the genome.

The globalmap_k20tok54.tgz file contains binary files representing uniqueness maps for each chromosome for all read-lengths ranging from 20 to 54 (encoded in a single file for each chromosome)
(a) The files are in uint8 (unsigned 8 bit integers) binary formats (saves disk space)
(b) Each file is basically a vector of unsigned 8bit integers that is the length of the chromosome. The elements of the vector are >= 0 (taking values 0 or 20 to 54)
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) To obtain the uniqueness map for a particular read-length 'k', apply the following operation to each element of the vector: (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the vector by <k-1>,
i.e. if position 1 is UNIQUE on the + strand for read-length <k=3>, then position 3 is UNIQUE on the - strand

===============================================
How to read the files in a programming language such as matlab/octave
===============================================
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome e.g. chr1.uint8.unique
% Read the files as a contiguous binary vector of unsigned 8 bit integers

tmp_uMap = fopen('chr1.uint8.unique','r');
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap);

% You can similarly read the files in any other programming language as a vector of unsigned 8bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer
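As one example of another language, the read and the per-read-length operation from step (e) can be sketched in Python using only the standard library. The function names and the toy file below are illustrative assumptions; a real chromosome file such as chr1.uint8.unique would be read the same way.

```python
import os
import tempfile


def read_umap(path):
    """Read one per-chromosome file as a vector of unsigned 8-bit integers.

    A bytes object already behaves like a read-only uint8 vector: indexing
    it yields an int in 0..255, one per genomic position.
    """
    with open(path, "rb") as handle:
        return handle.read()


def unique_mask(umap, k):
    """Apply (value > 0) & (value <= k) to every element.

    Returns bytes holding 1 where the k-mer starting at that position on
    the + strand is unique, and 0 elsewhere.
    """
    if not 20 <= k <= 54:
        raise ValueError("the maps cover read lengths 20 to 54 only")
    return bytes(1 if 0 < value <= k else 0 for value in umap)


# Toy file standing in for a real chromosome (the values are made up).
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.uint8.unique")
    with open(toy_path, "wb") as handle:
        handle.write(bytes([0, 20, 36, 54, 0, 25]))
    umap = read_umap(toy_path)

print(list(unique_mask(umap, 36)))  # [0, 1, 1, 0, 0, 1]
```

Keeping the mask as bytes rather than a list of Python integers follows the memory-saving spirit of the uint8 encoding.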
Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
22 | Worm (WS220) | ChIP-seq | 6 | Signal tracks (Input normalized) | BIGWIG/BEDGRAPH | http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/signal/foldChange/
Signal tracks are generated for each dataset using MACSv2's signal processing module. Signal tracks represent ChIP signal compared to input control signal.
- FoldEnrichment: The first type of track represents the fold enrichment of ChIP over input. This type of track is useful for detecting and analyzing regions with moderate to low enrichment.
- Does NOT correct for local mappability
- Does NOT differentiate between "missing data" at unmappable locations and true 0 signal.
- The signal is not smoothed. Reads are extended to predominant fragment length.
- It is recommended to smooth the signal before using it unless you are averaging over multiple sites (e.g. aggregation plots)
- Separate tracks are generated for individual replicates and pooled data.
Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
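The notes above recommend smoothing the signal but do not name a method. One simple possibility is a centred moving average, sketched below. The window size, the truncation at the vector ends, and the assumption that the signal has already been loaded into a per-base list are all choices made for this illustration.

```python
def moving_average(signal, window=5):
    """Smooth per-base signal values with a centred moving average.

    Windows are truncated at both ends of the vector, so edge values are
    averaged over fewer positions.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)

    # Prefix sums make each window sum a constant-time lookup.
    prefix = [0.0]
    for value in signal:
        prefix.append(prefix[-1] + value)

    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


print(moving_average([0, 0, 3, 0, 0], window=3))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```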
| Worm (WS220) | TIP | | TF target prediction | TIP | http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/TFtargets/ | See README file in the directory. Chao Cheng's TIP algorithm for predicting TF target genes was applied to the input-normalized ChIP-seq tracks; these are the output files of that method. Note that TIP was run on all ChIP-seq datasets, including those with score -1. For most applications you should ignore those results, and treat the score=0 results cautiously. | Chao Cheng (chengchao12@gmail.com)
| Worm (WS220) | HOT | | HOT Regions | BED | http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/hotRegions/ | See README at http://encodedcc.sdsc.edu/ftp/modENCODE_VS_ENCODE/Regulation/Worm/hotRegions/readme.txt | Carlos Araya (claraya@gmail.com); Alan Boyle (aboyle@stanford.edu) | Mike Snyder

7 | Fly (FB5.45) | Genome Sequence | 0 | Genome Sequence | FASTA | http://hgdownload.cse.ucsc.edu/goldenPath/dm3/chromosomes/ | Per chromosome FASTA file
8 | Fly (FB5.45) | ChIP-seq | 1 | Raw data (Alignments) | SAM | http://www.broadinstitute.org/~anshul/projects/fly/mapped/sam/ | Uniformly mapped data using BWA (only unique mapping reads) | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
| Fly (FB5.45) | ChIP-seq | | Filtered Alignments | TAGALIGN/BED | http://www.broadinstitute.org/~anshul/projects/fly/mapped/distinctTagAlign/qcmerge/ | Unique mapping reads, duplicates filtered
9 | Fly (FB5.45) | ChIP-seq | 2 | MetaData and Data Quality | EXCEL/TAB tables | https://docs.google.com/spreadsheet/ccc?key=0Algk3BSZDYzgdDU3cXVVMHdQeHRTUWtnYk1aSG13NEE&pli=1#gid=4 | Measures of enrichment, signal-to-noise ratios, library complexity and peak calling statistics | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
19 | Fly (FB5.45) | Blacklist | 4 | Blacklists | BED | http://www.broadinstitute.org/~anshul/projects/fly/blacklist/dm3-blacklist.bed.gz | Empirical blacklist of regions with artifactual unstructured signal | Alan Boyle (aboyle@stanford.edu) | Mike Snyder
10 | Fly (FB5.45) | ChIP-seq | 3 | IDR Peak Calls | NARROWPEAK | http://www.broadinstitute.org/~anshul/projects/fly/peaks_macs/release/idrOptimal/pass/ | The MACSv2 peak caller was used along with the IDR framework for calling peaks and thresholding based on reproducibility. IDR threshold of 0.05 was used. | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
10 | Fly (FB5.45) | ChIP-seq | 3 | Blacklist filtered IDR peak Calls | NARROWPEAK | http://www.broadinstitute.org/~anshul/projects/fly/peaks_macs/release/idrOptimalBlacklistFiltered/ | IDR Peak calls are filtered against blacklists | Alan Boyle (aboyle@stanford.edu) | Mike Snyder
11 | Fly (FB5.45) | ChIP-seq | 5 | Relaxed peak calls (unthresholded) | NARROWPEAK | http://www.broadinstitute.org/~anshul/projects/fly/peaks_macs/release/combrep/regionPeak/ | This is a large set of unthresholded peak calls using MACSv2. Useful for analyses of low-signal peaks. | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
12 | Fly (FB5.45) | Mappability | 7 | Unique Mappability track (Read Lengths 20 to 54) | BINARY | http://www.broadinstitute.org/~anshul/projects/encode/rawdata/umap/dm3_build5/dm3_build5.all.globalmap_k20tok54.tgz
A position 'i' on a particular genomic strand 's' is considered uniquely mappable for a read-length 'k' if the k-mer starting at 'i' on strand 's' maps uniquely i.e. only to position 'i' on strand 's' (no mismatches allowed). There are other ways to define mappability e.g. allowing for mismatches but this is basically an "optimistic" idealized mappability mask not accounting for mismatches.

A whole-genome index was created (for the human female mask, chrY was excluded from the index) and the Bowtie mapper was used to try to map each k-mer against both strands of the genome.

Each <organism>.<sex>.globalmap_k20tok54.tgz file contains binary files representing uniqueness maps for each chromosome for all read-lengths ranging from 20 to 54 (encoded in a single file for each chromosome)
(a) The files are in uint8 (unsigned 8 bit integers) binary formats (saves disk space)
(b) Each file is basically a vector of unsigned 8bit integers that is the length of the chromosome. The elements of the vector are >= 0 (taking values 0 or 20 to 54)
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) To obtain the uniqueness map for a particular read-length 'k', apply the following operation to each element of the vector: (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the vector by <k-1>,
i.e. if position 1 is UNIQUE on the + strand for read-length <k=3>, then position 3 is UNIQUE on the - strand

===============================================
How to read the files in a programming language such as matlab/octave
===============================================
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome e.g. chr1.uint8.unique
% Read the files as a contiguous binary vector of unsigned 8 bit integers

tmp_uMap = fopen('chr1.uint8.unique','r');
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap);

% You can similarly read the files in any other programming language as a vector of unsigned 8bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer
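The right-shift from step (f) can be sketched in Python as follows. It assumes the + strand uniqueness mask for read-length k has already been computed as a 0/1 vector, and it fills the first k-1 positions of the - strand mask with 0, which is an assumption of this sketch (no k-mer ending there fits on the chromosome). The function name is hypothetical, and k=3 is used only to mirror the toy example in the notes; real maps cover k = 20 to 54.

```python
def minus_strand_mask(plus_mask, k):
    """Right-shift a + strand uniqueness mask by k-1 positions.

    plus_mask[i] == 1 means the k-mer starting at i on the + strand is
    unique; the same bases read from the - strand start at i + k - 1.
    """
    shift = k - 1
    n = len(plus_mask)
    if shift >= n:
        return bytes(n)  # all zeros: no k-mer fits
    # bytes(shift) is a run of `shift` zero bytes used as left padding.
    return bytes(shift) + bytes(plus_mask[: n - shift])


# Position 1 (index 0) unique on + for k=3 implies position 3 (index 2) on -.
plus = bytes([1, 0, 0, 0, 0])
print(list(minus_strand_mask(plus, 3)))  # [0, 0, 1, 0, 0]
```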
Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
22 | Fly (FB5.45) | ChIP-seq | 6 | Signal tracks (Input normalized) | BIGWIG/BEDGRAPH | http://www.broadinstitute.org/~anshul/projects/fly/uniformSignal/macs2signal/combrep/
Signal tracks are generated for each dataset using MACSv2's signal processing module. Signal tracks represent ChIP signal compared to input control signal.
- FoldEnrichment: The first type of track represents the fold enrichment of ChIP over input. This type of track is useful for detecting and analyzing regions with moderate to low enrichment.
- Does NOT correct for local mappability
- Does NOT differentiate between "missing data" at unmappable locations and true 0 signal.
- The signal is not smoothed. Reads are extended to predominant fragment length.
- It is recommended to smooth the signal before using it unless you are averaging over multiple sites (e.g. aggregation plots)
- Separate tracks are generated for individual replicates and pooled data.
Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
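The notes mention averaging over multiple sites (aggregation plots) as a case where the unsmoothed signal can be used directly. A minimal sketch of that averaging is shown below. It assumes the signal for one chromosome is available as a per-base list and that the sites are 0-based positions on that chromosome. The function name, the flank size and the toy values are hypothetical.

```python
def aggregate_profile(signal, sites, flank):
    """Average the signal in windows of +/- flank bases around each site.

    Sites whose window would run off either end of the vector are skipped.
    Returns a list of 2 * flank + 1 mean values, or [] if no site was usable.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for site in sites:
        lo, hi = site - flank, site + flank + 1
        if lo < 0 or hi > len(signal):
            continue
        for offset, value in enumerate(signal[lo:hi]):
            totals[offset] += value
        used += 1
    if used == 0:
        return []
    return [total / used for total in totals]


# Toy fold-enrichment values with two sites at indices 2 and 7.
toy_signal = [0, 1, 4, 1, 0, 0, 2, 6, 2, 0]
print(aggregate_profile(toy_signal, sites=[2, 7], flank=1))  # [1.5, 5.0, 1.5]
```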
| Fly (FB5.45) | TIP | | TF target prediction | TIP | http://www.broadinstitute.org/~anshul/projects/fly/TFtargets/jun2012/ | See README.fly file in the directory. Chao Cheng's TIP algorithm for predicting TF target genes was applied to the input-normalized ChIP-seq tracks; these are the output files of that method. Note that TIP was run on all ChIP-seq datasets, including those with score -1. For most applications you should ignore those results, and treat the score=0 results cautiously. | Chao Cheng (chengchao12@gmail.com)
| Fly (FB5.45) | HOT | | HOT Regions | BED | http://stanford.edu/~claraya/metrn/data/hot/regions/dm/ | See README: http://stanford.edu/~claraya/metrn/data/hot/ | Carlos Araya (claraya@gmail.com); Alan Boyle (aboyle@stanford.edu) | Mike Snyder

7 | Human (hg19) | Genome Sequence | 0 | Genome Sequence | FASTA | http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/referenceSequences/ | Per chromosome FASTA file (Random contigs are not used for mapping or computing unique mappability)
15 | Human (hg19) | ChIP-seq | 1 | Raw data (Alignments) | BAM | http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC | FASTQ and BAM files can be downloaded from the URL. Different labs used different mappers and mapping strategies. Hence, these files should be filtered to standardize them. | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
16 | Human (hg19) | ChIP-seq | 1 | Raw data (Unique mapping distinct alignments) | TAGALIGN | http://www.broadinstitute.org/~anshul/projects/encode/rawdata/mapped/mar2012/distinctTagAlign/ | BAM files above are filtered to only keep unique mapping reads (tagAlign/ directory). Then duplicate reads were removed (only one read per position). These can be obtained in the distinctTagAlign/ directory | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
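The "only one read per position" step can be illustrated with a short sketch. It assumes tab-separated tagAlign lines (chrom, start, end, sequence, score, strand) and assumes that "position" means the read's 5' end on its own strand. The sheet does not define the key that was actually used, so this is one plausible reading rather than the pipeline's rule. The function name and toy reads are hypothetical.

```python
def distinct_reads(tagalign_lines):
    """Keep the first read seen at each (chrom, 5' end, strand).

    Assumption: the 5' end is `start` for + strand reads and `end` for
    - strand reads.
    """
    seen = set()
    kept = []
    for line in tagalign_lines:
        text = line.rstrip("\n")
        fields = text.split("\t")
        chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
        five_prime = start if strand == "+" else end
        key = (chrom, five_prime, strand)
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept


# Toy reads: the second line duplicates the first and is dropped.
toy_reads = [
    "chr1\t100\t136\tN\t1000\t+",
    "chr1\t100\t136\tN\t1000\t+",
    "chr1\t100\t136\tN\t1000\t-",
    "chr1\t64\t100\tN\t1000\t-",
]
print(len(distinct_reads(toy_reads)))  # 3
```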
17 | Human (hg19) | ChIP-seq | 2 | MetaData and Data Quality | EXCEL/TAB tables | https://docs.google.com/spreadsheet/ccc?key=0Am6FxqAtrFDwdHdRcHNQUy03SjBoSVMxdUNyZV9Rdnc#gid=9 | Measures of enrichment, signal-to-noise ratios, library complexity and peak calling statistics | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
19 | Human (hg19) | Blacklist | 4 | Blacklists | BED | http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeMapability/wgEncodeDacMapabilityConsensusExcludable.bed.gz | Brief summary of how the blacklist was generated can be found at http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeMapability . More detailed analysis is at http://goo.gl/9FyQF | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
18 | Human (hg19) | ChIP-seq | 3 | IDR Peak Calls | NARROWPEAK | http://www.broadinstitute.org/~anshul/projects/encode/rawdata/peaks_spp/mar2012/distinct/idrOptimal/ | The SPP peak caller was used along with the IDR framework for calling peaks and thresholding based on reproducibility. IDR threshold of 0.02 was used. See https://sites.google.com/site/anshulkundaje/projects/idr for details | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
20 | Human (hg19) | ChIP-seq | 4 | Blacklist filtered IDR peak Calls (Use these) | NARROWPEAK | http://www.broadinstitute.org/~anshul/projects/encode/rawdata/peaks_spp/mar2012/distinct/idrOptimalBlackListFilt/ | IDR Peak calls are filtered against blacklists. THESE ARE THE HUMAN PEAK CALLS EVERYONE SHOULD USE. | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
21 | Human (hg19) | ChIP-seq | 5 | Relaxed peak calls (unthresholded) | NARROWPEAK | http://www.broadinstitute.org/~anshul/projects/encode/rawdata/peaks_spp/mar2012/distinct/combrep/regionPeak/ | This is a large set of unthresholded peak calls (up to 300K peaks) from SPP. Useful for analyses of low-signal peaks. | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
22 | Human (hg19) | ChIP-seq | 6 | Signal tracks (Input normalized) | BIGWIG/BEDGRAPH | http://www.broadinstitute.org/~anshul/projects/encode/rawdata/signal/mar2012/pooledReps/bigwig/macs2signal/foldChange/
Signal tracks are generated for each dataset using MACSv2's signal processing module. Signal tracks represent ChIP signal compared to input control signal.
- FoldEnrichment: The first type of track represents the fold enrichment of ChIP over input. This type of track is useful for detecting and analyzing regions with moderate to low enrichment.
- Does NOT correct for local mappability
- Does NOT differentiate between "missing data" at unmappable locations and true 0 signal.
- The signal is not smoothed. Reads are extended to predominant fragment length.
- It is recommended to smooth the signal before using it unless you are averaging over multiple sites (e.g. aggregation plots)
- Separate tracks are generated for individual replicates and pooled data.
Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
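Because the tracks do not separate "missing data" at unmappable locations from true 0 signal, a user may want to mark unmappable positions explicitly before analysis. The sketch below assumes a per-base 0/1 mappability vector aligned to the signal vector is already available. How to derive such a vector for reads extended to fragment length is not specified here and is left to the user. The function name and toy values are hypothetical.

```python
def mask_unmappable(signal, mappable):
    """Replace signal values with None wherever the position is not mappable.

    `mappable` is any sequence of truthy/falsy flags of the same length as
    `signal` (for example a bytes object of 0/1 values).
    """
    if len(signal) != len(mappable):
        raise ValueError("signal and mappability vectors differ in length")
    return [value if flag else None for value, flag in zip(signal, mappable)]


# Toy values: the last two positions are flagged unmappable, so their
# signal is reported as missing rather than as a measured value.
toy_signal = [0.0, 2.5, 0.0, 1.0]
toy_mappable = bytes([1, 1, 0, 0])
print(mask_unmappable(toy_signal, toy_mappable))  # [0.0, 2.5, None, None]
```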
| Human (hg19), Male genome (with chrY) | Mappability | 7 | Unique Mappability track (Read Lengths 20 to 54) | BINARY | http://www.broadinstitute.org/~anshul/projects/umap/encodeHg19Male/globalmap_k20tok54.tgz
A position 'i' on a particular genomic strand 's' is considered uniquely mappable for a read-length 'k' if the k-mer starting at 'i' on strand 's' maps uniquely i.e. only to position 'i' on strand 's' (no mismatches allowed). There are other ways to define mappability e.g. allowing for mismatches but this is basically an "optimistic" idealized mappability mask not accounting for mismatches.

A whole-genome index was created (for the human female mask, chrY was excluded from the index) and the Bowtie mapper was used to try to map each k-mer against both strands of the genome.

Each <organism>.<sex>.globalmap_k20tok54.tgz file contains binary files representing uniqueness maps for each chromosome for all read-lengths ranging from 20 to 54 (encoded in a single file for each chromosome)
(a) The files are in uint8 (unsigned 8 bit integers) binary formats (saves disk space)
(b) Each file is basically a vector of unsigned 8bit integers that is the length of the chromosome. The elements of the vector are >= 0 (taking values 0 or 20 to 54)
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) To obtain the uniqueness map for a particular read-length 'k', apply the following operation to each element of the vector: (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the vector by <k-1>,
i.e. if position 1 is UNIQUE on the + strand for read-length <k=3>, then position 3 is UNIQUE on the - strand

===============================================
How to read the files in a programming language such as matlab/octave
===============================================
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome e.g. chr1.uint8.unique
% Read the files as a contiguous binary vector of unsigned 8 bit integers

tmp_uMap = fopen('chr1.uint8.unique','r');
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap);

% You can similarly read the files in any other programming language as a vector of unsigned 8bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer
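Human chromosome files are large, so a summary can be computed without holding a whole vector in memory. The Python sketch below streams a file in chunks and counts the positions that are unique on the + strand for a given read-length, using the rule from step (e). The function name, chunk size and toy file are assumptions made for illustration.

```python
import os
import tempfile


def count_unique(path, k, chunk_size=1 << 20):
    """Return (unique_positions, total_positions) for read-length k.

    The file is read in chunks of `chunk_size` bytes; each byte is one
    genomic position and counts as unique when 0 < value <= k.
    """
    unique = 0
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
            unique += sum(1 for value in chunk if 0 < value <= k)
    return unique, total


# Toy file standing in for a real chromosome (the values are made up).
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.uint8.unique")
    with open(toy_path, "wb") as handle:
        handle.write(bytes([0, 20, 36, 54]))
    print(count_unique(toy_path, k=36))  # (2, 4)
```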
Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
| Human (hg19), Female genome (no chrY) | Mappability | 7 | Unique Mappability track (Read Lengths 20 to 54) | BINARY | http://www.broadinstitute.org/~anshul/projects/umap/encodeHg19Female/globalmap_k20tok54.tgz
A position 'i' on a particular genomic strand 's' is considered uniquely mappable for a read-length 'k' if the k-mer starting at 'i' on strand 's' maps uniquely i.e. only to position 'i' on strand 's' (no mismatches allowed). There are other ways to define mappability e.g. allowing for mismatches but this is basically an "optimistic" idealized mappability mask not accounting for mismatches.

A whole-genome index was created (for the human female mask, chrY was excluded from the index) and the Bowtie mapper was used to try to map each k-mer against both strands of the genome.

Each <organism>.<sex>.globalmap_k20tok54.tgz file contains binary files representing uniqueness maps for each chromosome for all read-lengths ranging from 20 to 54 (encoded in a single file for each chromosome)
(a) The files are in uint8 (unsigned 8 bit integers) binary formats (saves disk space)
(b) Each file is basically a vector of unsigned 8bit integers that is the length of the chromosome. The elements of the vector are >= 0 (taking values 0 or 20 to 54)
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) To obtain the uniqueness map for a particular read-length 'k', apply the following operation to each element of the vector: (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the vector by <k-1>,
i.e. if position 1 is UNIQUE on the + strand for read-length <k=3>, then position 3 is UNIQUE on the - strand

===============================================
How to read the files in a programming language such as matlab/octave
===============================================
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome e.g. chr1.uint8.unique
% Read the files as a contiguous binary vector of unsigned 8 bit integers

tmp_uMap = fopen('chr1.uint8.unique','r');
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap);

% You can similarly read the files in any other programming language as a vector of unsigned 8bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer
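For users who prefer text output, one option is to write runs of uniquely mappable start positions as BED-like intervals. The Python sketch below does this for the + strand. The 0-based half-open coordinates, the three-column layout, and the function and chromosome names are choices made for this illustration, not a format distributed with the data.

```python
import io


def unique_intervals(umap, k):
    """Yield half-open (start, end) runs of positions unique for read-length k.

    A position counts as unique when 0 < value <= k; each position is the
    start of a k-mer on the + strand.
    """
    start = None
    for i, value in enumerate(umap):
        is_unique = 0 < value <= k
        if is_unique and start is None:
            start = i
        elif not is_unique and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(umap)


def write_bed(umap, chrom, k, handle):
    """Write one tab-separated line per run: chrom, start, end."""
    for start, end in unique_intervals(umap, k):
        handle.write(f"{chrom}\t{start}\t{end}\n")


# Toy vector (values made up); "chrToy" is a placeholder name.
toy = bytes([0, 20, 20, 0, 54, 30])
print(list(unique_intervals(toy, 36)))  # [(1, 3), (5, 6)]
buffer = io.StringIO()
write_bed(toy, "chrToy", 36, buffer)
print(buffer.getvalue(), end="")
```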
Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
| Human (hg19) | TIP | | TF target predictions | TIP | http://www.broadinstitute.org/~anshul/projects/encode/rawdata/TFtargets/ | See README.hsa file in the directory. Chao Cheng's TIP algorithm for predicting TF target genes was applied to the input-normalized ChIP-seq tracks; these are the output files of that method. | Chao Cheng (chengchao12@gmail.com) | Chao Cheng (chengchao12@gmail.com)
| Human (hg19) | HOT | | HOT Regions | BED | http://stanford.edu/~claraya/metrn/data/hot/regions/hs/ | See README: http://stanford.edu/~claraya/metrn/data/hot/ | Carlos Araya (claraya@gmail.com); Alan Boyle (aboyle@stanford.edu) | Mike Snyder

7 | Mouse (mm9) | Genome Sequence | 0 | Genome Sequence | FASTA | http://hgdownload-test.cse.ucsc.edu/goldenPath/mm9/chromosomes/ | Per chromosome FASTA file (Random contigs are not used for mapping or computing unique mappability)
23 | Mouse (mm9) | ChIP-seq | 1 | Raw data (Alignments) | BAM | http://hgdownload-test.cse.ucsc.edu/goldenPath/mm9/encodeDCC | Datasets were mapped by individual labs | Philip Cayting (pcayting@stanford.edu); Alan Boyle; Yong Cheng | Mike Snyder
24 | Mouse (mm9) | ChIP-seq | 2 | MetaData and Data Quality | EXCEL/TAB tables | https://docs.google.com/spreadsheet/ccc?key=0Ao3-Or4FCMJEdFpPY2lwWnlZTV92MUNLOHYxbEl4Vnc#gid=0 | Measures of enrichment, signal-to-noise ratios, library complexity and peak calling statistics | Philip Cayting (pcayting@stanford.edu); Alan Boyle; Yong Cheng | Mike Snyder
19 | Mouse (mm9) | Blacklist | 4 | Blacklists | BED | http://www.broadinstitute.org/~anshul/projects/mouse/blacklist/mm9-blacklist.bed.gz | Brief summary of how the blacklist was generated can be found at http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeMapability . More detailed analysis is at http://goo.gl/9FyQF | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
25 | Mouse (mm9) | ChIP-seq | 3 | IDR Peak Calls | NARROWPEAK | http://www.broadinstitute.org/~anshul/projects/mouse/peaks_spp/idrOptimal/ | The SPP peak caller was used along with the IDR framework for calling peaks and thresholding based on reproducibility. IDR threshold of 0.02 was used. See https://sites.google.com/site/anshulkundaje/projects/idr for details | Philip Cayting (pcayting@stanford.edu); Alan Boyle; Yong Cheng | Mike Snyder
26 | Mouse (mm9) | ChIP-seq | 4 | Blacklist filtered IDR peak Calls (Use these) | NARROWPEAK | http://www.broadinstitute.org/~anshul/projects/mouse/peaks_spp/idrOptimalBlacklistFiltered/ | IDR Peak calls are filtered against blacklists. | Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
28 | Mouse (mm9) | ChIP-seq | 5 | Relaxed peak calls (unthresholded) | NARROWPEAK | http://www.broadinstitute.org/~anshul/projects/mouse/peaks_spp/combrep/ | This is a large set of unthresholded peak calls (up to 300K peaks) from SPP. Useful for analyses of low-signal peaks. | Philip Cayting (pcayting@stanford.edu); Alan Boyle; Yong Cheng | Mike Snyder
| Mouse (mm9) | Mappability | 7 | Unique Mappability track (Read Lengths 20 to 54) | BINARY | http://www.broadinstitute.org/~anshul/projects/umap/mm9/globalmap_k20tok54.tgz
A position 'i' on a particular genomic strand 's' is considered uniquely mappable for a read-length 'k' if the k-mer starting at 'i' on strand 's' maps uniquely i.e. only to position 'i' on strand 's' (no mismatches allowed). There are other ways to define mappability e.g. allowing for mismatches but this is basically an "optimistic" idealized mappability mask not accounting for mismatches.

A whole-genome index was created (for the human female mask, chrY was excluded from the index) and the Bowtie mapper was used to try to map each k-mer against both strands of the genome.

Each <organism>.<sex>.globalmap_k20tok54.tgz file contains binary files representing uniqueness maps for each chromosome for all read-lengths ranging from 20 to 54 (encoded in a single file for each chromosome)
(a) The files are in uint8 (unsigned 8 bit integers) binary formats (saves disk space)
(b) Each file is basically a vector of unsigned 8bit integers that is the length of the chromosome. The elements of the vector are >= 0 (taking values 0 or 20 to 54)
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) To obtain the uniqueness map for a particular read-length 'k', apply the following operation to each element of the vector: (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the vector by <k-1>,
i.e. if position 1 is UNIQUE on the + strand for read-length <k=3>, then position 3 is UNIQUE on the - strand

===============================================
How to read the files in a programming language such as matlab/octave
===============================================
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome e.g. chr1.uint8.unique
% Read the files as a contiguous binary vector of unsigned 8 bit integers

tmp_uMap = fopen('chr1.uint8.unique','r');
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap);

% You can similarly read the files in any other programming language as a vector of unsigned 8bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer
Anshul Kundaje (anshul@kundaje.net) | Manolis Kellis
| Mouse (mm9) | ChIP-seq | | Signal tracks (Input normalized) | BIGWIG | | Not sure where these are or if they were generated