Worm (WS220)
Genome Sequence
Genome Sequence | FASTA | chromosome FASTA file
Worm (WS220) | ChIP-seq | Raw Alignments | SAM | Mapped data using BWA (only unique mapping reads)
Carlos Araya; Philip Cayting
Mike Snyder
Worm (WS220) | ChIP-seq | Filtered Alignments | TAGALIGN/BED
Unique mapping reads, duplicates filtered
Carlos Araya; Anshul Kundaje
Worm (WS220) | ChIP-seq | MetaData and Data Quality | EXCEL/TAB tables
Measures of enrichment, signal-to-noise ratios, library complexity and peak calling statistics.
Anshul Kundaje | Kellis
Worm (WS220) | Blacklist | Blacklists | BED | Blacklist of regions with artifactual unstructured signal
Alan Boyle | Snyder
Worm (WS220) | ChIP-seq | IDR Peak Calls | NARROWPEAK
The SPP peak caller was used with the IDR framework to call peaks and threshold them by reproducibility.
An IDR threshold of 0.05 was used.
chrM peaks were removed because they were unreliable in most cases.
See for details
Anshul Kundaje | Kellis
Worm (WS220) | ChIP-seq | Blacklist filtered IDR peak Calls (Use these) | NARROWPEAK | Peak calls are filtered against blacklists
Alan Boyle | Snyder
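The blacklist filtering step can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, and the rule used here (drop a peak if it overlaps a blacklisted region by at least one base) is an assumption, since the exact overlap criterion is not stated above. It relies only on the first three tab-separated columns (chromosome, start, end) shared by BED and narrowPeak files.

```python
from collections import defaultdict


def load_intervals(lines):
    """Parse BED-like lines into {chrom: [(start, end), ...]} (0-based, half-open)."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        by_chrom[chrom].append((int(start), int(end)))
    return by_chrom


def filter_peaks(peak_lines, blacklist):
    """Keep peak lines that do not overlap any blacklisted interval.

    Assumption: any overlap of one base or more removes the peak.
    """
    kept = []
    for line in peak_lines:
        if not line.strip():
            continue
        chrom, start, end = line.split("\t")[:3]
        start, end = int(start), int(end)
        overlaps = any(
            start < b_end and b_start < end
            for b_start, b_end in blacklist.get(chrom, ())
        )
        if not overlaps:
            kept.append(line)
    return kept


# Synthetic example data, not taken from the real files.
example_blacklist = load_intervals(["chrI\t100\t200\n"])
example_peaks = [
    "chrI\t150\t250\tpeak1\n",   # overlaps the blacklisted region
    "chrI\t200\t300\tpeak2\n",   # touches it but does not overlap (half-open)
    "chrII\t150\t250\tpeak3\n",  # different chromosome
]
print(filter_peaks(example_peaks, example_blacklist))
```

The linear scan per peak is fine for a short blacklist; for large interval sets a sorted or tree-based lookup would be the more usual choice.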
Worm (WS220) | ChIP-seq | Relaxed peak calls (unthresholded) | NARROWPEAK
These are a large set of unthresholded peak calls (up to ~100K peaks) from SPP. Useful for analyses of low-signal peaks.
Carlos Araya; Philip Cayting
Mike Snyder
Worm (WS220) | Mappability | Unique Mappability track
(Read Lengths 20 to 54)
A position 'i' on a particular genomic strand 's' is considered uniquely mappable for a read-length 'k' if the k-mer starting at 'i' on strand 's' maps uniquely i.e. only to position 'i' on strand 's' (no mismatches allowed). There are other ways to define mappability e.g. allowing for mismatches but this is basically an "optimistic" idealized mappability mask not accounting for mismatches.

A whole genome index is created and the Bowtie mapper was used to try to map each k-mer against both strands of the genome.

globalmap_k20tok54.tgz file contains binary files representing uniqueness maps for each chromosome for all read-lengths ranging from 20 to 54 (encoded in a single file for each chromosome)
(a) The files are in uint8 (unsigned 8 bit integers) binary formats (saves disk space)
(b) Each file is basically a vector of unsigned 8bit integers that is the length of the chromosome. The elements of the vector are >= 0 (taking values 0 or 20 to 54)
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) In order to obtain the uniqueness map for a particular read-length 'k', simply perform the following operation on each element of the vector (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the vector by k-1 positions,
i.e. if position 1 is UNIQUE on the + strand for read-length k=3, then position 3 is UNIQUE on the - strand

How to read the files in a programming language such as matlab/octave
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome e.g. chr1.uint8.unique
% Read the files as a contiguous binary vector of unsigned 8 bit integers

tmp_uMap = fopen('chr1.uint8.unique','r');
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap); % Close the file handle once the data is in memory

% You can similarly read the files in any other programming language as a vector of unsigned 8bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer
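As an illustration of reading the files in another language, here is a minimal Python sketch that loads one chromosome file and applies the operation from (e). The function names are hypothetical. It uses only the standard library; for whole chromosomes, NumPy's `numpy.fromfile(path, dtype=numpy.uint8)` would be a more memory-efficient alternative.

```python
from pathlib import Path


def read_uniqueness_vector(path):
    """Return the file contents as bytes: one unsigned 8-bit value per position."""
    return Path(path).read_bytes()


def plus_strand_mask(values, k):
    """Return a list of booleans, True where the k-mer starting at that
    position on the + strand is unique, i.e. (value > 0) and (value <= k)."""
    if not 20 <= k <= 54:
        raise ValueError("k must be between 20 and 54")
    return [0 < v <= k for v in values]


# Synthetic four-position vector standing in for a real chromosome file.
toy = bytes([0, 20, 36, 54])
print(plus_strand_mask(toy, 36))  # [False, True, True, False]
```

With a real file the call would look like `plus_strand_mask(read_uniqueness_vector('chr1.uint8.unique'), 36)`.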
Anshul Kundaje | Kellis
Worm (WS220) | ChIP-seq | Signal tracks (Input normalized) | BIGWIG/BEDGRAPH
Signal tracks are generated for each dataset using MACSv2's signal processing module. Signal tracks represent ChIP signal compared to input control signal.
- FoldEnrichment: The first type of track represents the fold enrichment of ChIP over input. This type of track is useful for detecting and analyzing regions with moderate to low enrichment.
- Does NOT correct for local mappability
- Does NOT differentiate between "missing data" at unmappable locations and true 0 signal.
- The signal is not smoothed. Reads are extended to predominant fragment length.
- It is recommended to smooth the signal before using it unless you are averaging over multiple sites (e.g. aggregation plots)
- Separate tracks are generated for individual replicates and pooled data.
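One simple way to smooth an unsmoothed fold-enrichment vector is a centered moving average, sketched below. No particular smoothing method is prescribed above, so the moving average, the function name `smooth_signal`, and the window size are all assumptions chosen for illustration.

```python
def smooth_signal(signal, window=5):
    """Centered moving average; windows are truncated at the vector ends.

    `window` must be a positive odd number of positions.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(signal)

    # Prefix sums let each window mean be computed in constant time.
    prefix = [0.0]
    for value in signal:
        prefix.append(prefix[-1] + value)

    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


# Synthetic per-base signal with a single spike.
print(smooth_signal([0, 0, 3, 0, 0], window=3))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Because the tracks do not distinguish unmappable positions from true zero signal, smoothing will also average over those positions.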
Anshul Kundaje | Kellis
Worm (WS220) | TIP | TF target prediction | TIP | README file in the directory; Chao Cheng's TIP algorithm for predicting TF target genes was applied to the input-normalized ChIP-seq tracks; these are the output files of that method. Note that TIP was run on all ChIP-seq datasets, including those with score -1. For most applications you should ignore those results, and treat the score=0 results cautiously.
Chao Cheng
Worm (WS220) | HOT | HOT Regions | BED | README at
Carlos Araya; Alan Boyle
Mike Snyder
Fly (FB5.45)
Genome Sequence
Genome Sequence | FASTA | chromosome FASTA file
Fly (FB5.45) | ChIP-seq | Raw data (Alignments) | SAM | Mapped data using BWA (only unique mapping reads)
Anshul Kundaje | Kellis
Fly (FB5.45) | ChIP-seq | Filtered Alignments | TAGALIGN/BED | Unique mapping reads, duplicates filtered
Fly (FB5.45) | ChIP-seq | MetaData and Data Quality | EXCEL/TAB tables
Measures of enrichment, signal-to-noise ratios, library complexity and peak calling statistics
Anshul Kundaje | Kellis
Fly (FB5.45) | Blacklist | Blacklists | BED | Blacklist of regions with artifactual unstructured signal
Alan Boyle | Snyder
Fly (FB5.45) | ChIP-seq | IDR Peak Calls | NARROWPEAK
The MACSv2 peak caller was used with the IDR framework to call peaks and threshold them by reproducibility.
An IDR threshold of 0.05 was used.
Anshul Kundaje | Kellis
Fly (FB5.45) | ChIP-seq | Blacklist filtered IDR peak Calls | NARROWPEAK | Peak calls are filtered against blacklists
Alan Boyle | Snyder
Fly (FB5.45) | ChIP-seq | Relaxed peak calls (unthresholded) | NARROWPEAK
These are a large set of unthresholded peak calls using MACSv2. Useful for analyses of low-signal peaks.
Anshul Kundaje | Kellis
Fly (FB5.45) | Mappability | Unique Mappability track
(Read Lengths 20 to 54)
A position 'i' on a particular genomic strand 's' is considered uniquely mappable for a read-length 'k' if the k-mer starting at 'i' on strand 's' maps uniquely i.e. only to position 'i' on strand 's' (no mismatches allowed). There are other ways to define mappability e.g. allowing for mismatches but this is basically an "optimistic" idealized mappability mask not accounting for mismatches.

A whole genome index (except for the human female mask for which chrY was excluded from the index) is created and the Bowtie mapper was used to try to map each k-mer against both strands of the genome.

Each <organism>.<sex>.globalmap_k20tok54.tgz file contains binary files representing uniqueness maps for each chromosome for all read-lengths ranging from 20 to 54 (encoded in a single file for each chromosome)
(a) The files are in uint8 (unsigned 8 bit integers) binary formats (saves disk space)
(b) Each file is basically a vector of unsigned 8bit integers that is the length of the chromosome. The elements of the vector are >= 0 (taking values 0 or 20 to 54)
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) In order to obtain the uniqueness map for a particular read-length 'k', simply perform the following operation on each element of the vector (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the vector by k-1 positions,
i.e. if position 1 is UNIQUE on the + strand for read-length k=3, then position 3 is UNIQUE on the - strand

How to read the files in a programming language such as matlab/octave
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome e.g. chr1.uint8.unique
% Read the files as a contiguous binary vector of unsigned 8 bit integers

tmp_uMap = fopen('chr1.uint8.unique','r');
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap); % Close the file handle once the data is in memory

% You can similarly read the files in any other programming language as a vector of unsigned 8bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer
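The right-shift in (f) can be sketched in Python as follows. The function name is hypothetical, and the input is assumed to be a + strand boolean mask already computed for the same read-length k. The toy call uses k=3 only to mirror the worked example in (f); the real files cover k=20 to 54.

```python
def minus_strand_mask(plus_mask, k):
    """Right-shift a + strand uniqueness mask by k-1 positions.

    Positions shifted in at the start are marked not unique, and
    positions shifted past the end of the chromosome are dropped.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    shift = k - 1
    n = len(plus_mask)
    if shift >= n:
        return [False] * n
    return [False] * shift + list(plus_mask[: n - shift])


# Index 0 here corresponds to "position 1" in the text above.
toy_plus = [True, False, False, False, True]
print(minus_strand_mask(toy_plus, 3))  # [False, False, True, False, False]
```

In the output, the third entry (position 3) is unique on the - strand because position 1 was unique on the + strand; the last + strand entry has no room for a full k-mer shift and falls off the end.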
Anshul Kundaje | Kellis
Fly (FB5.45) | ChIP-seq | Signal tracks (Input normalized) | BIGWIG/BEDGRAPH
Signal tracks are generated for each dataset using MACSv2's signal processing module. Signal tracks represent ChIP signal compared to input control signal.
- FoldEnrichment: The first type of track represents the fold enrichment of ChIP over input. This type of track is useful for detecting and analyzing regions with moderate to low enrichment.
- Does NOT correct for local mappability
- Does NOT differentiate between "missing data" at unmappable locations and true 0 signal.
- The signal is not smoothed. Reads are extended to predominant fragment length.
- It is recommended to smooth the signal before using it unless you are averaging over multiple sites (e.g. aggregation plots)
- Separate tracks are generated for individual replicates and pooled data.
Anshul Kundaje | Kellis
Fly (FB5.45) | TIP | TF target prediction | TIP | file in the directory; Chao Cheng's TIP algorithm for predicting TF target genes was applied to the input-normalized ChIP-seq tracks; these are the output files of that method. Note that TIP was run on all ChIP-seq datasets, including those with score -1. For most applications you should ignore those results, and treat the score=0 results cautiously.
Chao Cheng
Carlos Araya; Alan Boyle
Mike Snyder
Human (hg19)
Genome Sequence
Genome Sequence | FASTA | chromosome FASTA file (Random contigs are not used for mapping or computing unique mappability)
Human (hg19) | ChIP-seq | Raw data (Alignments) | BAM
FASTQ and BAM files can be downloaded from the URL. Different labs used different mappers and mapping strategies, so these files should be filtered to standardize them.
Anshul Kundaje | Kellis
Human (hg19) | ChIP-seq | Raw data (Unique mapping distinct alignments) | TAGALIGN
The files above are filtered to keep only unique mapping reads (tagAlign/ directory).
Duplicate reads were then removed (only one read per position). These can be obtained in the distinctTagAlign/ directory.
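The "only one read per position" step can be illustrated with a small Python sketch. The function name is hypothetical, and two things are assumed rather than taken from the description above: that tagAlign lines have six tab-separated columns (chromosome, start, end, sequence, score, strand), and that a "position" is identified by chromosome, start, end and strand.

```python
def distinct_tags(lines):
    """Yield the first tagAlign line seen for each (chrom, start, end, strand).

    Assumption: reads on opposite strands at the same coordinates are
    treated as different positions.
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1], fields[2], fields[5])
        if key not in seen:
            seen.add(key)
            yield line


# Synthetic tagAlign records.
example = [
    "chr1\t100\t136\tN\t1000\t+\n",
    "chr1\t100\t136\tN\t1000\t+\n",  # duplicate of the first read
    "chr1\t100\t136\tN\t1000\t-\n",  # same coordinates, opposite strand
]
print(len(list(distinct_tags(example))))  # 2
```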
Anshul Kundaje | Kellis
Human (hg19) | ChIP-seq | MetaData and Data Quality | EXCEL/TAB tables
Measures of enrichment, signal-to-noise ratios, library complexity and peak calling statistics
Anshul Kundaje | Kellis
Human (hg19) | Blacklist | Blacklists | BED
Brief summary of how the blacklist was generated can be found at .
More detailed analysis is at
Anshul Kundaje | Kellis
Human (hg19) | ChIP-seq | IDR Peak Calls | NARROWPEAK
The SPP peak caller was used with the IDR framework to call peaks and threshold them by reproducibility. An IDR threshold of 0.02 was used.
See for details
Anshul Kundaje | Kellis
Human (hg19) | ChIP-seq | Blacklist filtered IDR peak Calls (Use these) | NARROWPEAK
IDR peak calls are filtered against blacklists. THESE ARE THE HUMAN PEAK CALLS EVERYONE SHOULD USE.
Anshul Kundaje | Kellis
Human (hg19) | ChIP-seq | Relaxed peak calls (unthresholded) | NARROWPEAK
These are a large set of unthresholded peak calls (up to 300K peaks) from SPP. Useful for analyses of low-signal peaks.
Anshul Kundaje | Kellis
Human (hg19) | ChIP-seq | Signal tracks (Input normalized) | BIGWIG/BEDGRAPH
Signal tracks are generated for each dataset using MACSv2's signal processing module. Signal tracks represent ChIP signal compared to input control signal.
- FoldEnrichment: The first type of track represents the fold enrichment of ChIP over input. This type of track is useful for detecting and analyzing regions with moderate to low enrichment.
- Does NOT correct for local mappability
- Does NOT differentiate between "missing data" at unmappable locations and true 0 signal.
- The signal is not smoothed. Reads are extended to predominant fragment length.
- It is recommended to smooth the signal before using it unless you are averaging over multiple sites (e.g. aggregation plots)
- Separate tracks are generated for individual replicates and pooled data.
Anshul Kundaje | Kellis
Human (hg19), male genome (with chrY) | Mappability | Unique Mappability track
(Read Lengths 20 to 54)
A position 'i' on a particular genomic strand 's' is considered uniquely mappable for a read-length 'k' if the k-mer starting at 'i' on strand 's' maps uniquely i.e. only to position 'i' on strand 's' (no mismatches allowed). There are other ways to define mappability e.g. allowing for mismatches but this is basically an "optimistic" idealized mappability mask not accounting for mismatches.

A whole genome index (except for the human female mask for which chrY was excluded from the index) is created and the Bowtie mapper was used to try to map each k-mer against both strands of the genome.

Each <organism>.<sex>.globalmap_k20tok54.tgz file contains binary files representing uniqueness maps for each chromosome for all read-lengths ranging from 20 to 54 (encoded in a single file for each chromosome)
(a) The files are in uint8 (unsigned 8 bit integers) binary formats (saves disk space)
(b) Each file is basically a vector of unsigned 8bit integers that is the length of the chromosome. The elements of the vector are >= 0 (taking values 0 or 20 to 54)
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) In order to obtain the uniqueness map for a particular read-length 'k', simply perform the following operation on each element of the vector (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the vector by k-1 positions,
i.e. if position 1 is UNIQUE on the + strand for read-length k=3, then position 3 is UNIQUE on the - strand

How to read the files in a programming language such as matlab/octave
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome e.g. chr1.uint8.unique
% Read the files as a contiguous binary vector of unsigned 8 bit integers

tmp_uMap = fopen('chr1.uint8.unique','r');
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap); % Close the file handle once the data is in memory

% You can similarly read the files in any other programming language as a vector of unsigned 8bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer
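For writing the map out as text, one compact option is to emit runs of uniquely mappable positions as BED-style intervals instead of one value per line. The Python sketch below is an illustration under assumptions: the function names are hypothetical, the first byte of a file is assumed to correspond to the first base of the chromosome, and coordinates are written 0-based, half-open as in BED.

```python
import io


def unique_runs(values, k):
    """Yield (start, end) runs, 0-based half-open, where 0 < value <= k."""
    start = None
    for i, v in enumerate(values):
        is_unique = 0 < v <= k
        if is_unique and start is None:
            start = i
        elif not is_unique and start is not None:
            yield (start, i)
            start = None
    if start is not None:
        yield (start, len(values))


def write_bed(values, k, chrom, handle):
    """Write one 'chrom<TAB>start<TAB>end' line per uniquely mappable run."""
    for start, end in unique_runs(values, k):
        handle.write(f"{chrom}\t{start}\t{end}\n")


# Synthetic six-position vector in place of a real chromosome file.
toy = bytes([0, 20, 20, 0, 54, 36])
out = io.StringIO()
write_bed(toy, 36, "chr1", out)
print(out.getvalue(), end="")
# chr1	1	3
# chr1	5	6
```

These intervals describe + strand start positions only; the - strand map still needs the shift described in (f).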
Anshul Kundaje | Kellis
Human (hg19), female genome (no chrY) | Mappability | Unique Mappability track
(Read Lengths 20 to 54)
A position 'i' on a particular genomic strand 's' is considered uniquely mappable for a read-length 'k' if the k-mer starting at 'i' on strand 's' maps uniquely i.e. only to position 'i' on strand 's' (no mismatches allowed). There are other ways to define mappability e.g. allowing for mismatches but this is basically an "optimistic" idealized mappability mask not accounting for mismatches.

A whole genome index (except for the human female mask for which chrY was excluded from the index) is created and the Bowtie mapper was used to try to map each k-mer against both strands of the genome.

Each <organism>.<sex>.globalmap_k20tok54.tgz file contains binary files representing uniqueness maps for each chromosome for all read-lengths ranging from 20 to 54 (encoded in a single file for each chromosome)
(a) The files are in uint8 (unsigned 8 bit integers) binary formats (saves disk space)
(b) Each file is basically a vector of unsigned 8bit integers that is the length of the chromosome. The elements of the vector are >= 0 (taking values 0 or 20 to 54)
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) In order to obtain the uniqueness map for a particular read-length 'k', simply perform the following operation on each element of the vector (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the vector by k-1 positions,
i.e. if position 1 is UNIQUE on the + strand for read-length k=3, then position 3 is UNIQUE on the - strand

How to read the files in a programming language such as matlab/octave
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome e.g. chr1.uint8.unique
% Read the files as a contiguous binary vector of unsigned 8 bit integers

tmp_uMap = fopen('chr1.uint8.unique','r');
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap); % Close the file handle once the data is in memory

% You can similarly read the files in any other programming language as a vector of unsigned 8bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer
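Because a stored value x means "unique for every read-length >= x", the fraction of uniquely mappable + strand positions for every read-length can be obtained from a single pass over the vector. The Python sketch below illustrates this; the function name and its defaults are hypothetical.

```python
from collections import Counter


def mappable_fraction_by_k(values, k_min=20, k_max=54):
    """Return {k: fraction of positions with 0 < value <= k}."""
    counts = Counter(values)  # iterating over bytes yields integers
    total = len(values)
    fractions = {}
    running = 0
    for k in range(k_min, k_max + 1):
        # Positions whose stored value equals k become unique from k onward.
        running += counts.get(k, 0)
        fractions[k] = running / total if total else 0.0
    return fractions


# Synthetic four-position vector.
toy = bytes([0, 20, 20, 36])
fractions = mappable_fraction_by_k(toy)
print(fractions[20], fractions[36])  # 0.5 0.75
```

The fraction can only stay level or rise as k grows, which is a quick sanity check when reading a real file.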
Anshul Kundaje | Kellis
Human (hg19) | TIP | TF target predictions | TIP | README.hsa file in the directory; Chao Cheng's TIP algorithm for predicting TF target genes was applied to the input-normalized ChIP-seq tracks; these are the output files of that method.
Chao Cheng
Human (hg19) | HOT | HOT Regions | BED | README:
Carlos Araya; Alan Boyle
Mike Snyder
Mouse (mm9)
Genome Sequence
Genome Sequence | FASTA | chromosome FASTA file (Random contigs are not used for mapping or computing unique mappability)
Mouse (mm9) | ChIP-seq | Raw data (Alignments) | BAM | Reads were mapped by individual labs
Philip Cayting; Alan Boyle; Yong Cheng
Mike Snyder
Mouse (mm9) | ChIP-seq | MetaData and Data Quality | EXCEL/TAB tables
Measures of enrichment, signal-to-noise ratios, library complexity and peak calling statistics
Philip Cayting; Alan Boyle; Yong Cheng
Mike Snyder
Mouse (mm9) | Blacklist | Blacklists | BED
Brief summary of how the blacklist was generated can be found at .
More detailed analysis is at
Anshul Kundaje | Kellis
Mouse (mm9) | ChIP-seq | IDR Peak Calls | NARROWPEAK
The SPP peak caller was used with the IDR framework to call peaks and threshold them by reproducibility.
An IDR threshold of 0.02 was used.
See for details
Philip Cayting; Alan Boyle; Yong Cheng
Mike Snyder
Mouse (mm9) | ChIP-seq | Blacklist filtered IDR peak Calls (Use these) | NARROWPEAK | Peak calls are filtered against blacklists.
Anshul Kundaje | Kellis
Mouse (mm9) | ChIP-seq | Relaxed peak calls (unthresholded) | NARROWPEAK
These are a large set of unthresholded peak calls (up to 300K peaks) from SPP. Useful for analyses of low-signal peaks.
Philip Cayting; Alan Boyle; Yong Cheng
Mike Snyder
Mouse (mm9) | Mappability | Unique Mappability track
(Read Lengths 20 to 54)
A position 'i' on a particular genomic strand 's' is considered uniquely mappable for a read-length 'k' if the k-mer starting at 'i' on strand 's' maps uniquely i.e. only to position 'i' on strand 's' (no mismatches allowed). There are other ways to define mappability e.g. allowing for mismatches but this is basically an "optimistic" idealized mappability mask not accounting for mismatches.

A whole genome index (except for the human female mask for which chrY was excluded from the index) is created and the Bowtie mapper was used to try to map each k-mer against both strands of the genome.

Each <organism>.<sex>.globalmap_k20tok54.tgz file contains binary files representing uniqueness maps for each chromosome for all read-lengths ranging from 20 to 54 (encoded in a single file for each chromosome)
(a) The files are in uint8 (unsigned 8 bit integers) binary formats (saves disk space)
(b) Each file is basically a vector of unsigned 8bit integers that is the length of the chromosome. The elements of the vector are >= 0 (taking values 0 or 20 to 54)
(c) A value of 'x' at a position means that position is PERFECTLY unique in the genome for all k-mers of length >= x starting at that position on the + strand
(d) A value of 0 at a position means that position is not unique for any of the k-mer lengths (k=20 to 54)
(e) In order to obtain the uniqueness map for a particular read-length 'k', simply perform the following operation on each element of the vector (vector > 0) & (vector <= k)
(f) To obtain the uniqueness map for the - strand, right-shift the vector by k-1 positions,
i.e. if position 1 is UNIQUE on the + strand for read-length k=3, then position 3 is UNIQUE on the - strand

How to read the files in a programming language such as matlab/octave
%First gunzip and untar the globalmap_k20tok54.tgz file
%You will see one file for each chromosome e.g. chr1.uint8.unique
% Read the files as a contiguous binary vector of unsigned 8 bit integers

tmp_uMap = fopen('chr1.uint8.unique','r');
uMapdata = fread(tmp_uMap,'*uint8');
fclose(tmp_uMap); % Close the file handle once the data is in memory

% You can similarly read the files in any other programming language as a vector of unsigned 8bit integers. Convert to doubles if you like (although this is a waste of memory) or write it out as a text file if you prefer
Anshul Kundaje | Kellis
Mouse (mm9) | ChIP-seq | Signal tracks (Input normalized) | BIGWIG | Not sure where these are or if they were generated