Directory Structure Schema:
https://docs.google.com/spreadsheets/d/1fDxpEvln9o7P0YrTZPBxpTY98UkDuWEhmaMUFA8LNHE/edit#gid=1843407110
Options: required, optional, EPIC, n/a
Stereo-Seq | Visium (no probes) | Visium (with probes) | GeoMx (NGS) | GeoMx (nCounter) | HiFi | hrsTP-seq (DBiTSeq) | Xenium | Resolve | CosMx | MERFISH | Path | File name | File format | Data format | Data Schema | is from an instrument? | Manufacturer | Model | Derived by | Process version number | Instrument or program source | is QA? | Example | File or directory description
required | required | required | required | required | required | required | required | required | required | required | TOP/ | Top level directory
required RNAseq | required RNAseq (with probes) | required RNAseq (with probes) | n/a | n/a | n/a | n/a | n/a | n/a | n/a | Sequencing Files To Be Included
required | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | Histology Files To Be Included
required | see histology | required | required | required | required | required | required | required | required | required | TOP/extras/ | Folder for general lab-specific files related to the dataset
required | required | required | required | required | required | required | required | required | required | TOP/extras/microscope_hardware.json | https://www.protocols.io/view/getting-started-with-micro-meta-app-tutorial-36wgq7ddxvk5/v7/materials | 4DN-BINA-OME | no | micro-meta app | yes | A file generated by the micro-meta app that contains a description of the hardware components of the microscope. Email HuBMAP Consortium Help Desk <help@hubmapconsortium.org> if help is required in generating this document.
optional | optional | optional | optional | optional | optional | optional | optional | optional | TOP/extras/microscope_settings.json | https://www.protocols.io/view/getting-started-with-micro-meta-app-tutorial-36wgq7ddxvk5/v7/materials | 4DN-BINA-OME | no | micro-meta app | yes | A file generated by the micro-meta app that contains a description of the settings that were used to acquire the image data. Email HuBMAP Consortium Help Desk <help@hubmapconsortium.org> if help is required in generating this document.
required | required | required | required | required | required | required | required | required | required | TOP/raw/ | All raw data files for the experiment.
required | required | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/raw/<slide ID>.gpr | GPR | no | 10X Genomics | GPR downloader | no | This is a 10X Genomics layout file that's generated by 10X and individualized for each Visium slide. This is a text file and can be generated using this 10X web form along with the unique 10X Visium slide ID.
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | n/a | TOP/raw/*_expMat_file.csv
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | n/a | TOP/raw/*_fov_positions_file.csv
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | n/a | TOP/raw/*_metadata_file.csv
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | n/a | TOP/raw/*_tx_file.csv
n/a | n/a | required | required | n/a | n/a | n/a | n/a | n/a | n/a | TOP/raw/*_LabWorksheet.txt | txt | yes | NanoString | An Excel spreadsheet to refer to in setting up the library. This file documents all of the samples from a single collection plate. Generated by DSP run, prior to sequencing.
n/a | n/a | required | n/a | n/a | n/a | n/a | n/a | required | n/a | TOP/raw/*_config.ini | txt | For CosMx this would include: AtoMx server version, module version, acquisition instrument firmware version | yes | NanoString | Needed to generate the DCC file from the fastq file. Contains pipeline processing parameters. Generated by DSP run, prior to sequencing.
n/a | n/a | required | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/raw/*_SeqCodeIndices.csv | csv | A file with sample information needed by the Illumina software. Use the contents of the SeqCodeIndices.csv file to create a SampleSheet.csv for input to the Illumina sequencer. (NextSeq 1000/2000 users download a SampleSheet.csv and whitelist.txt instead of SeqCodeIndices.csv.) Generated by DSP run.
n/a | n/a | optional | optional | n/a | optional | optional | required | n/a | TOP/raw/markers.csv | csv | name of fluorophore, vendor, channel | A CSV file describing any morphology markers used to guide ROI and/or AOI selection.
n/a | n/a | required | required | n/a | n/a | n/a | n/a | n/a | TOP/raw/*targets.pkc | pkc | The file listing each probe barcode sequence and the corresponding gene symbol or protein targeted by that probe. This should be consistent for the same probe panel.
n/a | see RNAseq (with probes) | see RNAseq (with probes) | optional | n/a | optional | optional | optional | optional | TOP/raw/additional_panels_used.csv | csv | manufacturer, model/name, product code | If multiple commercial probe panels were used, then the primary probe panel should be selected in the "oligo_probe_panel" metadata field. The additional panels must be included in this file. Each panel record should include: manufacturer, model/name, product code.
n/a | n/a | n/a | n/a | required | required | required | TOP/raw/gene_panel.csv | gene_id (Ensembl ID), gene_name | The list of target genes.
n/a | n/a | n/a | n/a | optional | optional | optional | optional | TOP/raw/custom_probe_set.csv | csv | gene_name, probe_seq, probe_id | This file should contain any custom probes used and must be included if the metadata field "is_custom_probes_used" is "Yes". The file should minimally include: target gene id, probe seq, probe id. The contents of this file are modeled after the 10x Genomics probe set file (see https://support.10xgenomics.com/spatial-gene-expression-ffpe/probe-sets/probe-set-file-descriptions/probe-set-file-descriptions#probe_set_csv_file).
n/a | n/a | n/a | n/a | optional | n/a | n/a | n/a | TOP/raw/custom_probe_set.bed | This is a BED file version of the custom probe set file.
n/a | n/a | n/a | n/a | n/a | required | required | required | n/a | TOP/raw/transcript_locations.csv | csv | gene name, x, y, z (optional), quality score (optional) | The coordinate origin (0,0) is at the top left corner of the image. The file should include: gene name, x, y, z (optional), quality score (optional). The first row in the file is expected to contain the column headers.
n/a | n/a | n/a | n/a | n/a | optional | required | optional | n/a | TOP/raw/custom_gene_list.csv | csv | gene name, Ensembl ID
n/a | n/a | n/a | n/a | n/a | optional | optional | optional | n/a | TOP/raw/probes.csv | csv | Gene Ensembl_ID Coverage Codewords Annotation | A CSV file describing the probe panel used. This is typically what's used to specify the probe set when ordering a probe panel for a Xenium run.
n/a | n/a | n/a | n/a | n/a | required | n/a | n/a | n/a | TOP/raw/gene_panel.json | json | This is the JSON file describing the probes, as output from the xenium-ranger pipeline.
n/a | n/a | n/a | required | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/raw/*.rcc | The reporter code count from the nCounter. This is the readout, akin to the fastq files from the NGS workflow. There will be one RCC file per AOI.
n/a | n/a | n/a | required | n/a | n/a | n/a | n/a | n/a | TOP/raw/*.cdf | Cartridge definition file for the nCounter.
optional | TOP/raw/micron_to_mosaic_pixel_transform.csv | Matrix used to transform from pixels to physical distance.
required | TOP/raw/manifest.json | This file contains stain by channel details and pixel details.
required | TOP/raw/codebook.csv | CSV containing codebook information for the experiment. Rows are barcodes and columns are imaging rounds. The first column is the barcode target; the following column IDs are expected to be sequential, and round identifiers are expected to be integers (not Roman numerals).
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | TOP/raw/positions.csv | File that includes the top left coordinate of each tiled image. This is required to stitch the images.
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | TOP/raw/dataorganization.csv | necessary image definitions
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | TOP/raw/*.DAX | The raw image stack.
see RNAseq | see RNAseq (with probes) | see RNAseq (with probes) | n/a | required | n/a | n/a | n/a | n/a | TOP/raw/fastq/ | Raw sequencing files for the experiment
n/a | n/a | n/a | n/a | n/a | n/a | TOP/raw/fastq/oligo/ | Directory containing fastq files pertaining to oligo sequencing.
n/a | n/a | n/a | n/a | n/a | n/a | TOP/raw/fastq/oligo/*.fastq.gz | fastq | https://en.wikipedia.org/wiki/FASTQ_format | https://en.wikipedia.org/wiki/FASTQ_format | yes | Illumina | no | This is a gzipped version of the fastq file. This file contains the cell barcode and unique molecular identifier (technical).
n/a | n/a | n/a | required | n/a | n/a | n/a | n/a | TOP/raw/fastq/RNA/ | Directory containing fastq files pertaining to RNAseq sequencing.
n/a | n/a | n/a | required | n/a | n/a | n/a | n/a | TOP/raw/fastq/RNA/*_R*.fastq.gz | fastq | https://en.wikipedia.org/wiki/FASTQ_format | https://en.wikipedia.org/wiki/FASTQ_format | no | no | This is a gzipped version of the forward and reverse fastq files from RNAseq sequencing (R1 and R2).
n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | n/a | TOP/raw/images/FOV*/
n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | n/a | TOP/raw/images/FOV*/*_complete_code_cell_target_call_coord.csv | Target coordinates and counts per cell.
n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | n/a | TOP/raw/images/FOV*/*.tiff
see histology | required | optional | optional | n/a | n/a | n/a | n/a | required | TOP/raw/images/ | Directory containing raw image files. This directory should include at least one raw file.
optional | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/raw/images/*_tissue.tif or *_tissue.tiff | tiff | https://en.wikipedia.org/wiki/TIFF | yes | no | Raw microscope file for the experiment. For 10X Visium CytAssist, this would be the high resolution image produced.
required | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/raw/images/*_fiducial.tif or *_fiducial.tiff | tiff | https://en.wikipedia.org/wiki/TIFF | This is the low resolution image from the 10X CytAssist instrument that includes the fiducial markings.
optional | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/raw/images/*.ndpi | NDPI | Bio-Formats | Bio-Formats | yes | Hamamatsu | no | Raw microscope file for the experiment
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | TOP/raw/images/*.tif | Raw microscope file for the experiment
n/a | n/a | optional | optional | n/a | optional | optional | optional | n/a | TOP/raw/images/overlay.jpeg | *.jpeg | yes | NanoString | State whether an overlay image was used to guide ROI selection. If an overlay is used, then the overlay details will be provided in the protocols.io protocol. If used, this needs to be uploaded. It is not included in the OME TIFF. This can be a JPEG or TIFF file.
n/a | n/a | optional | optional | n/a | optional | optional | optional | n/a | TOP/raw/images/overlay.tiff | *.tiff | yes | NanoString | State whether an overlay image was used to guide ROI selection. If an overlay is used, then the overlay details will be provided in the protocols.io protocol. If used, this needs to be uploaded. It is not included in the OME TIFF. This can be a JPEG or TIFF file.
n/a | n/a | n/a | n/a | n/a | required | n/a | n/a | n/a | n/a | TOP/raw/images/HuBMAP_ID.jpg | jpeg | Microscope image of tissue area processed by the assay.
n/a | n/a | n/a | n/a | n/a | required | n/a | n/a | n/a | n/a | TOP/raw/images/position.txt | text | The list of pixels that are on the tissue. [Is this meant to represent a segmentation mask?]
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | n/a | TOP/raw/images/preview_scan.png | png
n/a | n/a | n/a | n/a | required | n/a | n/a | n/a | n/a | TOP/raw/Spatial_barcodes/ | Files containing spatial barcodes and coordinates.
n/a | n/a | n/a | n/a | required | n/a | n/a | n/a | n/a | TOP/raw/Spatial_barcodes/spatialBarcode_sequence.fasta | fasta | no | Deduplicated spatial barcodes.
n/a | n/a | n/a | n/a | required | n/a | n/a | n/a | n/a | TOP/raw/Spatial_barcodes/spatialBarcode_dup.txt | tab-delimited txt | no | Duplicated spatial barcodes (different ID but same sequence). Each row contains a pair of barcodes with the same sequence.
n/a | n/a | n/a | n/a | required | n/a | n/a | n/a | n/a | TOP/raw/Spatial_barcodes/spatialBarcode_locations.txt | tab-delimited txt | no | Spatial coordinates of each spot.
required | required | required | required | required | required | required | required | required | required | TOP/lab_processed/ | Experiment files that were processed by the lab generating the data.
n/a | n/a | required | required | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/"Initial Dataset.xlsx" | Excel | yes | NanoString | yes | An Excel spreadsheet that is downloaded from the GeoMx DSP Data Analysis Suite containing QA/QC metrics based on raw, unprocessed target counts. This file contains one row per AOI/segment, and no analyses span AOIs. The AOIs included in this file can come from different GeoMx runs and hence span Globus uploads, so care must be taken to make sure the appropriate AOIs are included in the file.
n/a | n/a | optional | required | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/annotations.xlsx | Excel | AOI specific annotations. This might include cell type and anatomical information.
n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | required | TOP/lab_processed/detected_transcripts.csv | , barcode_id, global_x, global_y, global_z, x, y, fov, gene, transcript_id, cell_id | A file containing the locations of each RNA target.
n/a | n/a | required | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/dcc/ | DCC files generated from fastq by the NanoString GeoMx NGS Pipeline.
n/a | n/a | required | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/dcc/*.dcc | DCC | no | NanoString | GeoMx NGS Pipeline | DCC files containing target probe counts, generated from fastq by the NanoString GeoMx NGS Pipeline.
see histology | required | required | required | n/a | required | required | required | required | required | TOP/lab_processed/images/ | Processed image files
required | required | required | n/a | required | required | required | required | required | TOP/lab_processed/images/*.ome.tiff | OME TIFF | https://docs.openmicroscopy.org/ome-model/latest/ome-tiff/ | no | no | HBM892.MDXS.293.ome.tiff | OME-TIFF files (multichannel, multi-layered) produced by the microscopy experiment. If compressed, a lossless compression algorithm must be used. For Visium this stitched file should only include the single capture area relevant to the current dataset. For GeoMx there will be one OME TIFF file per slide, with each slide including multiple AOIs. See the following link for the set of fields that are required in the OME TIFF file XML header: https://docs.google.com/spreadsheets/d/1YnmdTAA0Z9MKN3OjR3Sca8pz-LNQll91wdQoRPSP6Q4/edit#gid=0
required | required | required | n/a | required | required | required | required | required | TOP/lab_processed/images/*ome-tiff.channels.csv | CSV | https://en.wikipedia.org/wiki/Comma-separated_values | OME TIFF Channels | no | no | This file provides essential documentation pertaining to each channel of the accompanying OME TIFF. The file should contain one row per OME TIFF channel. The required fields are detailed at https://docs.google.com/spreadsheets/d/1xEJSb0xn5C5fB3k62pj1CyHNybpt4-YtvUs5SUMS44o/edit#gid=0
n/a | n/a | EPIC | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/primary_analysis/ | Primary analysis results
n/a | n/a | EPIC | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/primary_analysis/"Q3 Normalized.xlsx" | Excel | yes | NanoString | DSP Data Analysis Suite | yes | Results from initial processing by the GeoMx DSP Data Analysis Suite. The collection of datasets was normalized using Q3 normalization after target genes below the limit of quantitation (LOQ) were removed.
n/a | n/a | n/a | n/a | EPIC | n/a | n/a | n/a | TOP/lab_processed/alignment/ | Experiment files that were processed by the lab generating the data.
n/a | n/a | n/a | n/a | EPIC | n/a | n/a | n/a | TOP/lab_processed/alignment/*.bam | bam | https://en.wikipedia.org/wiki/Binary_Alignment_Map | https://en.wikipedia.org/wiki/Binary_Alignment_Map | no | Aligned sequencing data (R2 reads) from HiFi-slide experiments against reference HG38
n/a | n/a | n/a | n/a | EPIC | n/a | n/a | n/a | TOP/lab_processed/alignment/SpotByGene.txt.gz | no | spot x gene expression matrix
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/ | Output from the 10X Genomics SpaceRanger pipeline
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/_cmdline | txt | no | 10X Genomics | SpaceRanger | SpaceRanger | no | The actual CLI invocation of the spaceranger command.
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/_invocation | txt | no | 10X Genomics | SpaceRanger | SpaceRanger | no | Martian runtime specification determined from CLI invocation
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/_sitecheck | txt | no | 10X Genomics | SpaceRanger | SpaceRanger | no | This contains details about the local environment within which SpaceRanger was run.
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/_versions | txt | no | 10X Genomics | SpaceRanger | SpaceRanger | no | This contains the version number of both SpaceRanger and Martian that were used
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/raw_feature_bc_matrix.h5 | h5 | Output Matrices | no | 10X Genomics | SpaceRanger | SpaceRanger | no | HDF5 file containing unfiltered cell x gene matrix
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/filtered_feature_bc_matrix.h5 | h5 | Output Matrices | no | 10X Genomics | SpaceRanger | SpaceRanger | no | HDF5 file containing called cell x gene matrix
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/metrics_summary.csv | csv | Gene Expression Metrics | no | 10X Genomics | SpaceRanger | SpaceRanger | yes | Brief reporting metrics
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/molecule_info.h5 | h5 | Molecule Info | no | 10X Genomics | SpaceRanger | SpaceRanger | no | HDF5 file containing (primarily) cell-barcode/umi-barcode/read triplet counts
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/possorted_genome_bam.bam | BAM | Output BAM | no | 10X Genomics | SpaceRanger | SpaceRanger | no | Aligned (and position-sorted) BAM file containing sequencing data mapped against the human genome.
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/possorted_genome_bam.bam.bai | BAM index | Output BAM | no | 10X Genomics | SpaceRanger | SpaceRanger | no | Index for the aligned (and position-sorted) BAM file containing sequencing data mapped against the human genome.
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/cloupe.cloupe | Loupe | Loupe Cell Browser | Loupe Cell Browser | no | 10X Genomics | SpaceRanger | SpaceRanger | no | Loupe Browser file for data visualization and analysis
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/web_summary.html | HTML | https://en.wikipedia.org/wiki/HTML | Output Summary | no | 10X Genomics | SpaceRanger | SpaceRanger | yes | Data summary with images, metrics, and plots that can be used for quality assessment as well as errors and warnings related to data quality
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/spatial/ | Output from the 10X Genomics SpaceRanger pipeline
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/spatial/aligned_fiducials.jpg | jpeg | Output Spatial | no | 10X Genomics | SpaceRanger | SpaceRanger | JPG image containing aligned fiducials for the capture area
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/spatial/detected_tissue_image.jpg | jpeg | Output Spatial | no | 10X Genomics | SpaceRanger | SpaceRanger | JPG image showing capture spots that are covered by tissue
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/spatial/scalefactors_json.json | json | Output Spatial | no | 10X Genomics | SpaceRanger | SpaceRanger | Spaceranger downscales images to 2000px on the smallest side. This file contains the spot size (diameter in pixels) at full and downscaled resolutions.
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/spatial/tissue_hires_image.png | png | Output Spatial | no | 10X Genomics | SpaceRanger | SpaceRanger | PNG image of aligned tissue at full/original resolution
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/spatial/tissue_lowres_image.png | png | Output Spatial | no | 10X Genomics | SpaceRanger | SpaceRanger | PNG image of aligned tissue at downscaled resolution
EPIC | EPIC | n/a | n/a | n/a | n/a | n/a | n/a | n/a | TOP/lab_processed/spaceranger/spatial/tissue_positions_list.csv | csv | Output Spatial | no | 10X Genomics | SpaceRanger | SpaceRanger | Spatial positions of cell barcodes
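The Path column uses shell-style wildcards (for example TOP/raw/*.DAX), so a dataset directory can be checked against the rows marked "required" for a given assay. The following is a minimal sketch only, not part of the schema or of any HuBMAP tooling: the function names are hypothetical, the patterns are written relative to TOP/, and the list of required patterns must be assembled by the reader from the rows that apply to their assay.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def list_relative_paths(top):
    """Return every file under `top` as a POSIX-style path relative to it."""
    top = Path(top)
    return [p.relative_to(top).as_posix() for p in top.rglob("*") if p.is_file()]


def missing_required(paths, required_patterns):
    """Return the required patterns that no path in `paths` matches.

    Matching is case-sensitive. Note that fnmatch's `*` also matches `/`,
    so `raw/images/*.tif` would accept files in nested subdirectories too.
    """
    return [
        pattern
        for pattern in required_patterns
        if not any(fnmatchcase(path, pattern) for path in paths)
    ]


if __name__ == "__main__":
    # Hypothetical pattern list: the paths come from the rows above, but
    # which rows are required depends on the assay column being checked.
    required = [
        "extras/microscope_hardware.json",
        "raw/images/*.tif",
        "lab_processed/images/*.ome.tiff",
        "lab_processed/images/*ome-tiff.channels.csv",
    ]
    for pattern in missing_required(list_relative_paths("."), required):
        print("missing:", pattern)
```

Keeping the required patterns as a plain list leaves the assay-to-status mapping in the spreadsheet, which remains the authoritative source.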