Sharded file formats
Norman Rzepka
02 September 2021 ~ #ome-ngff
1
Hi, I’m Norman
About scalable minds
Collaborators
webKnossos
Open-source platform for sharing and annotating tera/peta-scale 3D datasets
Voxelytics
Automated reconstruction toolbox for stitching, alignment, and segmentation (neurons, synapses, organelles)
2
Easily browse through tera- and petabytes of data
EM data
Full Female Adult Fly Brain by �Zheng, Lauritzen et al., Cell, 2018
3
Fly through a dataset
EM data
Mouse Cortex by �Motta et al., Science, 2018
4
Key enabler:
Chunked 3D data streaming and storage
Chunks
32 x 32 x 32 vx
= 32kB for uint8
5
Many 32kB files don’t play well with modern file systems
1TB = 33.554.432 files
6
Idea: Put many chunks in a larger file (shard)
Header
Shard file
32 x 32 x 32 chunks
= 1GB for uint8
7
Details
8
Parallel writing
Uncompressed data:
Compressed data:
Shard
Chunks
9