1 of 9

Sharded file formats

Norman Rzepka

02 September 2021 ~ #ome-ngff

2 of 9

Hi, I’m Norman

Norman Rzepka

Co-Founder of scalable minds

Twitter: @normanrz

About scalable minds

Collaborators

webKnossos

Open-source platform for sharing and annotating tera/peta-scale 3D datasets

Voxelytics

Automated reconstruction toolbox for stitching, alignment, and segmentation (neurons, synapses, organelles)

3 of 9

Easily browse through tera- and petabytes of data

EM data

Full Female Adult Fly Brain by Zheng, Lauritzen et al., Cell, 2018

4 of 9

Fly through a dataset

EM data

Mouse Cortex by Motta et al., Science, 2018

5 of 9

Key enabler:

Chunked 3D data streaming and storage

Chunks

32 x 32 x 32 vx

= 32 KiB (32,768 bytes) for uint8

6 of 9

Many 32 KiB files don’t play well with modern file systems

1 TiB of uint8 data = 33,554,432 files of 32 KiB each
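
The file count above can be checked with a few lines of arithmetic. This is only a sketch; it assumes binary units (1 TiB = 2^40 bytes), which is what makes the numbers come out exactly.

```python
# Back-of-the-envelope check of the chunk size and file count.
# Assumption: binary units (KiB, TiB) and one byte per voxel (uint8).
CHUNK_EDGE = 32        # voxels per chunk edge
BYTES_PER_VOXEL = 1    # uint8

chunk_bytes = CHUNK_EDGE ** 3 * BYTES_PER_VOXEL
dataset_bytes = 2 ** 40                      # 1 TiB
files_needed = dataset_bytes // chunk_bytes  # one file per chunk

print(chunk_bytes)     # 32768
print(files_needed)    # 33554432
```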

7 of 9

Idea: Put many chunks in a larger file (shard)

Header

Shard file

32 x 32 x 32 chunks

= 1 GiB for uint8
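
A rough sketch of what the shard arithmetic implies for uncompressed data: every chunk sits at a fixed byte offset behind the header. The header size (`HEADER_BYTES`) and the helper names are hypothetical; the slides do not specify the header layout.

```python
# Shard arithmetic for uncompressed uint8 data (binary units assumed).
CHUNK_EDGE = 32                      # voxels per chunk edge
SHARD_EDGE = 32                      # chunks per shard edge
CHUNK_BYTES = CHUNK_EDGE ** 3        # 1 byte per voxel
CHUNKS_PER_SHARD = SHARD_EDGE ** 3
SHARD_PAYLOAD_BYTES = CHUNKS_PER_SHARD * CHUNK_BYTES
HEADER_BYTES = 64                    # hypothetical value, not from the slides


def chunk_offset(chunk_index: int) -> int:
    """Byte offset of an uncompressed chunk inside the shard file."""
    if not 0 <= chunk_index < CHUNKS_PER_SHARD:
        raise IndexError(chunk_index)
    return HEADER_BYTES + chunk_index * CHUNK_BYTES


def shards_for(dataset_bytes: int) -> int:
    """Number of shard files needed for a dataset, rounding up."""
    return -(-dataset_bytes // SHARD_PAYLOAD_BYTES)


print(SHARD_PAYLOAD_BYTES == 2 ** 30)  # True: 1 GiB of chunk data per shard
print(shards_for(2 ** 40))             # 1024 shard files for 1 TiB
```

Under these assumptions, 1 TiB needs 1,024 shard files instead of 33,554,432 chunk files.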

8 of 9

Details

  • Every chunk can be read in one sequential read (spinning-disk friendly)
  • Each chunk can be compressed individually (with an index to locate it)
  • Chunks are stored in Z-Order to allow for efficient multi-chunk cutouts
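
The three points above can be sketched in a few lines. This is an illustration under assumptions, not the actual shard format: the bit order in `morton3` (x lowest), the in-memory index of `(offset, length)` pairs, and the use of zlib are all choices made here for the example.

```python
import zlib


def morton3(x, y, z, bits=5):
    """Interleave the bits of x, y, z into a Z-order index (x is the lowest bit)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def build_shard(chunks):
    """Compress chunks individually and store them in Z-order.

    Returns (index, payload), where index maps a Z-order code to the
    (offset, compressed length) of that chunk inside the payload.
    """
    index = {}
    payload = bytearray()
    for coord in sorted(chunks, key=lambda c: morton3(*c)):
        blob = zlib.compress(chunks[coord])
        index[morton3(*coord)] = (len(payload), len(blob))
        payload += blob
    return index, bytes(payload)


def read_chunk(index, payload, coord):
    """One index lookup, one contiguous slice, one decompression."""
    offset, length = index[morton3(*coord)]
    return zlib.decompress(payload[offset:offset + length])


# Tiny demo: a 2 x 2 x 2 grid of 64-byte chunks.
chunks = {
    (x, y, z): bytes([x + 2 * y + 4 * z]) * 64
    for x in range(2) for y in range(2) for z in range(2)
}
index, payload = build_shard(chunks)

# An aligned 2 x 2 x 2 block of chunks maps to one contiguous index range.
block_codes = sorted(
    morton3(x, y, z) for x in (2, 3) for y in (0, 1) for z in (0, 1)
)
print(block_codes)  # [8, 9, 10, 11, 12, 13, 14, 15]
```

Because aligned blocks occupy consecutive Z-order positions, a multi-chunk cutout can often be served with a few long reads rather than many scattered ones.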

9 of 9

Parallel writing

Uncompressed data:

  • Parallel writes to individual chunks

Compressed data:

  • Parallel writes to individual shards

Shard

Chunks
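
One way to read the distinction above: uncompressed chunks have fixed offsets, so several writers can fill the same shard without coordinating, while compressed chunks vary in size, so a single writer lays out each shard and the parallelism moves to the shard level. The sketch below illustrates that under assumptions; the function names, the tiny chunk size, and the use of threads and zlib are illustrative only.

```python
import os
import tempfile
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_BYTES = 64   # tiny chunks for the demo; the slides use 32 KiB chunks
N_CHUNKS = 8


def write_chunk_in_place(path, chunk_index, data):
    """Uncompressed: each chunk has a fixed offset, so writers never overlap."""
    with open(path, "r+b") as f:
        f.seek(chunk_index * CHUNK_BYTES)
        f.write(data)


def write_compressed_shard(path, chunks):
    """Compressed: sizes vary, so one writer lays out the whole shard."""
    index, offset = [], 0
    with open(path, "wb") as f:
        for data in chunks:
            blob = zlib.compress(data)
            f.write(blob)
            index.append((offset, len(blob)))
            offset += len(blob)
    return index


chunks = [bytes([i]) * CHUNK_BYTES for i in range(N_CHUNKS)]
workdir = tempfile.mkdtemp()

# Uncompressed shard: preallocate the file, then write chunks concurrently.
raw_path = os.path.join(workdir, "shard_raw.bin")
with open(raw_path, "wb") as f:
    f.write(b"\0" * (N_CHUNKS * CHUNK_BYTES))
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(lambda i: write_chunk_in_place(raw_path, i, chunks[i]),
                  range(N_CHUNKS)))

# Compressed shards: one writer per shard, several shards in parallel.
shard_paths = [os.path.join(workdir, f"shard_{s}.bin") for s in range(2)]
with ThreadPoolExecutor(max_workers=2) as pool:
    indexes = list(pool.map(lambda p: write_compressed_shard(p, chunks),
                            shard_paths))
```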
