1 of 20

Cloud-native open storage format for quantitative biological dynamics data

Koji Kyoda1, Kenneth H.L. Ho2, and Shuichi Onami1

1RIKEN BDR, 2Francis Crick Institute

2022/04/12

2 of 20

Bioimaging data ecosystem

Swedlow and Onami

Global BioImaging EoE V. 2020

3 of 20

Sharing of bioimaging data

SSBD:database

  • Added-value (Rich metadata)
  • Bioimage data
    • original image files shared via OMERO platform
  • Quantitative data
    • shared in BDML/BD5 format

SSBD:repository

  • Archive (Minimal metadata)
  • Bioimage data
    • original image files _________
  • Quantitative data
    • original data files

Tohsato et al. (2016) Bioinformatics

4 of 20

Open and unified data format

  • Accessing data and metadata
  • Sharing of data and their reuse
  • Development of tools and their use

Data providers

Tool developers

All tools can be used for data analysis.

All data can be used for tool evaluation.

An open unified format

5 of 20

BDML: Biological Dynamics Markup Language

  • XML-based format for representing spatiotemporal dynamics of biological objects ranging from molecules to cells to organisms.

Kyoda et al. (2015) Bioinformatics

<scaleUnit>
  <tScale>20</tScale>
  <tUnit>second</tUnit>
</scaleUnit>

<component>
  <componentID>100</componentID>
  <time>1</time>
  <measurement>
    <point><xyz><x>10.32</x><y>30.42</y><z>18.32</z></xyz></point>
  </measurement>
</component>

<component>
  <componentID>101</componentID>
  <time>2</time>
  <prevID>100</prevID>
  <measurement>
    <point><xyz><x>9.57</x><y>32.05</y><z>14.91</z></xyz></point>
  </measurement>
</component>
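The fragment above can be read with a few lines of Python. This is only a minimal sketch, not the official BDML reader: the fragment is wrapped in a hypothetical `<bdml_fragment>` root so that it parses on its own, the function name `parse_components` is made up for illustration, and the time index is assumed to be multiplied by `tScale` to obtain elapsed time.

```python
import xml.etree.ElementTree as ET

# The slide shows a fragment without a root element, so it is wrapped in a
# hypothetical <bdml_fragment> root here purely to make it parseable.
FRAGMENT = """
<bdml_fragment>
  <scaleUnit>
    <tScale>20</tScale>
    <tUnit>second</tUnit>
  </scaleUnit>
  <component>
    <componentID>100</componentID>
    <time>1</time>
    <measurement>
      <point><xyz><x>10.32</x><y>30.42</y><z>18.32</z></xyz></point>
    </measurement>
  </component>
  <component>
    <componentID>101</componentID>
    <time>2</time>
    <prevID>100</prevID>
    <measurement>
      <point><xyz><x>9.57</x><y>32.05</y><z>14.91</z></xyz></point>
    </measurement>
  </component>
</bdml_fragment>
"""


def parse_components(xml_text):
    """Return one record per <component>, with time converted via tScale."""
    root = ET.fromstring(xml_text)
    t_scale = float(root.findtext("scaleUnit/tScale"))
    t_unit = root.findtext("scaleUnit/tUnit")
    records = []
    for comp in root.iter("component"):
        xyz = comp.find("measurement/point/xyz")
        records.append({
            "id": comp.findtext("componentID"),
            # prevID is absent for the first object of a track, giving None.
            "prev_id": comp.findtext("prevID"),
            # Assumption: elapsed time = time index * tScale.
            "time": float(comp.findtext("time")) * t_scale,
            "t_unit": t_unit,
            "xyz": tuple(float(xyz.findtext(axis)) for axis in "xyz"),
        })
    return records


records = parse_components(FRAGMENT)
for record in records:
    print(record)
```

The `prev_id` field is what links component 101 back to component 100, which is how the tracking information is expressed in this fragment.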

point

line

face

circle

sphere

6 of 20

BD5

  • HDF5-based format for representing quantitative biological dynamics data
    • Fast access to the data and fast transfer of files containing large data

Kyoda et al. (2020) PLoS One

7 of 20

OME-NGFF

  • ome-zarr is a Zarr-based format for storing bioimaging data.
    • It stores chunked, compressed, N-dimensional arrays

(Moore et al., Nat. Methods, 2021)
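To make the idea of chunked N-dimensional storage concrete, the sketch below computes which chunk holds a given voxel and which chunks a sub-region touches. It is an illustration of the general chunking arithmetic only, not code from any Zarr library: the array axes, the chunk shape, and the helper names are hypothetical, and the "/" key separator is an assumption (Zarr version 2 defaults to ".").

```python
from itertools import product


def chunk_index(voxel, chunks):
    """Index of the chunk that contains a voxel, one entry per axis."""
    return tuple(i // c for i, c in zip(voxel, chunks))


def chunk_key(index, separator="/"):
    """Storage key of a chunk; the separator is an assumption (see lead-in)."""
    return separator.join(str(i) for i in index)


def chunks_for_region(start, stop, chunks):
    """All chunk indices touched by the half-open region [start, stop)."""
    ranges = [
        range(s // c, (e - 1) // c + 1)
        for s, e, c in zip(start, stop, chunks)
    ]
    return list(product(*ranges))


# Hypothetical (t, c, z, y, x) array stored with this chunk shape.
chunks = (1, 1, 16, 256, 256)

voxel = (3, 0, 20, 300, 700)
index = chunk_index(voxel, chunks)
print(index, chunk_key(index))

# Reading one time point, 32 z-planes, and a 256 x 512 window only needs
# the chunks listed here, not the whole array.
needed = chunks_for_region((3, 0, 0, 0, 0), (4, 1, 32, 256, 512), chunks)
print(len(needed), [chunk_key(i) for i in needed])
```

Because each chunk is a separate object, a client reading from object storage can fetch only the keys returned for the region it needs.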

Image set

metadata

array data

for multiscale

masks

(segmentation data)

8 of 20

Formats for bioimaging data

image data

quantitative data

OME-TIFF

ome-zarr

BDML

BD5

in the cloud

in a data center

?

9 of 20

Overview

  • We aim to develop BDZ, a cloud-native open storage format for quantitative biological dynamics data.

ome-zarr

BDZ

S3

Phenotype analysis

New analytical methods

Synchronous visualization

bioimaging data

object storage

10 of 20

Spatial Omics data in OME-NGFF

  • Spatial-Omics-Hackathon-2021 was held (2021/09/29).
    • Use of AnnData
    • Store position information in the central data (“X”) array
    • Store most properties, or “features”, as separate “obs” arrays

https://forum.image.sc/t/ome-ngff-spatial-omics-hackathon/57337

from squidpy tutorial

11 of 20

AnnData

  • A Python package for handling annotated data matrices
    • X: data matrix
    • obs: one-dimensional observations
    • var: one-dimensional variables
    • obsp: pairwise annotation of observations

.csv

.tsv

.loom

.h5ad

.zarr

[https://anndata.readthedocs.io]

12 of 20

Proposal: How to store quantitative data

  • A combination of ome-zarr (pixel-based ROI) and AnnData (dynamics)

images

low-level

detection

representative

position

tracking info.

OME-NGFF (with Labels)

AnnData-style

t, z, y, x

ID,

feature (volume, etc.)

centroid

Labels

t, z, y, x

ID, radius,

feature

X

obs

matrix

matrix

features

Bao et al. (2006)

13 of 20

AnnData-style representation for dynamics data

  • Store coordinate information of biological objects in the X array
  • Store feature information in a separate obs array
  • Store tracking information in a separate obsp array

X

    t     z     y     x
    1.0   2.4   2.1   3.2
    2.0   3.4   2.5   3.3
    3.0   3.2   2.6   3.1

obs

    ID     entity   signal
    1001   point    4.5
    1002   point    4.5
    1003   point    4.6

obsp (rows: from, columns: to)

           1001   1002   1003
    1001   0      1      0
    1002   0      0      1
    1003   0      0      0
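The three arrays on this slide can be mimicked with plain Python containers to show how tracking is read from the obsp matrix. This is a sketch under stated assumptions rather than the BDZ implementation: lists and dicts stand in for the AnnData X, obs, and obsp slots, the values are the ones shown on the slide, rows of the matrix are taken as "from" and columns as "to", and the helper names are hypothetical.

```python
# Plain-Python stand-ins for the AnnData slots (values taken from the slide).
var_names = ["t", "z", "y", "x"]
obs_ids = ["1001", "1002", "1003"]

# X: one row of coordinates per biological object.
X = [
    [1.0, 2.4, 2.1, 3.2],
    [2.0, 3.4, 2.5, 3.3],
    [3.0, 3.2, 2.6, 3.1],
]

# obs: per-object features, aligned with the rows of X.
obs = {
    "entity": ["point", "point", "point"],
    "signal": [4.5, 4.5, 4.6],
}

# obsp: pairwise tracking links; row = "from" object, column = "to" object.
obsp_tracking = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]


def successors(obs_id):
    """IDs that the given object is linked to in the next time point."""
    row = obsp_tracking[obs_ids.index(obs_id)]
    return [obs_ids[j] for j, linked in enumerate(row) if linked]


def follow_track(start_id):
    """Follow links until the track ends or branches (e.g. a division)."""
    track = [start_id]
    while True:
        nxt = successors(track[-1])
        if len(nxt) != 1 or nxt[0] in track:
            break
        track.append(nxt[0])
    return track


def trajectory(start_id):
    """Coordinates along a track, as one dict per time point."""
    return [
        dict(zip(var_names, X[obs_ids.index(i)]))
        for i in follow_track(start_id)
    ]


print(follow_track("1001"))
print(trajectory("1001"))
```

Keeping the links in a square object-by-object matrix means that a division can be expressed as a row with two non-zero entries, without changing the layout of X or obs.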

14 of 20

Pixel-based ROI data

  • Pixel-based ROI data stored as labels in ome-zarr
    • ROI data (ome-zarr) and dynamics data (AnnData) can be linked either by mapping each gray-scale level in the labels to the ID of the corresponding biological object, or by using the metadata for the labels.
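The first linking option, mapping gray-scale levels to object IDs, can be sketched as follows. Everything here is illustrative: the tiny label image, the `label_to_id` mapping, and the function names are hypothetical, and where such a mapping would actually be stored is not specified in these slides.

```python
from collections import Counter

# Toy 2D label image for a single (t, z) plane; 0 is background and every
# other gray-scale level marks the pixels of one biological object.
labels = [
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [3, 0, 2, 2],
]

# Hypothetical mapping from gray-scale level to the object ID used in obs.
label_to_id = {1: "1001", 2: "1002", 3: "1003"}

# Feature table keyed by object ID (values taken from the slide).
obs = {
    "1001": {"entity": "point", "signal": 4.5},
    "1002": {"entity": "point", "signal": 4.5},
    "1003": {"entity": "point", "signal": 4.6},
}


def pixel_counts(label_image):
    """Number of pixels per non-background gray-scale level."""
    return Counter(v for row in label_image for v in row if v != 0)


def roi_features(label_image, mapping, features):
    """Join a pixel-derived feature onto the per-object feature table."""
    joined = {}
    for level, n_pixels in pixel_counts(label_image).items():
        obj_id = mapping[level]
        joined[obj_id] = {**features[obj_id], "n_pixels": n_pixels}
    return joined


print(roi_features(labels, label_to_id, obs))
```

With such a mapping in place, a feature computed from the pixels (here a pixel count) and a feature stored with the dynamics data (here the signal) end up in the same record.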

image

labels

OME-NGFF structure

image data

labels data

AnnData

X

    t     z     y     x
    1.0   2.4   2.1   3.2
    2.0   3.4   2.5   3.3
    3.0   3.2   2.6   3.1

obs

    ID     entity   signal
    1001   point    4.5
    1002   point    4.5
    1003   point    4.6

.zattrs

15 of 20

Example of BDZ (line entity)

  • Nuclear division dynamics data of the early C. elegans embryo from Kyoda et al. (2013)

wt-N2-081015-01
|
|--- 0
|    |
|    |--t
|    |--c
|    |--z
|    |--y
|    |--x
|
|--- labels
|    |
|    |--0
|       |
|       |--t
|       |--...
|
|--- dyn
     |
     |-- X
     |
     |-- obs
     |
     |-- obsm

0: image data
labels: Pixel-based ROI data
dyn: Dynamics data

X: position data
obs: feature data
obsp: tracking data

16 of 20

Example of BDZ (sphere entity)

  • Nuclear division dynamics data of the C. elegans embryo from Bao et al. (2006)

0801505_L1
|
|--- dyn
     |
     |-- X
     |
     |-- obs
     |
     |-- obsm

dyn: Dynamics data

X: position data
obs: feature data
obsp: tracking data

17 of 20

Data visualization

  • with napari image viewer

18 of 20

Future plan

  • Metadata specification for BDZ
  • Performance evaluation
  • Converters from/to BDML/BD5/BDZ

  • Container for a list of vertices (for polygon mesh data)?

19 of 20

Summary

  • We have developed BDZ, a cloud-native storage format for quantitative biological dynamics data.
    • The data consists of pixel-based ROI data and dynamics data.
    • The data is stored in the style of AnnData.
    • The data can be stored within a layer of ome-zarr.

20 of 20

Acknowledgement

  • RIKEN Open Life Science Platform (Funding)

  • Norio KOBAYASHI (RIKEN R-IH)
  • Hideyuki Jitsumoto (RIKEN R-IH)
  • RIKEN Information Systems Division

  • Bioimaging Community
    • Josh Moore (University of Dundee)
    • Kevin Yamauchi (ETH Zürich)
    • Matthew Hartley (European Bioinformatics Institute)

  • SSBD team (RIKEN Onami Lab.)