1 of 9

The challenges of converting data to NWB

  • Huge diversity of proprietary data formats (~40) for acquisition and processing; conversion requires a detailed understanding of each
  • Data is very large and growing
  • NWB combines data from multiple sources and requires additional metadata
  • Tutorials exist in Python and MATLAB, but going from them to a full-fledged conversion script is time-intensive and error-prone

2 of 9

The good news: we can build scalable solutions

Many of the operations are similar across different groups and data formats:

  • Large data handling
  • Adding essential metadata
  • User interface

We can leverage packages that provide unified APIs for a variety of proprietary formats
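The unified-API idea can be sketched as a small abstract base class; every class and method name below is a hypothetical stand-in (packages such as SpikeInterface provide production implementations of this pattern):

```python
# Minimal sketch of the "unified API" idea: every proprietary format gets a
# small reader class exposing the same methods, so downstream conversion code
# never touches format-specific details. All names here are illustrative.
from abc import ABC, abstractmethod


class BaseRecording(ABC):
    """Common API shared by every format-specific reader."""

    @abstractmethod
    def get_num_channels(self) -> int: ...

    @abstractmethod
    def get_sampling_frequency(self) -> float: ...


class SpikeGLXRecording(BaseRecording):
    def __init__(self, path: str):
        self.path = path  # a real reader would parse the .meta sidecar file

    def get_num_channels(self) -> int:
        return 384

    def get_sampling_frequency(self) -> float:
        return 30000.0


def summarize(recording: BaseRecording) -> str:
    # Works for any format, because the API is shared.
    return f"{recording.get_num_channels()} channels @ {recording.get_sampling_frequency()} Hz"
```

Because all format-specific details are hidden behind the shared API, the conversion logic is written once against `BaseRecording` and reused for every format.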

3 of 9

Strategy: Modularize by data stream

[Diagram: three proprietary data streams (Virmen data, spikeGLX data, kilosort data) each feed a matching DataInterface (VirmenInterface, spikeGLXInterface, kilosortInterface); the TowersTaskConverter (an NWBConverter) combines their output with metadata.yaml to produce data.nwb]

metadata.yaml:

NWBFile:
  experimenter: John Doe
  identifier: ADDME
  institution: Baylor
  lab: Tolias Lab
  session_description: ADDME
  session_start_time: 2018-08-09 10:00:00
Subject:
  sex: F
  weight: 23g
  date_of_birth: 2018-12-28 00:00:00

4 of 9

Strategy: Modularize by data stream


DataInterfaces handle each data stream

  • Can be mixed and matched
  • Handle large data efficiently
  • Extract any available metadata
  • Tested with example data and CI

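A toy sketch of the DataInterface pattern, modeled loosely on nwb-conversion-tools (the class names and the plain dict standing in for a `pynwb.NWBFile` are simplifications, not the real API):

```python
# Toy sketch of the DataInterface pattern: each data stream gets an interface
# that knows where its files live, what metadata it can extract, and how to
# write its data into the in-progress NWB file.
class BaseDataInterface:
    def __init__(self, **source_data):
        self.source_data = source_data

    def get_metadata(self) -> dict:
        # Each interface extracts whatever metadata its format provides.
        return {}

    def run_conversion(self, nwbfile: dict, metadata: dict) -> None:
        raise NotImplementedError


class KiloSortInterface(BaseDataInterface):
    def get_metadata(self) -> dict:
        return {"Ecephys": {"sorter": "kilosort"}}

    def run_conversion(self, nwbfile: dict, metadata: dict) -> None:
        # A real interface would write spike times into a pynwb.NWBFile;
        # here a plain dict stands in for it.
        nwbfile["units"] = f"spike times from {self.source_data['folder_path']}"
```

Because every interface shares this shape, interfaces for different streams can be mixed and matched freely inside one conversion.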

5 of 9

Strategy: Modularize by data stream


Metadata text files hold additional info

  • Provide context needed for re-analysis
  • Necessary for NWB and DANDI compliance
  • There can be more than one metadata file

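One way to picture how the metadata files combine with automatically extracted metadata is a recursive dict merge; the `deep_update` helper below is my own illustration (the real tools have an equivalent), and an inline dict stands in for a parsed metadata.yaml so the sketch stays stdlib-only:

```python
# Sketch: user-edited metadata files layer on top of the metadata that the
# DataInterfaces extracted automatically, with the user's values winning.
def deep_update(base: dict, overrides: dict) -> dict:
    """Recursively merge overrides into base, later values winning."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged


# Metadata pulled from the raw files by the interfaces:
extracted = {"NWBFile": {"session_start_time": "2018-08-09 10:00:00"}}

# Extra context supplied by the lab in metadata.yaml:
from_yaml = {
    "NWBFile": {"lab": "Tolias Lab", "institution": "Baylor"},
    "Subject": {"sex": "F", "weight": "23g"},
}

merged = deep_update(extracted, from_yaml)
```

Supporting more than one metadata file is then just repeated application of the same merge, in a defined priority order.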

6 of 9

Strategy: Modularize by data stream


The NWBConverter orchestrates the entire conversion

  • One per experiment setup
  • Specifies which data comes from which interface
  • Connects to the API and web app

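The orchestration pattern can be sketched as follows; it is modeled loosely on nwb-conversion-tools' NWBConverter, but every name here is a simplified stand-in rather than the real API:

```python
# Toy sketch of the NWBConverter pattern: one converter per experiment setup
# declares, via a class attribute, which interface handles which data stream.
class ToyDataInterface:
    def __init__(self, **source_data):
        self.source_data = source_data

    def get_metadata(self) -> dict:
        return {}


class SpikeGLXInterface(ToyDataInterface):
    def get_metadata(self) -> dict:
        return {"Ecephys": {"acquisition": self.source_data["file_path"]}}


class KiloSortInterface(ToyDataInterface):
    pass


class NWBConverter:
    data_interface_classes: dict = {}  # stream name -> interface class

    def __init__(self, source_data: dict):
        # Instantiate one interface per declared data stream.
        self.data_interface_objects = {
            name: cls(**source_data[name])
            for name, cls in self.data_interface_classes.items()
        }

    def get_metadata(self) -> dict:
        # Pool the metadata that every interface managed to extract.
        metadata = {}
        for interface in self.data_interface_objects.values():
            metadata.update(interface.get_metadata())
        return metadata


class TowersTaskConverter(NWBConverter):
    data_interface_classes = {
        "SpikeGLX": SpikeGLXInterface,
        "KiloSort": KiloSortInterface,
    }


converter = TowersTaskConverter({
    "SpikeGLX": {"file_path": "run1.ap.bin"},
    "KiloSort": {"folder_path": "ks_out/"},
})
```

Adding a new stream to an experiment setup means adding one entry to `data_interface_classes`; the converter handles the rest uniformly.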

7 of 9

Web app

  • Easily populate metadata

Future:

  • Parallelize on remote workers
  • Interface with DANDI

8 of 9

Mixture of centralized and distributed

  • There is a central repo, nwb-conversion-tools, and each lab has its own repo
  • DataInterfaces either live centrally in nwb-conversion-tools (e.g. spikeGLXInterface) or are specialized and live in the lab-specific repo (e.g. VirmenDataInterface)
  • All NWBConverters live in the lab-specific repos
  • Lab-specific repos pin all dependencies for reproducibility, including NWB Conversion Tools itself
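Pinning might look like the following requirements.txt fragment in a lab-specific repo (the version numbers are illustrative, not actual releases):

```text
# requirements.txt in a lab-specific repo: exact versions frozen for
# reproducibility (numbers illustrative)
nwb-conversion-tools==0.9.3
pynwb==2.0.1
numpy==1.21.4
```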

9 of 9

Next step: Make NWB Conversion Tools work from a single YAML file specifying:

  • Different experiment types within a dataset
  • Metadata: Global, experiment-specific, and session-specific
  • Locations of files
    • local
    • S3, Google Drive, Globus, etc.
  • DataInterfaces to handle those files
  • Conversion options (compression, chunking, time sync, channel sub-selection, etc.)
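Such a specification might look like the sketch below; every key name is hypothetical, invented for illustration, since the schema itself is the future work being proposed:

```yaml
# Hypothetical single-file specification (all keys invented for illustration)
metadata:                          # global metadata
  NWBFile:
    lab: Tolias Lab
    institution: Baylor
experiments:
  towers_task:                     # experiment-specific metadata and streams
    metadata:
      NWBFile:
        session_description: Towers task session
    sessions:
      - source_data:
          SpikeGLX:
            file_path: s3://example-bucket/session1/run1.ap.bin   # remote
          KiloSort:
            folder_path: ./session1/kilosort_output/              # local
        metadata:                  # session-specific metadata
          Subject:
            subject_id: mouse01
    conversion_options:
      compression: gzip
      channel_selection: all
```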