1 of 9

The challenges of converting data to NWB

  • Huge diversity of proprietary data formats (~40) for acquisition and processing; conversion requires a detailed understanding of each
  • Data is very large and growing
  • NWB combines data from multiple sources and requires additional metadata
  • Tutorials exist in Python and MATLAB, but going from them to a full-fledged conversion script is time-intensive and error-prone

2 of 9

The good news: we can build scalable solutions

Many of the operations are similar across different groups and data formats:

  • Large data handling
  • Adding essential metadata
  • User interface

We can leverage packages that provide unified APIs for a variety of proprietary formats
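The unified-API idea can be sketched as a small abstract base class; every class and method name below is a hypothetical stand-in (packages such as SpikeInterface provide production implementations of this pattern):

```python
# Minimal sketch of the "unified API" idea: every proprietary format gets a
# small reader class exposing the same methods, so downstream conversion code
# never touches format-specific details. All names here are illustrative.
from abc import ABC, abstractmethod


class BaseRecording(ABC):
    """Common API shared by every format-specific reader."""

    @abstractmethod
    def get_num_channels(self) -> int: ...

    @abstractmethod
    def get_sampling_frequency(self) -> float: ...


class SpikeGLXRecording(BaseRecording):
    def __init__(self, path: str):
        self.path = path  # a real reader would parse the .meta sidecar file

    def get_num_channels(self) -> int:
        return 384

    def get_sampling_frequency(self) -> float:
        return 30000.0


def summarize(recording: BaseRecording) -> str:
    # Works for any format, because the API is shared.
    return f"{recording.get_num_channels()} channels @ {recording.get_sampling_frequency()} Hz"
```

Because all format-specific details are hidden behind the shared API, the conversion logic is written once against `BaseRecording` and reused for every format.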

3 of 9

Strategy: Modularize by data stream

[Diagram: three proprietary data streams (Virmen data, spikeGLX data, kilosort data) each feed a matching DataInterface (VirmenInterface, spikeGLXInterface, kilosortInterface); the TowersTaskConverter (an NWBConverter) combines their output with metadata.yaml to produce data.nwb]

metadata.yaml:

NWBFile:
  experimenter: John Doe
  identifier: ADDME
  institution: Baylor
  lab: Tolias Lab
  session_description: ADDME
  session_start_time: 2018-08-09 10:00:00
Subject:
  sex: F
  weight: 23g
  date_of_birth: 2018-12-28 00:00:00

4 of 9

Strategy: Modularize by data stream


DataInterfaces handle each data stream

  • Can be mixed and matched
  • Handle large data efficiently
  • Extract any available metadata
  • Tested with example data and CI

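A toy sketch of the DataInterface pattern, modeled loosely on nwb-conversion-tools (the class names and the plain dict standing in for a `pynwb.NWBFile` are simplifications, not the real API):

```python
# Toy sketch of the DataInterface pattern: each data stream gets an interface
# that knows where its files live, what metadata it can extract, and how to
# write its data into the in-progress NWB file.
class BaseDataInterface:
    def __init__(self, **source_data):
        self.source_data = source_data

    def get_metadata(self) -> dict:
        # Each interface extracts whatever metadata its format provides.
        return {}

    def run_conversion(self, nwbfile: dict, metadata: dict) -> None:
        raise NotImplementedError


class KiloSortInterface(BaseDataInterface):
    def get_metadata(self) -> dict:
        return {"Ecephys": {"sorter": "kilosort"}}

    def run_conversion(self, nwbfile: dict, metadata: dict) -> None:
        # A real interface would write spike times into a pynwb.NWBFile;
        # here a plain dict stands in for it.
        nwbfile["units"] = f"spike times from {self.source_data['folder_path']}"
```

Because every interface shares this shape, interfaces for different streams can be mixed and matched freely inside one conversion.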

5 of 9

Strategy: Modularize by data stream


Metadata text files hold additional info

  • Provide context needed for re-analysis
  • Necessary for NWB and DANDI compliance
  • There can be more than one metadata file

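One way to picture how the metadata files combine with automatically extracted metadata is a recursive dict merge; the `deep_update` helper below is my own illustration (the real tools have an equivalent), and an inline dict stands in for a parsed metadata.yaml so the sketch stays stdlib-only:

```python
# Sketch: user-edited metadata files layer on top of the metadata that the
# DataInterfaces extracted automatically, with the user's values winning.
def deep_update(base: dict, overrides: dict) -> dict:
    """Recursively merge overrides into base, later values winning."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged


# Metadata pulled from the raw files by the interfaces:
extracted = {"NWBFile": {"session_start_time": "2018-08-09 10:00:00"}}

# Extra context supplied by the lab in metadata.yaml:
from_yaml = {
    "NWBFile": {"lab": "Tolias Lab", "institution": "Baylor"},
    "Subject": {"sex": "F", "weight": "23g"},
}

merged = deep_update(extracted, from_yaml)
```

Supporting more than one metadata file is then just repeated application of the same merge, in a defined priority order.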

6 of 9

Strategy: Modularize by data stream


The NWBConverter orchestrates the entire conversion

  • One per experiment setup
  • Specifies which data comes from which interface
  • Connects to the API and web app

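The orchestration pattern can be sketched as follows; it is modeled loosely on nwb-conversion-tools' NWBConverter, but every name here is a simplified stand-in rather than the real API:

```python
# Toy sketch of the NWBConverter pattern: one converter per experiment setup
# declares, via a class attribute, which interface handles which data stream.
class ToyDataInterface:
    def __init__(self, **source_data):
        self.source_data = source_data

    def get_metadata(self) -> dict:
        return {}


class SpikeGLXInterface(ToyDataInterface):
    def get_metadata(self) -> dict:
        return {"Ecephys": {"acquisition": self.source_data["file_path"]}}


class KiloSortInterface(ToyDataInterface):
    pass


class NWBConverter:
    data_interface_classes: dict = {}  # stream name -> interface class

    def __init__(self, source_data: dict):
        # Instantiate one interface per declared data stream.
        self.data_interface_objects = {
            name: cls(**source_data[name])
            for name, cls in self.data_interface_classes.items()
        }

    def get_metadata(self) -> dict:
        # Pool the metadata that every interface managed to extract.
        metadata = {}
        for interface in self.data_interface_objects.values():
            metadata.update(interface.get_metadata())
        return metadata


class TowersTaskConverter(NWBConverter):
    data_interface_classes = {
        "SpikeGLX": SpikeGLXInterface,
        "KiloSort": KiloSortInterface,
    }


converter = TowersTaskConverter({
    "SpikeGLX": {"file_path": "run1.ap.bin"},
    "KiloSort": {"folder_path": "ks_out/"},
})
```

Adding a new stream to an experiment setup means adding one entry to `data_interface_classes`; the converter handles the rest uniformly.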

7 of 9

Web app

  • Easily populate metadata

Future:

  • Parallelize on remote workers
  • Interface with DANDI

8 of 9

Mixture of centralized and distributed

  • There is a central repo, nwb-conversion-tools, and each lab has its own repo
  • DataInterfaces either live centrally in nwb-conversion-tools (e.g. spikeGLXInterface) or are specialized and live in the lab-specific repo (e.g. VirmenDataInterface)
  • All NWBConverters live in the lab-specific repos
  • Lab-specific repos pin all dependencies for reproducibility, including NWB Conversion Tools itself
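Pinning might look like the following requirements.txt fragment in a lab-specific repo (the version numbers are illustrative, not actual releases):

```text
# requirements.txt in a lab-specific repo: exact versions frozen for
# reproducibility (numbers illustrative)
nwb-conversion-tools==0.9.3
pynwb==2.0.1
numpy==1.21.4
```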

9 of 9

Next step: Make NWB Conversion Tools work from a single YAML file specifying:

  • Different experiment types within a dataset
  • Metadata: Global, experiment-specific, and session-specific
  • Locations of files
    • local
    • S3, Google Drive, Globus, etc.
  • DataInterfaces to handle those files
  • Conversion options (compression, chunking, time sync, channel sub-selection, etc.)
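Such a specification might look like the sketch below; every key name is hypothetical, invented for illustration, since the schema itself is the future work being proposed:

```yaml
# Hypothetical single-file specification (all keys invented for illustration)
metadata:                          # global metadata
  NWBFile:
    lab: Tolias Lab
    institution: Baylor
experiments:
  towers_task:                     # experiment-specific metadata and streams
    metadata:
      NWBFile:
        session_description: Towers task session
    sessions:
      - source_data:
          SpikeGLX:
            file_path: s3://example-bucket/session1/run1.ap.bin   # remote
          KiloSort:
            folder_path: ./session1/kilosort_output/              # local
        metadata:                  # session-specific metadata
          Subject:
            subject_id: mouse01
    conversion_options:
      compression: gzip
      channel_selection: all
```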