1 of 11

ModelDB

Open-source model management

Manasi Vartak
manasi@verta.ai | @DataCereal

2 of 11

Why Model Management?

Model development is iterative
Currently there is no way to reliably track models
Without tracking, we can’t support:

Reproducibility
Sharing
History & Searching
Learning from previous experiments

Solution → Model Management System

Central repository to track models throughout their lifecycle
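
To make the idea of a central repository concrete, here is a minimal sketch of the kind of record such a system might keep and how history and search could work over it. This is not ModelDB's API: `ModelRecord`, `ModelRegistry`, and every project name, hyperparameter, and metric value below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ModelRecord:
    """One tracked model: its project, settings, and scores (hypothetical schema)."""
    project: str
    name: str
    hyperparameters: Dict[str, Any] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


class ModelRegistry:
    """Hypothetical in-memory stand-in for a central model repository."""

    def __init__(self) -> None:
        self._records: List[ModelRecord] = []

    def log(self, record: ModelRecord) -> None:
        # Append-only, so the full history of experiments is preserved.
        self._records.append(record)

    def search(self, predicate: Callable[[ModelRecord], bool]) -> List[ModelRecord]:
        # Return every record for which the caller's predicate is true.
        return [r for r in self._records if predicate(r)]

    def best(self, project: str, metric: str) -> ModelRecord:
        # Highest-scoring model in a project for the given metric.
        candidates = self.search(
            lambda r: r.project == project and metric in r.metrics
        )
        if not candidates:
            raise LookupError(f"no models in {project!r} report {metric!r}")
        return max(candidates, key=lambda r: r.metrics[metric])


# Illustrative data only; these are not results from the talk.
registry = ModelRegistry()
registry.log(ModelRecord("toxic-comments", "logreg-v1", {"C": 1.0}, {"auc": 0.91}))
registry.log(ModelRecord("toxic-comments", "logreg-v2", {"C": 0.1}, {"auc": 0.93}))
registry.log(ModelRecord("speech", "cnn-v1", {"lr": 0.001}, {"accuracy": 0.88}))

best = registry.best("toxic-comments", "auc")
print(best.name, best.hyperparameters)
```

Even this toy version covers the needs listed above: every run is kept, so earlier experiments can be searched and compared instead of being lost in notebooks.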

@datacereal

Project	# models
Kaggle Toxic Comments Project	400+
Kaggle Speech Recognition	250+
Twitter Timelines Project	100+

3 of 11

What is ModelDB?

First open-source system for model management
Started as research project at MIT Database Group
Hosted on GitHub under the Apache License 2.0
Widely adopted: multiple Fortune 500 companies, tech companies, startups
Available as package in KubeFlow

4 of 11

How does it work?

[Architecture diagram. Components shown: Jupyter notebook, ModelDB Python library, ModelDB Backend, ModelDB WebApp, ArtifactStore]

5 of 11

Status of ModelDB

ModelDB V1 integrated into KubeFlow in December 2018
ModelDB V2 soon to be available

Rewritten from the ground up for extensibility
New API supports any Python-based ML framework
ArtifactStore for file storage
Ability to plug in multiple storage backends
Redesigned UI

Ongoing integration with KubeFlow, Katib
Started Verta.AI to continue work on ModelDB and beyond!
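
The pluggable storage backends listed for V2 can be pictured with a small interface sketch. This is an assumption about the general design pattern, not ModelDB's actual ArtifactStore code: the `ArtifactStore` protocol, both backend classes, and `save_model` are hypothetical names used only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class ArtifactStore(Protocol):
    """Hypothetical interface: any backend that can store and return bytes."""

    def put(self, data: bytes) -> str: ...

    def get(self, key: str) -> bytes: ...


def _content_key(data: bytes) -> str:
    # Content-addressed key: identical bytes always map to the same key.
    return hashlib.sha256(data).hexdigest()


class InMemoryStore:
    """Backend that keeps artifacts in a dictionary (useful for tests)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = _content_key(data)
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class LocalDirStore:
    """Backend that writes each artifact to a file under a root directory."""

    def __init__(self, root: Path) -> None:
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        key = _content_key(data)
        (self._root / key).write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def save_model(store: ArtifactStore, serialized_model: bytes) -> str:
    # The caller depends only on the interface, never on a concrete backend.
    return store.put(serialized_model)


payload = b"serialized model bytes"
memory_store = InMemoryStore()
disk_store = LocalDirStore(Path(tempfile.mkdtemp()))

key_mem = save_model(memory_store, payload)
key_disk = save_model(disk_store, payload)
print(key_mem == key_disk)
```

Because `save_model` only relies on `put` and `get`, a new backend can be added without touching the calling code, which is the property a pluggable storage layer is meant to provide.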

6 of 11

DEMO

7 of 11

Coming up next

ModelDB V2 OSS release in March

Integration with KubeFlow (KF) and Katib

ModelDB Managed Service
Deep integrations with ML frameworks (TF, PyTorch, sklearn)
Research projects @MIT on model reuse and hyperopt

8 of 11

Open Questions

How to drive community standards?

Multiple efforts: ModelDB, Google/TF Metadata, MLMD, Microsoft, MLflow, others?
How best to engage / implement?

How to educate practitioners?

Share use cases where model management, or the lack of it, had a significant impact
Co-author a blog post with us!

9 of 11

Contributions very welcome!

[L] Support multiple backend storage systems (e.g., MySQL, Postgres, Cassandra)
[L] Performance optimizations for high-throughput reads and writes
[M] Deep integrations with other KubeFlow modules
[S] Share feedback + write how-tos
[S] Co-author blog post

Come to the ModelDB contributor hackathon on March 23rd!

More info on modeldb.slack.com (new!)

10 of 11

Hiring ML, systems, & data vis engineers @

ModelDB

11 of 11

Thank you

Manasi Vartak
manasi@verta.ai | @DataCereal

Verta.AI | https://medium.com/vertaai