1 of 11

ModelDB

Open-source model management

Manasi Vartak | manasi@verta.ai | @DataCereal

2 of 11

Why Model Management?

  • Model development is iterative
  • Currently no way to reliably track models
  • Can’t support
    • Reproducibility
    • Sharing
    • History & Searching
    • Learning from previous experiments
  • Solution → Model Management System
    • Central repository to track models throughout their lifecycle

Project                          # models
Kaggle Toxic Comments Project    400+
Kaggle Speech Recognition        250+
Twitter Timelines Project        100+
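
With hundreds of models per project, the "central repository" idea can be sketched as a small registry that records each model's hyperparameters and metrics and then supports history and search. This is a minimal in-memory illustration only: `ModelRecord`, `ModelRegistry`, and every value logged below are hypothetical and made up for the example, not ModelDB's schema, API, or results.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ModelRecord:
    """One tracked model version (hypothetical schema, not ModelDB's)."""
    project: str
    name: str
    hyperparameters: Dict[str, Any] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


class ModelRegistry:
    """In-memory stand-in for a central model repository (an assumption)."""

    def __init__(self) -> None:
        self._records: List[ModelRecord] = []

    def log(self, record: ModelRecord) -> None:
        """Append a model version to the shared history."""
        self._records.append(record)

    def history(self, project: str) -> List[ModelRecord]:
        """Return a project's models in the order they were logged."""
        return [r for r in self._records if r.project == project]

    def best(self, project: str, metric: str) -> ModelRecord:
        """Return the project's model with the highest value of `metric`."""
        candidates = [r for r in self.history(project) if metric in r.metrics]
        if not candidates:
            raise LookupError(f"no model in {project!r} reports {metric!r}")
        return max(candidates, key=lambda r: r.metrics[metric])


# Illustrative values only; they are not results from the projects above.
registry = ModelRegistry()
registry.log(ModelRecord("toxic-comments", "logreg-v1", {"C": 1.0}, {"auc": 0.91}))
registry.log(ModelRecord("toxic-comments", "logreg-v2", {"C": 0.1}, {"auc": 0.93}))
registry.log(ModelRecord("speech", "cnn-v1", {"lr": 1e-3}, {"accuracy": 0.88}))

top = registry.best("toxic-comments", "auc")
print(top.name, top.hyperparameters)
```

Keeping hyperparameters and metrics together in one record is what makes the "History & Searching" and "Learning from previous experiments" bullets answerable with a query rather than by digging through notebooks.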

3 of 11

What is ModelDB?

  • First open-source system for model management
  • Started as research project at MIT Database Group
  • Available on GitHub under the Apache License 2.0
  • Widely adopted: multiple Fortune 500 companies, tech companies, startups
  • Available as package in KubeFlow

4 of 11

How does it work?

[Architecture diagram. Components: Jupyter notebook, ModelDB Python library, ModelDB Backend, DB, ArtifactStore, ModelDB WebApp]
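
One plausible reading of the components named in the diagram is that a notebook calls a client library, which hands each run to a backend that keeps metadata in a database and files in an artifact store. The sketch below illustrates that split under stated assumptions: the `Backend` class, its `log_run` and `get_run` methods, the `runs` table, and the use of SQLite and a temporary directory are all hypothetical stand-ins, not ModelDB's actual interfaces or storage choices.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Tuple


class Backend:
    """Hypothetical backend: metadata goes to a DB, files to an artifact store."""

    def __init__(self, db: sqlite3.Connection, artifact_dir: Path) -> None:
        self.db = db
        self.artifact_dir = artifact_dir
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS runs "
            "(id INTEGER PRIMARY KEY, metadata TEXT, artifact_path TEXT)"
        )

    def log_run(self, metadata: dict, artifact: bytes) -> int:
        """Store one run and return its id."""
        # Insert the metadata row first so the run id can name the artifact file.
        cur = self.db.execute(
            "INSERT INTO runs (metadata) VALUES (?)", (json.dumps(metadata),)
        )
        run_id = cur.lastrowid
        path = self.artifact_dir / f"run-{run_id}.bin"
        path.write_bytes(artifact)
        self.db.execute(
            "UPDATE runs SET artifact_path = ? WHERE id = ?", (str(path), run_id)
        )
        self.db.commit()
        return run_id

    def get_run(self, run_id: int) -> Tuple[dict, bytes]:
        """Load a run's metadata from the DB and its file from the artifact store."""
        row = self.db.execute(
            "SELECT metadata, artifact_path FROM runs WHERE id = ?", (run_id,)
        ).fetchone()
        if row is None:
            raise KeyError(run_id)
        metadata, artifact_path = row
        return json.loads(metadata), Path(artifact_path).read_bytes()


# Roughly what a notebook cell calling a client library might boil down to.
backend = Backend(sqlite3.connect(":memory:"), Path(tempfile.mkdtemp()))
run_id = backend.log_run({"model": "logreg", "C": 0.1}, b"serialized-model-bytes")
metadata, blob = backend.get_run(run_id)
print(run_id, metadata, len(blob))
```

Separating small, queryable metadata from large opaque files is a common design choice, since a web UI can then list and filter runs without touching the stored artifacts.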

5 of 11

Status of ModelDB

  • ModelDB V1 integrated in KubeFlow Dec’18
  • ModelDB V2 soon to be available
    • Rewritten from the ground up for extensibility
    • New API supports any Python-based ML framework
    • ArtifactStore for file storage
    • Ability to plug in multiple storage backends
    • Redesigned UI
  • Ongoing integration with KubeFlow, Katib
  • Started Verta.AI to continue work on ModelDB and beyond!
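
The "plug in multiple storage backends" bullet can be illustrated with a small interface that calling code depends on, plus interchangeable implementations behind it. This is a sketch under assumptions: the class names, the `put`/`get` methods, and the two example backends are hypothetical and chosen for illustration, not taken from ModelDB V2.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict


class ArtifactStore(ABC):
    """Hypothetical storage interface; keys are assumed to be flat file names."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store `data` under `key`, overwriting any previous value."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the bytes stored under `key`."""


class InMemoryStore(ArtifactStore):
    """Keeps artifacts in a dict; handy for tests."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class LocalDirStore(ArtifactStore):
    """Keeps each artifact as one file inside a root directory."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def save_and_reload(store: ArtifactStore, name: str, payload: bytes) -> bytes:
    """Caller code sees only the interface, so backends are interchangeable."""
    store.put(name, payload)
    return store.get(name)


stores = [InMemoryStore(), LocalDirStore(Path(tempfile.mkdtemp()))]
results = [save_and_reload(s, "model.bin", b"weights") for s in stores]
print(results)
```

A further backend (for example, one targeting an object store) would only need to implement the same two methods; `save_and_reload` would not change.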

6 of 11

7 of 11

Coming up next

  • ModelDB V2 OSS release in March
    • Integration with KubeFlow (KF) and Katib
  • ModelDB Managed Service
  • Deep integrations with ML frameworks (TensorFlow, PyTorch, scikit-learn)
  • Research projects @MIT on model reuse and hyperopt

8 of 11

Open Questions

  1. How to drive community standards?
    • Multiple efforts: ModelDB, Google/TF Metadata, MLMD, Microsoft, MLflow, others?
    • How best to engage / implement?
  2. How to educate practitioners?
    • Share use cases where model management, or the lack of it, had a significant impact
    • Co-author a blog post with us!

9 of 11

Contributions very welcome!

  • [L] Support multiple backend storage systems (e.g., MySQL, Postgres, Cassandra)
  • [L] Performance optimizations for high-throughput reads/writes
  • [M] Deep integrations with other KubeFlow modules
  • [S] Share feedback + write how-tos
  • [S] Co-author blog post

Come to the ModelDB contributor hackathon on March 23rd!

More info on modeldb.slack.com (new!)

10 of 11

Hiring ML, systems, & data vis engineers @

ModelDB

11 of 11

Thank you