DataOps For The Modern Computer Vision Stack
James Le
Presenter Profile
James Le
Now
Before
Interests
Agenda
1. What Is DataOps?
2. Why DataOps For Computer Vision?
3. DataOps Key Principles
4. DataOps Pipeline for the Computer Vision Stack
5. Data Challenges for Computer Vision Teams
6. The Future of the Modern Computer Vision Stack
What Is DataOps?
DataOps vs DevOps
Source: DevOps vs DataOps (by Sprinkle Data)
DataOps vs MLOps
Source: DataOps - Adjusting DevOps for Analytics Product Development (by Altexsoft)
What Led To The Rise of DataOps?
Source: Modern Analytics Stack (by Datafold)
Source: Apache Spark DataFrames for Large Scale Data Science (by Databricks)
Source: What is DataOps? (by Atlan)
The DataOps Landscape
Source: What is DataOps? (by Gradient Flow)
Why DataOps For Computer Vision?
Why DataOps For Computer Vision? (1/3)
Data Is More Important Than Models
This sentiment is conveyed by Francois Chollet, the creator of Keras (Source: Twitter)
Why DataOps For Computer Vision? (2/3)
Unstructured Data Preparation Is Challenging
The RarePlanes dataset, which incorporates both real and synthetically generated satellite imagery (Source: Superb AI)
Why DataOps For Computer Vision? (3/3)
Building Computer Vision Applications Is Iterative
The Two Loops of Building Algorithmic Products (Source: Taivo Pungas)
DataOps Key Principles
Principle 1 - Implement Best Practices for Development
Follow Software Engineering Cycle Guidelines
Source: Engineering Best Practices for ML (by Alex Serban)
Source: Rules of ML (by Google)
Principle 2 - Automate and Orchestrate All Data Flows
Continuous Integration and Continuous Delivery
Source: Continuous Delivery for Machine Learning (by ThoughtWorks)
Principle 3 - Test Data Quality In All Stages of Data Lifecycle
Continuous Testing
Source: Why Data Quality Is Key to Successful MLOps (by Superconductive)
Principle 4 - Monitor Quality and Performance Metrics Across Data Flows
Improve Observability
Source: What is Data Observability? (by Monte Carlo)
Source: Beyond Monitoring: The Rise of Observability (by Arize AI)
Source: Anatomy of an Enterprise AI Observability Platform (by WhyLabs)
Principle 5 - Build a Common Data and Metadata Model
Focus on Data Semantics
Source: Automated Data Versioning (by Pachyderm)
Principle 6 - Empower Collaboration Among Data Stakeholders
Cross-Functional Teams
DataOps For Computer Vision Stack?
Proposed DataOps for the Modern Computer Vision Stack
Key Data Challenges For Computer Vision Teams
Challenge 1: Curate High-Quality Data Points
Pain Points
Solutions
Source: The Best Data Curation Tools for Computer Vision (by Siasearch)
Challenge 2: Label and Audit Data at Massive Scale
Pain Points
Solutions
Source: Automate Data Preparation for Computer Vision (by Superb AI)
Challenge 3: Account for Data Drift
Pain Points
Solutions
Source: Why Should You Care About Data and Concept Drift (by Evidently AI)
The Future of The Modern Computer Vision Stack
Following The Footsteps of The Modern Data Stack
The Modern Data Stack is a collection of cloud-native tools centered around a cloud data warehouse.
Benefits:
Source: The Modern Data Stack Ecosystem - Spring 2022 Edition (by Continual)
The Canonical Stack for ML
Source: The Rapid Evolution of the Canonical Stack for Machine Learning (by Daniel Jeffries)
Startup Opportunities in ML Infrastructure
Source: Startup Opportunities in ML Infrastructure (by Leigh-Marie Braswell)
Thank you!