List of tools for MLOps (v1, April 2020)

1	Name	Category	Subcategory	Series / Parent	Funding ($M)	Started	Website	OSS	Description	Note	Acquired

2	DataRobot	All-in-one		E	430.6	2012	https://www.datarobot.com/		DataRobot combines a trusted enterprise AI platform and a trusted AI-native strategic partnership for global enterprises that want to harness the power of AI and their existing teams to succeed in today's Intelligence Revolution.	"We lived and breathed data science,"		Forbes 50 AI companies 2019
3	Luigi	All-in-one	Workflow orchestration	Spotify		2012		OSS	Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with Hadoop support built in.
4	H2O	All-in-one	Framework	D	146.1	2012	https://www.h2o.ai/	OSS	H2O.ai is the creator of H2O, the leading open-source machine learning and artificial intelligence platform, trusted by data scientists across 14K enterprises.
5	HIVE	All-in-one	Labeling	B	20.2	2013	https://thehive.ai/		Hive is a full-stack deep learning company focused on solving visual intelligence problems. Let us help you join the AI Revolution. End-To-End Solutions. Full-Stack Approach.
6	Databricks	All-in-one	Data management	F	897	2013	https://databricks.com/		Unified Data Analytics Platform: one cloud platform for massive-scale data engineering and collaborative data science.
7	Iguazio	All-in-one		C	72	2014	https://www.iguazio.com/		The Iguazio Data Science Platform automates your machine learning pipeline, transforming AI projects into real-world business outcomes.
8	Airflow	All-in-one	Workflow orchestration	Airbnb		2015	https://airflow.apache.org/	OSS	Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.
9	Polyaxon	All-in-one	Serving			2016	https://polyaxon.com/	OSS	A platform for reproducing and managing the whole life cycle of machine learning and deep learning applications.
10	Dessa	All-in-one	Monitoring	Square	9	2016	https://www.dessa.com/		Create more with machine learning. Build, run & monitor 1000s of ML experiments with Foundations		ACQ
11	Petuum	All-in-one	Data management	B	108	2016	https://petuum.com/		Petuum accelerates and simplifies AI solutions so your enterprise can deploy it easily and maintain it effortlessly.
12	Supervisely	All-in-one	Computer vision			2017	https://supervise.ly/		The first available ecosystem to cover all aspects of training data development. Manage, annotate, validate and experiment with your data without coding.
13	Cadence	All-in-one	Workflow orchestration	Uber		2017		OSS	Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way.
14	Michelangelo	All-in-one	Workflow orchestration	Uber		2015			Michelangelo, Uber’s machine learning (ML) platform, supports the training and serving of thousands of models in production across the company. Designed to cover the end-to-end ML workflow, the system currently supports classical machine learning, time series forecasting, and deep learning models that span a myriad of use cases ranging from generating marketplace forecasts, responding to customer support tickets, to calculating accurate estimated times of arrival (ETAs) and powering our One-Click Chat feature using natural language processing (NLP) models on the driver app.
15	MLflow	All-in-one	Experiment tracking	Databricks		2018	https://mlflow.org/	OSS	An open-source platform for the machine learning lifecycle.
16	Aible	All-in-one				2018	https://www.aible.com/		Create AI that delivers impact, not accuracy, with cost-benefit tradeoffs & operational constraints, in a friendly, intuitive UI designed for real business.
17	dotData	All-in-one	Feature engineering		43	2018	https://dotdata.com/		When AutoML is enhanced with AI-powered feature engineering, the result is dotData. We focus on delivering data science automation for the enterprise. End-to-end data science automation platform accelerates, democratizes, and operationalizes the entire data science process.
18	Prefect	All-in-one	Workflow orchestration			2018	https://www.prefect.io/	OSS	The Global Leader in Dataflow Automation
19	Metaflow	All-in-one	Workflow orchestration	Netflix		2019	https://metaflow.org/	OSS	Metaflow makes it quick and easy to build and manage real-life data science projects. Metaflow is built for data scientists, not just for machines.
20	Flyte	All-in-one	Workflow orchestration	Lyft		2019	https://flyte.org/	OSS	Lyft’s Cloud Native Machine Learning and Data Processing Platform, Now Open Sourced
21	Noodle.ai	All-in-one	AI-as-a-service	B	51	2016			We're on a mission to create a world without waste. We push the limits of data science, helping plan, make, and move goods and resources for manufacturers and complex supply chains.	addresses each failure point in the data pipeline from edge device to on-prem and cloud		Forbes 50 AI companies 2019
22	Kedro	All-in-one	Workflow orchestration	McKinsey		2019		OSS	Kedro is an open-source development workflow tool that helps structure reproducible, scalable, deployable, robust and versioned data pipelines.
23	Valohai	All-in-one	Workflow orchestration	A		2016	https://valohai.com/		The MLOps platform for the whole team. Valohai takes you from POC to production while managing the whole model lifecycle.	Focus on deep learning. Tooling, technology, framework, and cloud-agnostic.
24	Tecton	All-in-one	Deployment	A	25	2019	https://tecton.ai/		The Data Platform for Machine Learning. Build a library of great features. Serve them in production. Do it at scale.	From the creators of Michelangelo
25	Datagrok	All-in-one	Data processing				https://datagrok.ai/		Datagrok: Swiss Army Knife for Data. A platform for turning data into actionable insights.	Can interactively visualize datasets with millions of rows completely in the browser
26	Figure Eight	Data pipeline	Labeling	Appen		2008	https://www.figure-eight.com/		Figure Eight combines the best of human and machine intelligence to provide high-quality annotated training data that powers the world's most innovative machine learning and business solutions		ACQ
27	Spark	Data pipeline	Data processing			2009	https://spark.apache.org/	OSS	Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.
28	Scrapinghub	Data pipeline	Data generation			2010	https://scrapinghub.com/	OSS	Turn websites into data with the world's leading web scraping services & tools from the creators of Scrapy. Data extraction trusted by industry leaders.	Web crawling	ELI5
29	Alteryx	Data pipeline	Data management	IPO	163	2011	https://www.alteryx.com/		We are a leader in the self-service data analytics movement with a platform that can discover, prep, and analyze all your data, then deploy and share analytics at scale for deeper insights faster than you ever thought possible.	Control Meets Freedom: Unlock the Data Vault and Unleash Your Data Gurus in a Secure Way.
30	Tamr	Data pipeline	Data management		69.2	2012	https://www.tamr.com/		Tamr's leading data management system and services work to create a data migration strategy that simplifies your data unification process.			Forbes 50 AI companies 2019
31	Aircloak	Data pipeline	Privacy		1.3	2012	https://aircloak.com/		Aircloak's unique approach ensures the existing primary database is not modified in any way. Aircloak handles all data types including unstructured text.	GDPR compliant
32	Prometheus	Data pipeline	Monitoring			2012	https://prometheus.io/	OSS	An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach.
33	iMerit	Data pipeline	Labeling	B	23.5	2012	https://imerit.net/		iMerit specializes in data labeling and annotation for purposes of training models for Machine Learning and Artificial Intelligence.
34	Presto	Data pipeline	Database/Query			2012	https://prestodb.io/	OSS	Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.
35	Amazon Redshift	Data pipeline	Data warehouse	Amazon		2012	https://aws.amazon.com/redshift/		Amazon Redshift is a fast, fully managed, and cost-effective data warehouse that gives you petabyte scale data warehousing and exabyte scale data lake analytics together in one service. Amazon Redshift is up to ten times faster than traditional on-premises data warehouses.
36	Apache Druid	Data pipeline	Database	Imply		2012	https://druid.apache.org/	OSS	Apache Druid is a high-performance, real-time analytics database.	Column-oriented database
37	Waterline Data	Data pipeline	Data management	Hitachi Vantara	37.5	2013	https://www.waterlinedata.com/		Waterline's enterprise data catalog enables data professionals to discover, govern, and rationalize an organization's data lake.		ACQ
38	Incorta	Data pipeline	Data processing	C	72.6	2013	https://incorta.com/		Incorta aggregates large complex business data in real time, eliminating the need to reshape it. No Data Warehouse. No Transformations. Real-Time Insight.
39	Igneous	Data pipeline	Data management	C	67.5	2013	https://www.igneous.io/		Igneous Unstructured Data Protection offers the scalability to handle hundreds of file systems, billions of files, and exabytes of enterprise data requiring backup	Unstructured data
40	Rubrik	Data pipeline	Data management	E	553	2013	https://www.rubrik.com/en		We provide a powerful, policy-driven platform to simplify recovery and unlock insights from data residing in the data center and cloud.
41	Quobyte	Data pipeline	Storage			2013	https://www.quobyte.com/		Quobyte is software defined storage that turns commodity servers into a reliable and highly automated data center file system.
42	Elastifile	Data pipeline	Storage	Google		2013	https://www.elastifile.com/		Elastifile's cloud-native file storage helps organizations adapt and accelerate their business in the cloud era. Powered by a scalable, enterprise-grade distributed file system with intelligent object tiering, Elastifile augments existing public cloud services with a scalable, POSIX-compliant NAS, facilitating frictionless cloud adoption. With Elastifile, organizations enjoy low-touch file storage services, or deploy and manage cloud-native file storage themselves, eliminating the need for manual storage management and IT forecasting. Elastifile's unique combination of features and flexibility empowers organizations to seamlessly integrate cloud resources, with no application refactoring… thereby modernizing their infrastructure and achieving IT agility and efficiency goals.
43	Datera	Data pipeline	Storage	C	63.9	2013	https://datera.io/		Get sub-200 µs latency & millions of IOPS with 100% software-defined data automation. Save up to 70% on data infrastructure total cost of ownership.
44	Cohesity	Data pipeline	Data management	D	410	2013	https://www.cohesity.com/		Eliminate mass data fragmentation with Cohesity's modern approach to data management, beginning with backup. Gain instant recovery.
45	AtScale	Data pipeline	Data management	C	95	2013	https://www.atscale.com/		Freedom of choice for the enterprise. Break free from the complexities and security risks associated with cloud migration and self-service analytics with Intelligent Data Virtualization—no matter where dat.
46	Apache ORC	Data pipeline	File format			2013	https://orc.apache.org/	OSS	The smallest, fastest columnar storage for Hadoop workloads.
47	Parquet	Data pipeline	File format	Twitter, Cloudera		2013		OSS	Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
48	Cazena	Data pipeline	Data management		38	2014	https://www.cazena.com/		First Data Lake with a SaaS Experience. Cazena empowers enterprises to collect, store and analyze any data in the cloud, without any DevOps resources or admin time. Cazena's Data Lake as a Service includes everything, and is delivered as secure SaaS, ready to load, store and analyze data with any method: SQL, Spark, R, Python, and many more.
49	Confluent	Data pipeline	Realtime data stream	D	205.9	2014	https://www.confluent.io/		Confluent is a fully managed Kafka service and enterprise stream processing platform. Real-time data streaming for AWS, GCP, Azure or serverless.	Founded by the original creators of Apache Kafka
50	Yellowbrick Data	Data pipeline	Data warehouse	C	173	2014	https://yellowbrick.com/		The ultimate solution for your data warehouse. Quick to deploy, easy to expand, and simple to manage. Yellowbrick Data can solve your data problems.
51	Naveego	Data pipeline	Data processing	Seed	0.5	2014	https://www.naveego.com/		A leading provider of cloud-first, distributed data accuracy solutions for seamless, end-to-end data cleansing, Naveego enables organizations to proactively manage, detect and eliminate data accuracy issues across all enterprise data sources in real time, regardless of structure or schema.
52	Gluent	Data pipeline	Virtualization	Seed	5.7	2014	https://gluent.com/		Data virtualization software eliminates data silos. Gluent's transparent data virtualization provides virtual access to all enterprise data, with zero code changes.
53	Vexata	Data pipeline	Storage	StorCentric	54	2014	https://www.vexata.com/		Vexata is an active data infrastructure company that accelerates database and analytic platforms via groundbreaking storage solutions.		ACQ
54	Storbyte	Data pipeline	Storage			2014	http://storbyte.com/		Storbyte designs and manufactures all-flash & hybrid flash enterprise storage arrays that offer performance, power management, availability, reliability, density, efficiency, flexibility, expandability, and affordability. Storbyte is providing innovative data storage solutions and has not lost sight of what is important to end users: a responsible, cost-correct price point.	NOT AI
55	Komprise	Data pipeline	Storage	C	42	2014	https://www.komprise.com/		In 15 minutes, our free data management software trial will show you how you can save 70% on data management costs, on-premises and in the cloud.
56	Excelero	Data pipeline	Storage	B	35	2014	https://www.excelero.com/		Local NVMe performance at data center scale through true convergence. Software-defined block storage for Cloud and Enterprise applications at any scale.
57	ClearSky Data	Data pipeline	Storage	B	59	2014	https://www.clearskydata.com/		ClearSky Data offers enterprise storage as a hybrid cloud service delivering on-demand primary storage, offsite backup, and DR as a single service.
58	Pachyderm	Data pipeline	Versioning	A	12.1	2014	https://www.pachyderm.com/	OSS	Data lineage with end-to-end pipelines on Kubernetes, engineered for the enterprise. And it's open source!
59	Kimono Labs	Data pipeline	Data generation	Palantir	5	2014	http://www.kimonolabs.com/		Kimono Labs is an online platform that allows its users to convert websites into APIs.	Web scraping
60	Git LFS	Data pipeline	Versioning	Atlassian, GitHub		2014	https://git-lfs.github.com/	OSS	Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise.	An open source Git extension for versioning large files - 7.9k stars
61	Alluxio	Data pipeline	Data management		16	2015	https://www.alluxio.io/	OSS	An open-source data orchestration layer that brings data close to compute for big data and AI/ML workloads in the cloud.
62	Dremio	Data pipeline	Data management		45	2015	https://www.dremio.com/		Get more value from your data, faster. Dremio makes your data engineers more productive, and your data consumers more self-sufficient.	Data lake	Founders of Apache Arrow and Apache Drill
63	Hammerspace	Data pipeline	Database/Query			2015	https://hammerspace.com/		Hammerspace allows data to move freely, like the air you breathe, across clouds and services. Make data accessible exactly where you need it, when you need it – on demand.	Data-as-a-Service
64	Octopai	Data pipeline	Data management	B	6.2	2015	https://www.octopai.com/		An automated, centralized, cross-platform metadata search engine that enables BI groups to quickly and precisely discover and govern shared metadata.
65	Kyvos Insights	Data pipeline	Database/Query			2015	https://www.kyvosinsights.com/		Kyvos accelerates BI on trillions of rows of data on cloud and on-premises platforms with a semantic layer powered by its next-generation OLAP technology.	It pre-calculates aggregates at multiple levels of dimensional hierarchies to improve query response times compared to SQL-on-Hadoop platforms
66	Gemini Data	Data pipeline	Data management			2015	https://www.geminidata.com/		Gemini Data provides Data Availability for AI/ML driven analysis and applications to enable unified enterprise knowledge and access.
67	DefinedCrowd	Data pipeline	Data generation	A	13.1	2015	https://www.definedcrowd.com/		Leverage machine learning technology and human intelligence to source, structure, and enrich high quality training data in speech, NLP, and computer vision.			Forbes 50 AI companies 2019
68	Ascend.io	Data pipeline	Data management	A	19	2015	https://www.ascend.io/		Experience continuously optimized data pipelines with less code and fewer breakages. Enter the new era of data engineering with Ascend's autonomous dataflow service.
69	Dask	Data pipeline	Data processing			2015	https://dask.org/	OSS	Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love
70	Quilt	Data pipeline	Versioning	Seed	4.2	2015	https://quiltdata.com/	OSS	Quilt is a versioned data portal for AWS
71	Imply	Data pipeline	Data management	B	45.3	2015	https://imply.io/		Imply delivers real-time analytics powered by Apache Druid. ... Stream or batch load data into Druid for high performance, ad-hoc analytic queries.
72	Vaex	Data pipeline	Data processing			2015	https://vaex.io/	OSS	Power up your business with our data-driven solutions. With our unique, state-of-the-art technology, we provide fast and scalable solutions that will make you more agile, while limiting unnecessary resources.	Fast pandas
73	erwin	Data pipeline	Data management	Parallax Capital Partners		2016	https://erwin.com/		Integrated enterprise architecture, business process and data modeling with data cataloging and data literacy for risk management and digital transformation.	Data governance	ACQ
74	Aparavi	Data pipeline	Data management			2016	https://www.aparavi.com/		Aparavi's highly scalable data intelligence and automation solutions enable organizations to easily discover, classify, protect, and optimize their data.	backup solution
75	Scale AI	Data pipeline	Data generation	C	122.6	2016	https://scale.com		Trusted by world class companies, Scale delivers high quality training data for AI applications such as self-driving cars, mapping, AR/VR, robotics, and more.
76	LabelImg	Data pipeline	Labeling	Amazon		2016		OSS	LabelImg is a graphical image annotation tool for labeling object bounding boxes in images.	Independent tool
77	Segments.ai	Data pipeline	Labeling			2020	https://segments.ai/		Deep learning-fueled labeling technology with a focus on instance and semantic segmentation.
78	Playment	Data pipeline	Labeling	Seed	2.5	2015	https://playment.io/		Build high-quality ground truth datasets with ML-assisted tools, sophisticated project management software, expert human workforce, and much more.
79	Snorkel	Data pipeline	Labeling			2016		OSS	Programmatically Building and Managing Training Data
80	Qri	Data pipeline	Versioning			2016	https://qri.io/	OSS	Bigger than a spreadsheet, smaller than a database, datasets are all around us. Use Qri to browse, download, create, fork, & publish datasets across a network of peers.
81	Apache Hudi	Data pipeline	Data warehouse	Uber		2016	https://hudi.apache.org/	OSS	Apache Hudi ingests and manages storage of large analytical datasets over DFS (HDFS or cloud stores).	Data lake
82	Starburst Data	Data pipeline	Database/Query		22	2017	https://www.starburstdata.com/		Limitless Queries. Break boundaries and harness the power of the world's fastest SQL query engine.
83	Fluree	Data pipeline	Database	Seed	4.7	2017	https://flur.ee/		Welcome to better data management. The Fluree platform organizes blockchain-secured data in a highly-scalable, highly-insightful graph database.
84	DVC	Data pipeline	Versioning			2017	https://dvc.org/	OSS	Open-source version control system for Data Science and Machine Learning projects. Git-like experience to organize your data, models, and experiments.
85	Pilosa	Data pipeline	Database/Query	Seed	3.7	2017	https://www.pilosa.com/	OSS	Pilosa is an open source, distributed bitmap index that dramatically accelerates continuous analysis across multiple, massive data sets.	Molecula
86	Prodigy	Data pipeline	Labeling	Explosion		2017	https://prodi.gy/		Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. ... With Prodigy you can take full advantage of modern machine learning by adopting a more agile approach to data collection.
87	Datatable	Data pipeline	Data processing	H2O		2017		OSS	Python library for efficient multi-threaded data processing, with support for out-of-memory datasets.
88	HYCU	Data pipeline	Data management	-	-	2018	https://www.hycu.com/		Keep hyper-converged infrastructure running with HYCU's powerful, simple backup & recovery and monitoring solutions. Deploy in seconds for superior results.
89	Dolt	Data pipeline	Versioning	Seed	2	2018	https://www.liquidata.co/		Liquidata's mission is to make data move more efficiently. We built Dolt, an open-source version-controlled SQL database with Git-like semantics.	SQL database: We have a SQL database with Git versioning semantics called Dolt. As far as we know it's the only database with branch and merge functionality.
90	Dataturks	Data pipeline	Labeling	Walmart		2018	https://dataturks.com/		ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours.		ACQ
91	Voxel51 // Scoop	Data pipeline	Labeling	Seed	3.3	2018	https://voxel51.com/scoop/		Quickly Build Insights into Your Video Datasets. Scoop enables you to make sense of your video datasets quickly and effectively. Scoop's faceted search is one-of-a-kind in the industry to let you quickly distill large amounts of video into the answers you need.
92	Label Studio	Data pipeline	Labeling	Seed	0.15	2018		OSS	Label Studio is a multi-type data labeling and annotation tool with standardized output format
93	Doccano	Data pipeline	Labeling			2018		OSS	Text annotation for humans. Just create a project, upload data and start annotating. You can build a dataset in hours.
94	Labelbox	Data pipeline	Labeling	A	13.9	2018	https://labelbox.com/		A complete solution for your training data problem with fast labeling tools, human workforce, data management, a powerful API and automation features.
95	cuDF	Data pipeline	Data processing	NVIDIA		2018	https://rapids.ai/		Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.
96	Modin	Data pipeline	Data processing			2018	https://github.com/modin-project/modin	OSS	Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical.
97	Feast	Data pipeline	Feature engineering			2019	https://feast.dev/	OSS	Feast (Feature Store) is a tool for managing and serving machine learning features. Feast is the bridge between models and data.	Feature store
98	Tumult Labs	Data pipeline	Privacy			2019	https://www.tmlt.io/		Unleashing the power of data with ironclad privacy protection	Differential privacy
99	AresDB	Data pipeline	Database/Query	Uber		2019	https://github.com/uber/aresdb	OSS	A GPU-powered real-time analytics storage and query engine.
100	SQLFlow	Data pipeline	Database/Query			2019	https://sql-machine-learning.github.io/		Extends SQL to support AI. Extract knowledge from data. Currently supports MySQL, Apache Hive, Alibaba MaxCompute, XGBoost and TensorFlow.