A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Name | Cat | SubCat | Series | Funding ($M) | Started | Website | OSS | Description | Note | IF ACQ | |||||||||||||||
2 | DataRobot | All-in-one | E | 430.6 | 2012 | https://www.datarobot.com/ | DataRobot combines a trusted enterprise AI platform and a trusted AI-native strategic partnership for global enterprises that want to harness the power of AI and their existing teams to succeed in today's Intelligence Revolution. | "We lived and breathed data science," | Forbes 50 AI companies 2019 | |||||||||||||||||
3 | Luigi | All-in-one | Workflow orchestration | Spotify | 2012 | OSS | Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in. | |||||||||||||||||||
4 | H2O | All-in-one | Framework | D | 146.1 | 2012 | https://www.h2o.ai/ | OSS | H2O.ai is the creator of H2O, the leading open source machine learning and artificial intelligence platform trusted by data scientists across 14K enterprises. | |||||||||||||||||
5 | HIVE | All-in-one | Labeling | B | 20.2 | 2013 | https://thehive.ai/ | Hive is a full-stack deep learning company focused on solving visual intelligence problems. Let us help you join the AI Revolution. End-To-End Solutions. Full-Stack Approach. | ||||||||||||||||||
6 | Databricks | All-in-one | Data management | F | 897 | 2013 | https://databricks.com/ | Unified Data Analytics Platform - One cloud platform for massive scale data engineering and collaborative data science. | ||||||||||||||||||
7 | Iguazio | All-in-one | C | 72 | 2014 | https://www.iguazio.com/ | The Iguazio Data Science Platform automates your machine learning pipeline, transforming AI projects into real-world business outcomes. | |||||||||||||||||||
8 | Airflow | All-in-one | Workflow orchestration | Airbnb | 2015 | https://airflow.apache.org/ | OSS | Airflow is a platform created by the community to programmatically author, schedule and monitor workflows. | ||||||||||||||||||
9 | Polyaxon | All-in-one | Serving | 2016 | https://polyaxon.com/ | OSS | A platform for reproducing and managing the whole life cycle of machine learning and deep learning applications. | |||||||||||||||||||
10 | Dessa | All-in-one | Monitoring | Square | 9 | 2016 | https://www.dessa.com/ | Create more with machine learning. Build, run & monitor 1000s of ML experiments with Foundations | ACQ | |||||||||||||||||
11 | Petuum | All-in-one | Data management | B | 108 | 2016 | https://petuum.com/ | Petuum accelerates and simplifies AI solutions so your enterprise can deploy it easily and maintain it effortlessly. | ||||||||||||||||||
12 | Supervisely | All-in-one | Computer vision | 2017 | https://supervise.ly/ | First available ecosystem to cover all aspects of training data development. Manage, annotate, validate and experiment with your data without coding. | ||||||||||||||||||||
13 | Cadence | All-in-one | Workflow orchestration | Uber | 2017 | OSS | Cadence is a distributed, scalable, durable, and highly available orchestration engine to execute asynchronous long-running business logic in a scalable and resilient way. | |||||||||||||||||||
14 | Michelangelo | All-in-one | Workflow orchestration | Uber | 2015 | Michelangelo, Uber’s machine learning (ML) platform, supports the training and serving of thousands of models in production across the company. Designed to cover the end-to-end ML workflow, the system currently supports classical machine learning, time series forecasting, and deep learning models that span a myriad of use cases ranging from generating marketplace forecasts, responding to customer support tickets, to calculating accurate estimated times of arrival (ETAs) and powering our One-Click Chat feature using natural language processing (NLP) models on the driver app. | ||||||||||||||||||||
15 | MLflow | All-in-one | Experiment tracking | Databricks | 2018 | https://mlflow.org/ | OSS | An open source platform for the machine learning lifecycle | ||||||||||||||||||
16 | Aible | All-in-one | 2018 | https://www.aible.com/ | Create AI that delivers impact, not accuracy, with cost-benefit tradeoffs & operational constraints, in a friendly, intuitive UI designed for real business. | |||||||||||||||||||||
17 | dotData | All-in-one | Feature engineering | 43 | 2018 | https://dotdata.com/ | When AutoML is enhanced with AI-powered feature engineering, the result is dotData. We focus on delivering data science automation for the enterprise. End-to-end data science automation platform accelerates, democratizes, and operationalizes the entire data science process. | |||||||||||||||||||
18 | Prefect | All-in-one | Workflow orchestration | 2018 | https://www.prefect.io/ | OSS | The Global Leader in Dataflow Automation | |||||||||||||||||||
19 | Metaflow | All-in-one | Workflow orchestration | Netflix | 2019 | https://metaflow.org/ | OSS | Metaflow makes it quick and easy to build and manage real-life data science projects. Metaflow is built for data scientists, not just for machines. | metaflow.org | |||||||||||||||||
20 | Flyte | All-in-one | Workflow orchestration | Lyft | 2019 | https://flyte.org/ | OSS | Lyft’s Cloud Native Machine Learning and Data Processing Platform, Now Open Sourced | ||||||||||||||||||
21 | Noodle.ai | All-in-one | AI-as-a-service | B | 51 | 2016 | We're on a mission to create a world without waste. We push the limits of data science, helping plan, make, and move goods and resources for manufacturers and complex supply chains. | addresses each failure point in the data pipeline from edge device to on-prem and cloud | Forbes 50 AI companies 2019 | |||||||||||||||||
22 | kedro | All-in-one | Workflow orchestration | McKinsey | 2019 | OSS | Kedro is an open source development workflow tool that helps structure reproducible, scalable, deployable, robust and versioned data pipelines. | |||||||||||||||||||
23 | Valohai | All-in-one | Workflow orchestration | A | 2016 | https://valohai.com/ | The MLOps platform for the whole team. Valohai takes you from POC to production while managing the whole model lifecycle. | Focus on deep learning. Tooling, technology, framework, and cloud-agnostic. | ||||||||||||||||||
24 | Tecton | All-in-one | Deployment | A | 25 | 2019 | https://tecton.ai/ | The Data Platform for Machine Learning. Build a library of great features. Serve them in production. Do it at scale. | From the creators of Michelangelo | |||||||||||||||||
25 | Datagrok | All-in-one | Data processing | https://datagrok.ai/ | Datagrok: Swiss Army Knife for Data. A platform for turning data into actionable insights | can interactively visualize datasets with millions of rows completely in the browser | ||||||||||||||||||||
26 | Figure Eight | Data pipeline | Labeling | Appen | 2008 | https://www.figure-eight.com/ | Figure Eight combines the best of human and machine intelligence to provide high-quality annotated training data that powers the world's most innovative machine learning and business solutions | ACQ | ||||||||||||||||||
27 | Spark | Data pipeline | Data processing | 2009 | https://spark.apache.org/ | OSS | Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. | |||||||||||||||||||
28 | Scrapinghub | Data pipeline | Data generation | 2010 | https://scrapinghub.com/ | OSS | Turn websites into data with the world's leading web scraping services & tools from the creators of Scrapy. Data extraction trusted by industry leaders. | Web crawling | ELI5 | |||||||||||||||||
29 | Alteryx | Data pipeline | Data management | IPO | 163 | 2011 | https://www.alteryx.com/ | We are a leader in the self-service data analytics movement with a platform that can discover, prep, and analyze all your data, then deploy and share analytics at scale for deeper insights faster than you ever thought possible. | Control Meets Freedom: Unlock the Data Vault and Unleash Your Data Gurus in a Secure Way. | |||||||||||||||||
30 | Tamr | Data pipeline | Data management | 69.2 | 2012 | https://www.tamr.com/ | Tamr's leading data management system and services work to create a data migration strategy that simplifies your data unification process. Talk with us today. | Forbes 50 AI companies 2019 | ||||||||||||||||||
31 | Aircloak | Data pipeline | Privacy | 1.3 | 2012 | https://aircloak.com/ | Aircloak's unique approach ensures the existing primary database is not modified in any way. Aircloak handles all data types including unstructured text. | GDPR compliant | ||||||||||||||||||
32 | Prometheus | Data pipeline | Monitoring | 2012 | https://prometheus.io/ | OSS | An open-source monitoring system with a dimensional data model, flexible query language, efficient time series database and modern alerting approach. | |||||||||||||||||||
33 | iMerit | Data pipeline | Labeling | B | 23.5 | 2012 | https://imerit.net/ | iMerit specializes in data labeling and annotation for purposes of training models for Machine Learning and Artificial Intelligence. | ||||||||||||||||||
34 | Presto | Data pipeline | Database/Query | 2012 | https://prestodb.io/ | OSS | Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes. | |||||||||||||||||||
35 | Amazon Redshift | Data pipeline | Data warehouse | Amazon | 2012 | https://aws.amazon.com/redshift/ | Amazon Redshift is a fast, fully managed, and cost-effective data warehouse that gives you petabyte scale data warehousing and exabyte scale data lake analytics together in one service. Amazon Redshift is up to ten times faster than traditional on-premises data warehouses. | |||||||||||||||||||
36 | Apache Druid | Data pipeline | Database | Imply | 2012 | https://druid.apache.org/ | OSS | Apache Druid is a high performance real-time analytics database | column-oriented database | |||||||||||||||||
37 | Waterline Data | Data pipeline | Data management | Hitachi Vantara | 37.5 | 2013 | https://www.waterlinedata.com/ | Waterline's enterprise data catalog enables data professionals to discover, govern, and rationalize an organization's data lake. | ACQ | |||||||||||||||||
38 | Incorta | Data pipeline | Data processing | C | 72.6 | 2013 | https://incorta.com/ | Incorta aggregates large complex business data in real time, eliminating the need to reshape it. No Data Warehouse. No Transformations. Real-Time Insight. | ||||||||||||||||||
39 | Igneous | Data pipeline | Data management | C | 67.5 | 2013 | https://www.igneous.io/ | Igneous Unstructured Data Protection offers the scalability to handle hundreds of file systems, billions of files, and exabytes of enterprise data requiring backup | Unstructured data | |||||||||||||||||
40 | Rubrik | Data pipeline | Data management | E | 553 | 2013 | https://www.rubrik.com/en | We provide a powerful, policy-driven platform to simplify recovery and unlock insights from data residing in the data center and cloud. | ||||||||||||||||||
41 | Quobyte | Data pipeline | Storage | 2013 | https://www.quobyte.com/ | Quobyte is software defined storage that turns commodity servers into a reliable and highly automated data center file system. | ||||||||||||||||||||
42 | Elastifile | Data pipeline | Storage | 2013 | https://www.elastifile.com/ | Elastifile's cloud-native file storage helps organizations adapt and accelerate their business in the cloud era. Powered by a scalable, enterprise-grade distributed file system with intelligent object tiering, Elastifile augments existing public cloud services with a scalable, POSIX-compliant NAS, facilitating frictionless cloud adoption. With Elastifile, organizations enjoy low-touch file storage services, or deploy and manage cloud-native file storage themselves, eliminating the need for manual storage management and IT forecasting. Elastifile's unique combination of features and flexibility empowers organizations to seamlessly integrate cloud resources, with no application refactoring… thereby modernizing their infrastructure and achieving IT agility and efficiency goals. | ||||||||||||||||||||
43 | Datera | Data pipeline | Storage | C | 63.9 | 2013 | https://datera.io/ | Get sub-200µs latency & millions of IOPS with 100% software-defined data automation. Save up to 70% on data infrastructure total-cost-of-ownership. | ||||||||||||||||||
44 | Cohesity | Data pipeline | Data management | D | 410 | 2013 | https://www.cohesity.com/ | Eliminate mass data fragmentation with Cohesity's modern approach to data management, beginning with backup. Gain instant recovery. Learn more today. | ||||||||||||||||||
45 | AtScale | Data pipeline | Data management | C | 95 | 2013 | https://www.atscale.com/ | Freedom of choice for the enterprise. Break free from the complexities and security risks associated with cloud migration and self-service analytics with Intelligent Data Virtualization, no matter where dat. | ||||||||||||||||||
46 | Apache ORC | Data pipeline | File format | 2013 | https://orc.apache.org/ | OSS | the smallest, fastest columnar storage for Hadoop workloads. | |||||||||||||||||||
47 | Parquet | Data pipeline | File format | Twitter, Cloudera | 2013 | OSS | Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language. | |||||||||||||||||||
48 | Cazena | Data pipeline | Data management | 38 | 2014 | https://www.cazena.com/ | First Data Lake with a SaaS Experience. Cazena empowers enterprises to collect, store and analyze any data in the cloud, without any DevOps resources or admin time. Cazena's Data Lake as a Service includes everything, and is delivered as secure SaaS, ready to load, store and analyze data with any method: SQL, Spark, R, Python, and many more. | |||||||||||||||||||
49 | Confluent | Data pipeline | Realtime data stream | D | 205.9 | 2014 | https://www.confluent.io/ | Confluent is a fully managed Kafka service and enterprise stream processing platform. Real-time data streaming for AWS, GCP, Azure or serverless. Try free! | founded by the original creators of Apache Kafka | |||||||||||||||||
50 | Yellowbrick Data | Data pipeline | Data warehouse | C | 173 | 2014 | https://yellowbrick.com/ | The ultimate solution for your data warehouse. Quick to deploy, easy to expand, and simple to manage. Yellowbrick Data can solve your data problems. | ||||||||||||||||||
51 | Naveego | Data pipeline | Data processing | Seed | 0.5 | 2014 | https://www.naveego.com/ | A leading provider of cloud-first, distributed data accuracy solutions for seamless, end-to-end data cleansing, Naveego enables organizations to proactively manage, detect and eliminate data accuracy issues across all enterprise data sources in real-time–regardless of structure or schema. | ||||||||||||||||||
52 | Gluent | Data pipeline | Visualization | Seed | 5.7 | 2014 | https://gluent.com/ | Data virtualization software eliminates data silos. Gluent's transparent data virtualization provides virtual access to all enterprise data, with zero code changes. | ||||||||||||||||||
53 | Vexata | Data pipeline | Storage | StorCentric | 54 | 2014 | https://www.vexata.com/ | Vexata is an active data infrastructure company that accelerates database and analytic platforms via groundbreaking storage solutions. | ACQ | |||||||||||||||||
54 | Storbyte | Data pipeline | Storage | 2014 | http://storbyte.com/ | Storbyte designs and manufactures all-flash & hybrid flash enterprise storage arrays that offer performance, power management, availability, reliability, density, efficiency, flexibility, expandability, and affordability. Storbyte is providing innovative data storage solutions and has not lost sight of what is important to end users: a responsible, cost-correct price point. | NOT AI | |||||||||||||||||||
55 | Komprise | Data pipeline | Storage | C | 42 | 2014 | https://www.komprise.com/ | In 15 minutes, our free data management software trial will show you how you can save 70% on data management costs, on-premises and in the cloud. | ||||||||||||||||||
56 | Excelero | Data pipeline | Storage | B | 35 | 2014 | https://www.excelero.com/ | Local NVMe performance at data center scale through true convergence. Software-defined block storage for Cloud and Enterprise applications at any scale. | ||||||||||||||||||
57 | ClearSky Data | Data pipeline | Storage | B | 59 | 2014 | https://www.clearskydata.com/ | ClearSky Data offers enterprise storage as a hybrid cloud service delivering on-demand primary storage, offsite backup, and DR as a single service. | ||||||||||||||||||
58 | Pachyderm | Data pipeline | Versioning | A | 12.1 | 2014 | https://www.pachyderm.com/ | OSS | Data Lineage with End-to-End Pipelines on Kubernetes, engineered for the enterprise. And… It's open source! | |||||||||||||||||
59 | Kimono Labs | Data pipeline | Data generation | Palantir | 5 | 2014 | http://www.kimonolabs.com/ | Kimono Labs is an online platform that allows its users to convert their websites into APIs. | Web scraping | |||||||||||||||||
60 | Git LFS | Data pipeline | Versioning | Atlassian, GitHub | 2014 | https://git-lfs.github.com/ | OSS | Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise. | An open source Git extension for versioning large files - 7.9k stars | |||||||||||||||||
61 | Alluxio | Data pipeline | Data management | 16 | 2015 | https://www.alluxio.io/ | OSS | an open source data orchestration layer that brings data close to compute for big data and AI/ML workloads in the cloud. | ||||||||||||||||||
62 | Dremio | Data pipeline | Data management | 45 | 2015 | https://www.dremio.com/ | Get more value from your data, faster. Dremio makes your data engineers more productive, and your data consumers more self-sufficient. | Data lake | founders of the Apache Arrow and Apache Drill | |||||||||||||||||
63 | Hammerspace | Data pipeline | Database/Query | 2015 | https://hammerspace.com/ | Hammerspace allows data to move freely, like the air you breathe, across clouds and services. Make data accessible exactly where you need it, when you need it – on demand. | Data-as-a-Service | |||||||||||||||||||
64 | Octopai | Data pipeline | Data management | B | 6.2 | 2015 | https://www.octopai.com/ | An automated, centralized, cross-platform metadata search engine that enables BI groups to quickly and precisely discover and govern shared metadata. | ||||||||||||||||||
65 | Kyvos Insights | Data pipeline | Database/Query | 2015 | https://www.kyvosinsights.com/ | Kyvos accelerates BI on trillions of rows of data on the cloud and on-premise platforms with a semantic layer powered by its next-generation OLAP technology. | It pre-calculates aggregates at multiple levels of dimensional hierarchies to improve query response times as compared to SQL-on-Hadoop platforms | |||||||||||||||||||
66 | Gemini Data | Data pipeline | Data management | 2015 | https://www.geminidata.com/ | Gemini Data provides Data Availability for AI/ML driven analysis and applications to enable unified enterprise knowledge and access. | ||||||||||||||||||||
67 | DefinedCrowd | Data pipeline | Data generation | A | 13.1 | 2015 | https://www.definedcrowd.com/ | Leverage machine learning technology and human intelligence to source, structure, and enrich high quality training data in speech, NLP, and computer vision. | Forbes 50 AI companies 2019 | |||||||||||||||||
68 | Ascend.io | Data pipeline | Data management | A | 19 | 2015 | https://www.ascend.io/ | Experience continuously optimized data pipelines with less code and fewer breakages. Enter the new era of data engineering with Ascend's autonomous dataflow service. | ||||||||||||||||||
69 | Dask | Data pipeline | Data processing | 2015 | https://dask.org/ | OSS | Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love | |||||||||||||||||||
70 | Quilt | Data pipeline | Versioning | Seed | 4.2 | 2015 | https://quiltdata.com/ | OSS | Quilt is a versioned data portal for AWS | |||||||||||||||||
71 | Imply | Data pipeline | Data management | B | 45.3 | 2015 | https://imply.io/ | Imply delivers real-time analytics powered by Apache Druid. ... Stream or batch load data into Druid for high performance, ad-hoc analytic queries. | ||||||||||||||||||
72 | Vaex | Data pipeline | Data processing | 2015 | https://vaex.io/ | OSS | Power up your business with our data driven solutions. With our unique, state-of-the-art technology, we provide fast and scalable solutions that will make you more agile, while limiting unnecessary resources. | fast pandas | Link | |||||||||||||||||
73 | erwin | Data pipeline | Data management | Parallax Capital Partners | 2016 | https://erwin.com/ | Integrated enterprise architecture, business process and data modeling with data cataloging and data literacy for risk management and digital transformation. | Data governance | ACQ | |||||||||||||||||
74 | Aparavi | Data pipeline | Data management | 2016 | https://www.aparavi.com/ | Aparavi's highly scalable data intelligence and automation solutions enable organizations to easily discover, classify, protect, and optimize their data. | backup solution | |||||||||||||||||||
75 | Scale AI | Data pipeline | Data generation | C | 122.6 | 2016 | https://scale.com | Trusted by world class companies, Scale delivers high quality training data for AI applications such as self-driving cars, mapping, AR/VR, robotics, and more. | ||||||||||||||||||
76 | LabelImg | Data pipeline | Labeling | Amazon | 2016 | OSS | LabelImg is a graphical image annotation tool for labeling object bounding boxes in images | Independent tool | ||||||||||||||||||
77 | Segments.ai | Data pipeline | Labeling | 2020 | https://segments.ai/ | Deep learning-fueled labeling technology with a focus on instance and semantic segmentation. | ||||||||||||||||||||
78 | Playment | Data pipeline | Labeling | Seed | 2.5 | 2015 | https://playment.io/ | Build high-quality ground truth datasets with ML-assisted tools, sophisticated project management software, expert human workforce, and much more. | ||||||||||||||||||
79 | Snorkel | Data pipeline | Labeling | 2016 | OSS | Programmatically Building and Managing Training Data | ||||||||||||||||||||
80 | Qri | Data pipeline | Versioning | 2016 | https://qri.io/ | OSS | Bigger than a spreadsheet, smaller than a database, datasets are all around us. Use Qri to browse, download, create, fork, & publish datasets across a network of peers. | |||||||||||||||||||
81 | Apache Hudi | Data pipeline | Data warehouse | Uber | 2016 | https://hudi.apache.org/ | OSS | Apache Hudi ingests & manages storage of large analytical datasets over DFS (hdfs or cloud stores) | Data lake | |||||||||||||||||
82 | Starburst Data | Data pipeline | Database/Query | 22 | 2017 | https://www.starburstdata.com/ | Limitless Queries. Break boundaries and harness the power of the world's fastest SQL query engine. | |||||||||||||||||||
83 | Fluree | Data pipeline | Database | Seed | 4.7 | 2017 | https://flur.ee/ | Welcome to better data management. The Fluree platform organizes blockchain-secured data in a highly-scalable, highly-insightful graph database. | ||||||||||||||||||
84 | DVC | Data pipeline | Versioning | 2017 | https://dvc.org/ | OSS | Open-source version control system for Data Science and Machine Learning projects. Git-like experience to organize your data, models, and experiments. | |||||||||||||||||||
85 | Pilosa | Data pipeline | Database/Query | Seed | 3.7 | 2017 | https://www.pilosa.com/ | OSS | Pilosa is an open source, distributed bitmap index that dramatically accelerates continuous analysis across multiple, massive data sets. | Molecula | ||||||||||||||||
86 | Prodigy | Data pipeline | Labeling | Explosion | 2017 | https://prodi.gy/ | Prodigy is a scriptable annotation tool so efficient that data scientists can do the annotation themselves, enabling a new level of rapid iteration. ... With Prodigy you can take full advantage of modern machine learning by adopting a more agile approach to data collection. | |||||||||||||||||||
87 | Datatable | Data pipeline | Data processing | h2o | 2017 | OSS | Python library for efficient multi-threaded data processing, with the support for out-of-memory datasets. | |||||||||||||||||||
88 | HYCU | Data pipeline | Data management | - | - | 2018 | https://www.hycu.com/ | Keep hyper-converged infrastructure running with HYCU's powerful, simple backup & recovery and monitoring solutions. Deploy in seconds for superior results. | ||||||||||||||||||
89 | Dolt | Data pipeline | Versioning | Seed | 2 | 2018 | https://www.liquidata.co/ | Liquidata's mission is to make data move more efficiently. We built Dolt, an open-source version-controlled SQL database with Git-like semantics. | SQL database: We have a SQL database with Git versioning semantics called Dolt. As far as we know it's the only database with branch and merge functionality. | |||||||||||||||||
90 | Dataturks | Data pipeline | Labeling | Walmart | 2018 | https://dataturks.com/ | ML data annotations made super easy for teams. Just upload data, add your team and build training/evaluation dataset in hours. | ACQ | ||||||||||||||||||
91 | Voxel51 // Scoop | Data pipeline | Labeling | Seed | 3.3 | 2018 | https://voxel51.com/scoop/ | Quickly Build Insights into Your Video Datasets. Scoop enables you to make sense of your video datasets quickly and effectively. Scoop's faceted search is one-of-a-kind in the industry to let you quickly distill large amounts of video into the answers you need. | ||||||||||||||||||
92 | Label Studio | Data pipeline | Labeling | Seed | 0.15 | 2018 | OSS | Label Studio is a multi-type data labeling and annotation tool with standardized output format | ||||||||||||||||||
93 | Doccano | Data pipeline | Labeling | 2018 | OSS | Text annotation for humans. Just create a project, upload data, and start annotating. You can build a dataset in hours. | ||||||||||||||||||||
94 | Labelbox | Data pipeline | Labeling | A | 13.9 | 2018 | https://labelbox.com/ | A complete solution for your training data problem with fast labeling tools, human workforce, data management, a powerful API and automation features. | ||||||||||||||||||
95 | cuDF | Data pipeline | Data processing | NVIDIA | 2018 | https://rapids.ai/ | Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. | |||||||||||||||||||
96 | Modin | Data pipeline | Data processing | 2018 | https://github.com/modin-project/modin | OSS | Modin uses Ray to provide an effortless way to speed up your pandas notebooks, scripts, and libraries. Unlike other distributed DataFrame libraries, Modin provides seamless integration and compatibility with existing pandas code. Even using the DataFrame constructor is identical. | |||||||||||||||||||
97 | FEAST | Data pipeline | Feature engineering | 2019 | https://feast.dev/ | OSS | Feast (Feature Store) is a tool for managing and serving machine learning features. Feast is the bridge between models and data. | Feature store | ||||||||||||||||||
98 | Tumult Labs | Data pipeline | Privacy | 2019 | https://www.tmlt.io/ | Unleashing the power of data with ironclad privacy protection | Differential privacy | |||||||||||||||||||
99 | AresDB | Data pipeline | Database/Query | Uber | 2019 | https://github.com/uber/aresdb | OSS | A GPU-powered real-time analytics storage and query engine. | ||||||||||||||||||
100 | SQLFlow | Data pipeline | Database/Query | 2019 | https://sql-machine-learning.github.io/ | Extends SQL to support AI. Extract knowledge from data. Currently supports MySQL, Apache Hive, Alibaba MaxCompute, XGBoost and TensorFlow. |