Use case / Business goal | Self-managed (on-premises or public cloud) | Amazon Web Services | Google Cloud Platform | Microsoft Azure | Third-party cloud offering
Data Lake (storing petabytes of data as files or objects)
Self-managed:
- Hadoop Distributed File System (HDFS)
- Apache Ozone (S3 compatible, based on HDFS)
- MinIO (S3 compatible)
- Ceph (S3 compatible)
- Alluxio (formerly known as Tachyon; virtual distributed storage system over HDFS, S3, GCS, ABS and others)
Amazon Web Services:
- Amazon Simple Storage Service (S3) (HDFS compatible)
Google Cloud Platform:
- Google Cloud Storage (HDFS compatible)
Microsoft Azure:
- Azure Data Lake (gen 2) (HDFS compatible)
Data Catalog, Data Governance, Data Lineage, etc.
Self-managed:
- Hive Metastore
- Apache Atlas
- Amundsen
- Marquez
- DataHub
- OpenMetadata
- EventCatalog
Amazon Web Services:
- AWS Glue Data Catalog
Google Cloud Platform:
- Google Data Catalog
Microsoft Azure:
- Azure Data Catalog
Data Lakehouse
Self-managed:
- HDFS + Apache Iceberg (made by Netflix)
- HDFS + Apache Hudi (made by Uber)
- HDFS + Databricks Delta Lake
Amazon Web Services:
- (same as on-premises, on S3)
Google Cloud Platform:
- (same as on-premises, on GCS)
Microsoft Azure:
- (same as on-premises, on ADL)
Third-party cloud offering:
- Databricks SQL Analytics (preview)
Data Warehouse
Self-managed:
- IBM Db2 Warehouse / IBM Netezza
- Teradata
- Oracle Autonomous Database / Oracle Exadata
- Vertica
- ClickHouse
- Greenplum
Amazon Web Services:
- Amazon Redshift
Google Cloud Platform:
- Google BigQuery
Microsoft Azure:
- Azure Synapse (formerly Azure SQL Data Warehouse)
- Microsoft Fabric SQL
Third-party cloud offering:
- Snowflake (AWS/GCP/Azure)
- Databricks (AWS/GCP/Azure)
- Firebolt (AWS only)
Data Lake + Data Warehouse integration
Self-managed:
- Apache Hive / Apache Spark / etc.
- Oracle Big Data SQL
- IBM Db2 Big SQL
Amazon Web Services:
- Redshift Spectrum
Google Cloud Platform:
- Google BigQuery (external tables)
Microsoft Azure:
- Azure Synapse (Spark SQL)
Big Data Platforms (Hadoop and Spark)
Self-managed:
- Hortonworks Data Platform (HDP) [legacy]
- Cloudera Distribution for Hadoop (CDH) [legacy]
- Cloudera Data Platform
- HPE Ezmeral (formerly MapR)
- Apache Bigtop
Amazon Web Services:
- AWS EMR
Google Cloud Platform:
- Google Dataproc
Microsoft Azure:
- Azure HDInsight (based on Hortonworks Data Platform) [legacy, killed by Cloudera]
Third-party cloud offering:
- Databricks Unified Data Analytics Platform (AWS/GCP/Azure)
- Cloudera Data Platform Cloud (AWS/Azure)
SQL on Data Lake
Self-managed:
- Apache Hive
- Apache Spark SQL
- Presto (PrestoDB, Facebook)
- Trino (PrestoSQL) / Starburst Enterprise
- Apache Drill
- Cloudera Impala
- Apache Pig
- Dremio
Amazon Web Services:
- AWS Athena (serverless, based on Presto)
Google Cloud Platform:
- Google BigQuery (external tables)
Microsoft Azure:
- Azure Synapse (Spark SQL)
- MS Fabric
Third-party cloud offering:
- Databricks SQL Analytics
- Ahana Cloud (managed Presto on AWS)
- Starburst Galaxy (managed Trino)
SQL relational databases
Self-managed:
- PostgreSQL
- MySQL
- Oracle Database
- MS SQL Server
Amazon Web Services:
- Amazon Relational Database Service (RDS) + Amazon Aurora
Google Cloud Platform:
- Google Cloud SQL
Microsoft Azure:
- Azure SQL Database
- Azure Database
SQL Distributed Databases
Self-managed:
- CockroachDB
Amazon Web Services:
- Amazon Aurora global databases
Google Cloud Platform:
- Google Cloud Spanner
Microsoft Azure:
- Azure Cosmos DB (SQL API)
NoSQL Databases
Self-managed:
- Apache HBase (Hadoop Database)
- Apache Cassandra / Scylla
- Accumulo
Amazon Web Services:
- Amazon DynamoDB
- Amazon Keyspaces (Cassandra as a service)
Google Cloud Platform:
- Google Bigtable (HBase API)
Microsoft Azure:
- Azure Cosmos DB (Cassandra API)
- Azure Storage Tables
Third-party cloud offering:
- DataStax Astra (AWS/GCP/Azure)
NoSQL Document Database
Self-managed:
- MongoDB
Amazon Web Services:
- Amazon DocumentDB
Google Cloud Platform:
- Google Cloud Datastore
Microsoft Azure:
- Azure DocumentDB
- Azure Cosmos DB API for MongoDB
Third-party cloud offering:
- MongoDB Cloud (AWS/GCP/Azure)
NoSQL Graph Database
Self-managed:
- Neo4j
Amazon Web Services:
- Amazon Neptune
NoSQL Cache
Self-managed:
- Redis
- Memcached
Amazon Web Services:
- Amazon ElastiCache (Redis or Memcached)
Google Cloud Platform:
- Google Cloud Memorystore
Microsoft Azure:
- Azure Redis Cache
NoSQL search engine
Self-managed:
- Elastic Stack (Elasticsearch, Kibana, Logstash, etc.)
- Apache Solr (available in big data distributions)
Amazon Web Services:
- Amazon CloudSearch
- Amazon Elasticsearch Service (ES)
Microsoft Azure:
- Azure Search
Third-party cloud offering:
- Elastic Cloud (AWS/GCP/Azure)
Broker
Self-managed:
- RabbitMQ
- ActiveMQ
- ZeroMQ
Amazon Web Services:
- Amazon MQ
- Amazon Simple Queue Service (SQS)
- Amazon Simple Notification Service (SNS)
Google Cloud Platform:
- Google Cloud Pub/Sub
Microsoft Azure:
- Azure Service Bus + Azure Queue Storage
- Azure Notification Hubs
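Streaming - Broker
Self-managed:
- Apache Kafka
- Apache Pulsar
Amazon Web Services:
- Amazon Kinesis Data Streams
Google Cloud Platform:
- Google Pub/Sub
Microsoft Azure:
- Azure Event Hubs
Third-party cloud offering:
- Confluent Cloud
- Aiven
- Cloudera Cloud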
Stream processing
Self-managed:
- Apache Spark [Structured] Streaming
- Apache Flink
- Apache Beam
- Kafka Streams
- Apache Storm
- Apache Heron (made by Twitter)
Amazon Web Services:
- Amazon Kinesis Data Analytics (based on Apache Flink)
Google Cloud Platform:
- Google Dataflow (Apache Beam)
Microsoft Azure:
- Azure Stream Analytics
Third-party cloud offering:
- Ververica Cloud (Kafka Streams, ksqlDB, Flink)
- Confluent Cloud (Flink)
- Aiven (Flink)
- Decodable (Flink SQL)
Streaming platform
Self-managed:
- Confluent Platform (based on Apache Kafka)
- Hortonworks/Cloudera DataFlow (based on Apache Kafka and NiFi)
- Ververica Platform (based on Apache Flink)
Amazon Web Services:
- Amazon Managed Streaming for Apache Kafka (Amazon MSK)
- Amazon Kinesis (multiple tools inside)
Google Cloud Platform:
- Google Dataproc (Big Data platform with Apache Kafka)
Microsoft Azure:
- Azure HDInsight (based on Hortonworks DataFlow)
Third-party cloud offering:
- Confluent Cloud (AWS/GCP/Azure)
- Aiven
- Ververica Cloud
- Cloudera Data Platform Cloud
Real-time data transformation
Self-managed:
- Kafka Connect
- Apache NiFi + MiNiFi
- Apache Flume
Amazon Web Services:
- Amazon Kinesis Data Firehose
Google Cloud Platform:
- Google Dataflow (Apache Beam)
Microsoft Azure:
- Azure HDInsight (NiFi)
- Azure Data Factory
Batch data transformation (ETL, ELT)
Self-managed:
- Apache Beam
- Apache Spark
- dbt
- Airbyte
- Meltano
- Apache Hop
- Twister2
- Apache Samza (made by LinkedIn)
Amazon Web Services:
- AWS Glue (serverless Apache Spark)
- AWS Data Pipeline
Google Cloud Platform:
- Google Dataflow (Apache Beam)
- Cloud Dataprep (created by Trifacta)
Microsoft Azure:
- Azure Data Factory (with support for Databricks/Spark or the whole HDInsight platform)
Integration between Relational Database and Data Lake
Self-managed:
- Apache Sqoop
- Apache Spark
Amazon Web Services:
- AWS Database Migration Service (AWS DMS)
Microsoft Azure:
- Azure Database Migration Service
Change Data Capture
Self-managed:
- Debezium + Kafka
Amazon Web Services:
- AWS Database Migration Service (AWS DMS)
- Debezium + Kinesis
Google Cloud Platform:
- Debezium + Pub/Sub
Task Orchestration
Self-managed:
- Apache Airflow
- Dagster
- Prefect
- Mage AI
- Apache Oozie (big data distro)
- Luigi (Spotify)
- Azkaban
Amazon Web Services:
- AWS Step Functions
- AWS Data Pipeline
- Amazon Managed Workflows for Apache Airflow (MWAA)
- Amazon Simple Workflow Service
Google Cloud Platform:
- Google Cloud Composer (Apache Airflow)
Microsoft Azure:
- Azure Data Factory
Third-party cloud offering:
- Astronomer (managed Airflow)
Machine Learning and/or Data Science Platform
Self-managed:
- Anaconda
- Dataiku
- H2O.ai
Amazon Web Services:
- Amazon SageMaker
Google Cloud Platform:
- Google Cloud AutoML
- Google Cloud Machine Learning Engine
- Google Cloud Datalab
Microsoft Azure:
- Azure Machine Learning
- Azure Machine Learning Studio
Third-party cloud offering:
- Databricks Unified Analytics Platform
- Alteryx
Data Science Notebooks
Self-managed:
- Jupyter Family
- BeakerX
- Apache Zeppelin
- Polynote (made by Netflix)
Amazon Web Services:
- AWS EMR Notebooks
- AWS SageMaker Notebooks
Google Cloud Platform:
- Google Colaboratory (Colab)
Microsoft Azure:
- Azure Notebooks (killed by MS)
Data visualization
Self-managed:
- Kibana (Elastic Stack)
- Apache Superset
- Redash (by Databricks)
- Metabase
- Tableau
- Qlik
- Microsoft Power BI (on-premises edition)
Amazon Web Services:
- AWS QuickSight
Google Cloud Platform:
- Google Data Studio
- Google Looker
Microsoft Azure:
- Microsoft Power BI
Third-party cloud offering:
- Tableau Cloud
Production-ready ML services (commercial offering by big tech vendors)
Amazon Web Services:
- AWS ML Services
Google Cloud Platform:
- Google Cloud AI Building Blocks
Microsoft Azure:
- Azure Cognitive Services