1 of 25

Why Rent When You Can Own?

Build your modern data lakehouse with true optionality

2 of 25

What’s The Problem With Today’s Architecture?

01

3 of 25

The Data Warehouse Paradigm Creates Vendor Lock-In

Your data is locked into a proprietary database

4 of 25

Why Data Lakehouse

02

5 of 25

Lakehouse = Data Warehouse Without Vendor Lock-in, With Best-Of-Breed Tools

  • Data warehouse features coming to the data lake
  • All use cases in one place (low latency over petabytes of data)
  • Diversity of data
  • Variety of tools and engines interacting with the data
  • High performance, fully replicated object storage

6 of 25

What An Open Lakehouse Looks Like

  • Ingestion of many different types of data in real-time or batch
  • Layers from raw to fully aggregated
  • Columnar file formats along with popular table formats
  • Multi-engine
  • Variety of end users from traditional BI to data science

7 of 25

Lakehouse Offers More Functionality Without Compromise

Feature                                       | Lakehouse                  | Data Warehouse
Interactive queries                           | Yes                        | Yes
Manipulation of data (DML)                    | Yes                        | Yes
Petabytes of data                             | Yes                        | No
Indexing and caching to speed up queries      | Yes, with Starburst+Varada | Yes
Ability to use the best engine for your use   | Yes                        | No
  case, not locked into a vendor's ecosystem  |                            |
Optionality to switch to open source          | Yes, with Starburst/Trino  | No
Active data warehousing                       | No                         | Yes

8 of 25

Get The Benefits Of Today And Years To Come With The Lakehouse

  • “Innovate or we’re dead.” This doesn’t just apply to software vendors
  • “Sticky” can be good or bad, you decide
  • Data is the energy of your company (not oil)
  • KISS - Data mesh, decentralized, single-truth all apply
  • Data warehouses lock you in while slowly going out of fashion - don’t get caught

9 of 25

Why The Starburst Approach To The Lakehouse

  • Data is in your account, under your control
  • Many engines / solutions can interact with your data
  • Many use cases are supported, including data science and traditional BI reporting
  • No vendor lock-in (use the best engine for the job)
  • Enhance your data lake with other sources via data mesh architecture (query federation)

10 of 25

How To Build The Data Lakehouse

03

11 of 25

Lakehouse Architecture

  • Variety of data sources are ingested
  • Object storage is primary destination
  • Table formats bring database functionality
  • Many engines to choose from (Spark, Trino, etc.)
  • 100s of access tools (BI, Data Science, SQL, etc.)

12 of 25

What It Looks Like With Starburst Galaxy

13 of 25

Operate Your Data Lakehouse with Starburst

04

14 of 25

Use Case: Data Lakehouse Engine

Deploy to any environment. Supports HDFS, cloud object storage, and S3-compatible storage (Dell ECS, MinIO, etc.)

High-concurrency, auto-scaling MPP engine (Trino), widely used in industry as a replacement for Hive

Full role-based access control

15 of 25

Use Case: Data Lakehouse w/ Data Mesh

SELECT
  c.custkey,
  o.shippriority
FROM
  teradata.tpch.customer c
  JOIN sql_server.tpch.orders o ON c.custkey = o.custkey

Query over 35 data sources using standard ANSI SQL

The Starburst engine accelerates queries through file indexing, caching, a cost-based optimizer, dynamic filtering, join pushdown, and more
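
To make the idea of one SQL statement spanning two systems concrete, here is a minimal sketch that runs locally. It does not use Starburst or Trino: two attached SQLite in-memory databases stand in for the two catalogs, and the schema names (teradata_src, sqlserver_src), the custkey join column, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Assumption: SQLite ATTACH stands in for two independent source systems.
# A real deployment would point a Trino/Starburst client at configured catalogs.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS teradata_src")
conn.execute("ATTACH DATABASE ':memory:' AS sqlserver_src")

# Hypothetical, heavily simplified versions of the customer and orders tables.
conn.execute(
    "CREATE TABLE teradata_src.customer (custkey INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE sqlserver_src.orders ("
    "orderkey INTEGER PRIMARY KEY, custkey INTEGER, shippriority INTEGER)"
)
conn.executemany(
    "INSERT INTO teradata_src.customer VALUES (?, ?)",
    [(1, "Customer#1"), (2, "Customer#2")],
)
conn.executemany(
    "INSERT INTO sqlserver_src.orders VALUES (?, ?, ?)",
    [(100, 1, 0), (101, 2, 1), (102, 1, 0)],
)


def federated_rows(connection):
    """Join tables that live in two different attached databases."""
    query = """
        SELECT c.custkey, o.orderkey, o.shippriority
        FROM teradata_src.customer AS c
        JOIN sqlserver_src.orders AS o ON c.custkey = o.custkey
        ORDER BY o.orderkey
    """
    return connection.execute(query).fetchall()


rows = federated_rows(conn)
for row in rows:
    print(row)
```

The point of the sketch is only the shape of the query: each table is addressed by a source-qualified name, and the join is expressed once in ordinary SQL rather than as a separate extract from each system.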

16 of 25

Use Case: Data Processing Engine

  • ELT processing engine (now with fault tolerance!)
  • Raw/Stage - 50+ connectors to extract data or create views
  • Join, curate and enhance data on any data storage w/ standard SQL
  • Data pipelines accessible to everyone who knows SQL, not bottlenecked by data engineers (TalkDesk)
  • Learn how Zillow and Lyft use Starburst/Trino for data processing at Trino Summit on Nov 10 (link)
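
The raw, stage, and curated steps described above can be sketched entirely in SQL. The example below is a toy illustration, not a Starburst pipeline: it uses Python's built-in sqlite3 module so it runs anywhere, and the table names (raw_orders, stage_orders, curated_revenue) and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw layer: data landed as-is, all text, including one row with no amount.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        ("1", "emea", "10.50"),
        ("2", "EMEA", "4.50"),
        ("3", "amer", "7.00"),
        ("4", "amer", None),
    ],
)

# Stage layer: a view that casts types and normalizes values without copying data.
conn.execute(
    """
    CREATE VIEW stage_orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           UPPER(region)             AS region,
           CAST(amount AS REAL)      AS amount
    FROM raw_orders
    WHERE amount IS NOT NULL
    """
)

# Curated layer: an aggregated table built with CREATE TABLE AS SELECT.
conn.execute(
    """
    CREATE TABLE curated_revenue AS
    SELECT region, SUM(amount) AS revenue, COUNT(*) AS order_count
    FROM stage_orders
    GROUP BY region
    """
)

result = conn.execute(
    "SELECT region, revenue, order_count FROM curated_revenue ORDER BY region"
).fetchall()
for row in result:
    print(row)
```

Every step is a SQL statement, which is the property the slide highlights: anyone who can write SQL can read and extend the pipeline.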

17 of 25

What Makes Starburst (Trino) A Versatile Engine

05

18 of 25

Fast And Cost-Efficient

  • Compared to open source Spark, Starburst executes queries 38% faster

*Test run on TPC-H 10TB data schema using 5 m5.8xlarge machines

19 of 25

Ability To Run Trino on Spots For Cost Savings

  • Running on spot instances is desirable because compute is often 50% cheaper
  • An external exchange buffer service makes Trino resilient to losing spot nodes
  • The latency penalty from losing nodes is half of what it is in Spark
  • Case study: 60% cost savings at BlueCat
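
A rough way to reason about the spot trade-off is to weigh the price discount against the extra runtime caused by lost nodes. The sketch below is back-of-the-envelope arithmetic only: the function name, the 1.0 hourly price, the node count, and the 20% runtime overhead are placeholder assumptions, not measured figures or figures from the case study.

```python
def cluster_cost(nodes, hours, hourly_price, spot_discount=0.0, runtime_overhead=0.0):
    """Estimate the cost of one run.

    spot_discount: fraction taken off the on-demand price (0.5 means 50% cheaper).
    runtime_overhead: fractional extra runtime from retries after node loss.
    """
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    if runtime_overhead < 0.0:
        raise ValueError("runtime_overhead must be non-negative")
    effective_hours = hours * (1.0 + runtime_overhead)
    return nodes * effective_hours * hourly_price * (1.0 - spot_discount)


# Placeholder inputs: 5 nodes, a 2-hour job, and a unit price of 1.0 per node-hour.
on_demand = cluster_cost(nodes=5, hours=2.0, hourly_price=1.0)
spot = cluster_cost(
    nodes=5, hours=2.0, hourly_price=1.0, spot_discount=0.5, runtime_overhead=0.2
)
savings = 1.0 - spot / on_demand
print(f"on-demand={on_demand:.2f} spot={spot:.2f} savings={savings:.0%}")
```

Under these assumptions the discount outweighs the retry overhead; the smaller the overhead an engine incurs when nodes disappear, the closer the net savings get to the raw spot discount.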

20 of 25

Trino Is Fast And Predictable On Spots

Trino query execution time on spot instances is faster than Spark on on-demand instances

21 of 25

Starburst Galaxy+dbt Demo

06

22 of 25

Demo - Building Pipeline and Consumption

23 of 25

Starburst Galaxy Provides A Great Ecosystem For Trino

Ecosystem of connectors

Performance and flexibility

Scalability

Ease of use / consumability

Security and compliance

Optionality

24 of 25

Ease of use and consumability

Capabilities that enable easy discovery and consumption of high-quality data

Easy to connect to a rich ecosystem of data sources, BI tools, partner products

Intuitive user experience using the SQL skills and tools you already know

Fully managed SaaS option

Resource elasticity: reduces the need for a dedicated operations team

Flexible and transparent licensing, pricing, and billing options

25 of 25