1 of 55

Building InfluxDB 3.0

With Apache Arrow, DataFusion, Flight, Parquet

Andrew Lamb | Staff Engineer, InfluxData

| © Copyright 2024, InfluxData

1

2 of 55

Andrew Lamb

Staff Engineer InfluxData

> 20 21 😱years in enterprise software development

Oracle: Database (2 years)

DataPower: XSLT compiler (2 years)

Vertica: DB / Query Optimizer (6 years)

Nutonian/DataRobot: ML Startups (7 years)

InfluxData: InfluxDB 3.0, Arrow, DataFusion (4 years)

i n f l u x d a t a . c o m

| © Copyright 2024, InfluxData

2

3 of 55

Goals

Convince you, via example, that:

  1. Databases of the future will be assembled from reusable components
  2. This is not a terrible idea

Talk outline:

  • Database Implementation Trends
  • Time Series Databases, need for (yet another) one
  • FDAP: Flight, DataFusion, Arrow, Parquet: Rationale + Use InfluxDB 3.0

| © Copyright 2024, InfluxData

3

4 of 55

Thesis: Long Term Trends in Databases

Time

Adoption

Ease of Building

3.0

| © Copyright 2024, InfluxData

4

5 of 55

“One Size Fits All”

An Idea Whose Time Has Come and Gone

Stonebraker & Çetintemel (2005)

| © Copyright 2024, InfluxData

5

6 of 55

Case Study: In which InfluxData decides to build a new time series database.

| © Copyright 2024, InfluxData

6

7 of 55

What / Why of Time Series Databases

Specialized for storing data with times (obviously).

Key Properties:

  1. High volume, denormalized ingest
    • eg. host=myhost123.example.com repeated over and over
  2. Low latency query after load: milliseconds between ingest and query
  3. Schema on write: new columns can appear at any time, backfills
  4. Rapid data value decay: newest data is super important, falls off drastically

Examples:

Open → InfluxDB, Timescale, Graphite Whisper, VictoriaMetrics

Closed → Facebook Gorilla, Google Monarch, AWS Timestream, DataDog Husky

| © Copyright 2024, InfluxData

7

8 of 55

InfluxDB 3.0 Requirements

  • Need: No series cardinality limits:
    • Tech: TSM (LSM Tree / KV Store) → Column Store
  • Need: ‘Infinite’ Retention
    • Tech: TSM on locally attached disks → object store (cheap!)
  • Need: Elastic Scalability
    • Tech: Shared Nothing (local disk)→ Disaggregated Storage (S3)
  • Need: Ecosystem Compatibility
    • Tech: InfluxQL / custom APIs → SQL + JDBC/ODBC

| © Copyright 2024, InfluxData

8

9 of 55

So, let’s build a new Database

(How hard could that be, really?)

| © Copyright 2024, InfluxData

9

10 of 55

It is hard (expensive) to build a new databases

Database company money raised

  • Snowflake: $2B
  • Databricks (Spark) $3.5B
  • MongoDB: $311M
  • SingleStore: $464.1M
  • CockroachLabs (CockroachDB): $633.1M
  • Pingcap (TiDB): $341.6M
  • Elastic: $162M
  • TimescaleDB: $181M
  • DuckDB $? / MotherDuck: $47.5M $100M

I did this at Vertica too

| © Copyright 2024, InfluxData

10

11 of 55

“We can do it with the Apache Arrow Ecosystem”

Evan (CEO)

Paul (CTO)

| © Copyright 2024, InfluxData

11

12 of 55

Benefits of building on FDAP foundation

⇒ Innovation on Time Series, FDAP provides standard (lengthy to build) pieces

match tool_needed_for_database {

File format (persistence) => Parquet,

Columnar memory representation => Arrow Arrays,

Operations (e.g. multiply, avg) => Compute Kernels,

SQL + extensible query engine => Arrow DataFusion,

Network transfer => Arrow Flight,

JDBC/ODBC driver => Arrow FlightSQL,

}

| © Copyright 2024, InfluxData

12

13 of 55

Let’s do it

| © Copyright 2024, InfluxData

13

14 of 55

InfluxDB 3.0 Architecture

| © Copyright 2024, InfluxData

14

15 of 55

InfluxDB 3.0: Fast, Real Time & Cost Effective

| © Copyright 2024, InfluxData

15

16 of 55

“Community over Code” - The Apache Way

Non profit governance of open source communities

| © Copyright 2024, InfluxData

16

17 of 55

Apache: Benefits for InfluxDB 3.0

  • ⇒ Predictable Foundation
  • Stable License: (ASL 20 years old) low risk of changes, (ahem OpenTofu)
  • Communication: Predictable and open (if slow)
  • Multi-Vendor Participation: Shared investment reduces individual risk
  • Long Term Maintenance: Hedged against life changes, corporate strategy shifts, VC funding cycles
  • ⭐⭐⭐⭐⭐: Works far better than could be reasonably expected

| © Copyright 2024, InfluxData

17

18 of 55

Columnar file format from 2013 (originally part of Hadoop ecosystem)

https://parquet.apache.org/

| © Copyright 2024, InfluxData

18

19 of 55

Apache Parquet: Benefits for InfluxDB

⇒ Avoid Engineering overhead of building a custom file format + ecosystem

Compression: Works well with wide data variety (including time series)

Performance: Techniques such as projection, filter pushdown, late materialization

Interoperability: “Defacto” interchange format for analytics, immediate compatibility with 1000s of tools (Apache Iceberg makes it even easier)

| © Copyright 2024, InfluxData

19

20 of 55

TSM (time series) vs Parquet Compression

Entire dataset (Parquet/TSM: 20%)

Format

File Count

Size (GB)

TSM

483

591

TSM (gzip)

483

97

Parquet

246,140

(4K - 2.2GB)

118

Format

File Count

Size (GB)

TSM

1

.110

Parquet

54

(4K - 7M)

.020

File 1 (Parquet/TSM: 18%)

Format

File Count

Size (GB)

TSM

1

2

Parquet

974

(12K - 53M)

.226

File 2 (Parquet/TSM 11%)

| © Copyright 2024, InfluxData

20

21 of 55

Parquet Organization

A

B

C

15

Foo

1/1/2023

11

Bar

..1/5/2023

50

Baz

1/1/2023

32

Blarg

1/6/2023

(“PAX” in DB literature)

| © Copyright 2024, InfluxData

21

22 of 55

Parquet Structure + Metadata

Metadata footer

Highly encoded / compressed pages

Footer contains location of pages, and statistics such as min/max/count/nullcount.

(“Zone Maps”, “Small Materialized Aggregates” in DB literature)

| © Copyright 2024, InfluxData

22

23 of 55

Parquet Projection + Filter Pushdown

Metadata footer

Highly encoded / compressed pages

Metadata + query to prune (skip) pages that aren’t needed

SELECT A

...

WHERE C > 25

1. Consult metadata

2. Only read/decode necessary pages

| © Copyright 2024, InfluxData

23

24 of 55

How does InfluxDB 3.0 use Parquet?

Durable Store: All data durably persisted to object store as parquet;

Query: Read from parquet files + latest unpersisted ingester data

Object Store

Ingester

2. Periodically writes data buffer as sorted parquet files to object store

Ingester

write path

read path

Querier

4. Reads (and caches) parquet data from object store to answers queries

SELECT … FROM ...

3. User query

1. Incoming line protocol

weather,location=us-east temperature=82,humidity=67 1465839830100400200�weather,location=us-midwest temperature=82,humidity=65 1465839830100400200�weather,location=us-west temperature=70,humidity=54 1465839830100400200�weather,location=us-east temperature=83,humidity=69 1465839830200400200�weather,location=us-midwest temperature=87,humidity=78 1465839830200400200�weather,location=us-west temperature=72,humidity=56 1465839830200400200�weather,location=us-east temperature=84,humidity=67 1465839830300400200�weather,location=us-midwest temperature=90,humidity=82 1465839830400400200�weather,location=us-west temperature=71,humidity=57 1465839830400400200

weather,location=us-east temperature=82,humidity=67 1465839830100400200�weather,location=us-midwest temperature=82,humidity=65 1465839830100400200�weather,location=us-west temperature=70,humidity=54 1465839830100400200�weather,location=us-east temperature=83,humidity=69 1465839830200400200�weather,location=us-midwest temperature=87,humidity=78 1465839830200400200�weather,location=us-west temperature=72,humidity=56 1465839830200400200�weather,location=us-east temperature=84,humidity=67 1465839830300400200�weather,location=us-midwest temperature=90,humidity=82 1465839830400400200�weather,location=us-west temperature=71,humidity=57 1465839830400400200

| © Copyright 2024, InfluxData

24

25 of 55

In memory format for fast vectorized processing

| © Copyright 2024, InfluxData

25

26 of 55

Arrow: Benefits for InfluxDB

⇒ Best practice for storing columnar data in memory

Type System: Full type support (integers, strings, timestamps, etc)

Null Support: Standard null bitmask representation + semantics

Efficient Encodings: Dictionary encoding for Strings

Natural Integration: DataFusion, parquet libraries, Arrow Flight

Optimized Compute Kernels: Fast vectorized kernels are well understood, but time consuming to implement, test and maintain

Note: Arrow has (many) more features than this

| © Copyright 2024, InfluxData

26

27 of 55

Arrow Array: Int64Array

Pretty much what you will find in every vectorized column store engine

15

1743

NULL

432

..

9

322

8

Logically 1024

8 byte integers

Buffer (aligned)

8192 bytes

15

0

Byte Offset

1743

??

432

9

322

8

8

16

24

8168

8176

8184

Arrow Array

1

1

1

0

1

1

1

Validity (bitmask)

128 bytes

Bit Offset

0

1

2

3

127

126

125

| © Copyright 2024, InfluxData

27

28 of 55

Compute Kernels

let output = gt(

&left,

&right

);

+

10

20

17

5

23

5

9

12

4

5

76

2

3

5

2

33

2

1

6

7

8

2

7

2

5

6

7

8

left

right

output

1

0

1

1

1

0

1

1

0

1

1

0

0

0

>

>

>

>

The gt (greater than) kernel computes an output BooleanArray where each element is left > right

Kernels handle nulls (validity masks), optimizations for different data sizes, etc.

~50 different kernels, full list: docs.rs page

| © Copyright 2024, InfluxData

28

29 of 55

How does InfluxDB 3.0 use Arrow: Ingester Buffers

Indirectly: via DataFusion and Flight.

Directly: Ingest path, which parses input line protocol → Arrow Arrays

Ingester

4. If queried prior to writing parquet buffer is snapshotted, turned into RecordBatch sent to querier via ArrowFlight

Querier

2. LP is parsed and appended to an in memory buffer (mutable batch)

open buffer

3. Periodically writes sorted data on object store using datafusion plan

Snapshots

(Record Batches)

1. Incoming Line Protocol

weather,location=us-east temperature=82,humidity=67 1465839830100400200�weather,location=us-midwest temperature=82,humidity=65 1465839830100400200�weather,location=us-west temperature=70,humidity=54 1465839830100400200�weather,location=us-east temperature=83,humidity=69 1465839830200400200�weather,location=us-midwest temperature=87,humidity=78 1465839830200400200�weather,location=us-west temperature=72,humidity=56 1465839830200400200�weather,location=us-east temperature=84,humidity=67 1465839830300400200�weather,location=us-midwest temperature=90,humidity=82 1465839830400400200�weather,location=us-west temperature=71,humidity=57 1465839830400400200

Object Store

| © Copyright 2024, InfluxData

29

30 of 55

Highly customizable, fast query engine that uses Arrow

| © Copyright 2024, InfluxData

30

31 of 55

Arrow DataFusion: Benefits for InfluxDB

⇒ Fast full featured, extensible query engine

All the OLAP buzzwords: vectorized, columnar, multi-core, streaming, out of core, …

“Full” SQL: JOINs, date/time/timestamp functions, structured data, …

Customizable: Easily extend time series specific functions (e.g. date_bin gapfill, InfluxQL, …)

| © Copyright 2024, InfluxData

31

32 of 55

DataFusion: Input / Output

Data Batches

SQL Query

SELECT status, COUNT(1)

FROM http_api_requests_total

WHERE path = '/api/v2/write'

GROUP BY status;

Data Batches

DataFrame

ctx.read_table("http")?

.filter(...)?

.aggregate(..)?;

Catalog information:

tables, schemas, etc

| © Copyright 2024, InfluxData

32

33 of 55

DataFusion: Totally Customizable Architecture

SQL

Query FrontEnds

DataFrame

LogicalPlans

ExecutionPlan

Plan Representations

(DataFlow Graphs)

Expression Eval

Optimizations / Transformations

Optimizations / Transformations

HashAggregate

Sort

Join

Parquet

CSV

DataFusion

Extend ✅

Extend ✅

Extend ✅

Extend ✅

Extend ✅

Extend ✅

Extend ✅

Extend ✅

Optimized Execution Operators

(Arrow Based)

Catalog / Data Source

| © Copyright 2024, InfluxData

33

34 of 55

How InfluxDB 3.0 uses DataFusion

Optimization

(storage pruning, pushdown, etc)

Physical Planning

Execution

DataFusion

LogicalPlan

DataFusion

ExecutionPlan

DataFusion

Streams

gRPC output

Arrow Flight

Client Specific

Output formats

Arrow

Record Batches

ParquetWriter

SeriesFrame

...

FlightData

Write to Parquet files

Storage gRPC Frontend

SQL Frontend

(from DataFusion)

Client / Language Specific Frontends

read_group(..)

SELECT …

FROM

Reorg Frontend

compact_plan(..)

InfluxQL Frontend

SELECT … FROM … GROUP BY *

  • All queries (3 different query languages)
  • Parquet creation + compaction

| © Copyright 2024, InfluxData

34

35 of 55

Flight

Efficiently send columnar data (as Arrow Arrays) over the network

| © Copyright 2024, InfluxData

35

36 of 55

Arrow Flight: Benefits for InfluxDB

⇒ Efficient network transfer; wide client support

Network Efficient: columnar format, high bandwidth transfer

CPU Efficient: “zero copy” Serialization / Deserialization

Pre Existing Clients: Minimal effort to make InfluxDB 3.0 clients (just use Arrow Flight clients)

| © Copyright 2024, InfluxData

36

37 of 55

How does InfluxDB 3.0 use Flight? Native Query Protocol

  1. Client ←→ Querier

Querier

write path

read path

SELECT … FROM ...

1. User query via Flight

Ingester

2. Querier combines data from ingesters and parquet into response

3. Response via Flight

| © Copyright 2024, InfluxData

37

38 of 55

How does InfluxDB 3.0 use Flight?

Ingester ←→ Querier

  • Ingester and Querier’s internal protocol is built on Flight*

Ingester

Querier

SELECT … FROM ...

write path

read path

1. User query

2. The querier fetches unpersisted (not yet written to parquet) data returned from the ingester, as RecordBatches via Flight

* we are working on in an optimized flight-like protocol to reduce latency

| © Copyright 2024, InfluxData

38

39 of 55

FlightSQL

Execute SQL queries and return results with a standard (non custom) client

| © Copyright 2024, InfluxData

39

40 of 55

Arrow FlightSQL: Benefits for InfluxDB

⇒ Access to SQL ecosystem, without implementing our own drivers/connectors

Prepackaged Client Libraries: JDBC/ODBC/Ecosystem drivers

Pre-packaged Integrations: Many systems already read from FlightSQL

Community Leverage: E.g. FlightSQL Grafana plugin, others contribute

| © Copyright 2024, InfluxData

40

41 of 55

Apache Arrow FlightSQL

  • Send SQL queries, receive Responses as Arrow Arrays
  • Has clients in many languages / APIs (JDBC, python DB API, etc)

| © Copyright 2024, InfluxData

41

42 of 55

How InfluxDB 3.0 uses FlightSQL

✅ Access to the ecosystem without having to implement JDBC, ODBC, … 😅

client

arrow-flight JDBC

Driver

InfluxDB 3.0

Data is sent via Arrow FlightSQL

(columnar)

arrow-flight JDBC driver implements JDBC API

JDBC Interface

| © Copyright 2024, InfluxData

42

43 of 55

Before Flight SQL

Server

Database specific protocol (e.g. Postgres FEBE)

Caveat: Only show JDBC and ODBC drivers. Also common is a native implementation in each language ecosystem - e.g like the Python DB API 20 and psycopg2 etc. See Apache Arrow Database Connectivity (ADBC) for more details

client

JDBC

Driver

JDBC Interface

client

JDBC

Driver

JDBC Interface

client

ODBC

Driver

ODBC Interface

| © Copyright 2024, InfluxData

43

44 of 55

Alternate Strategy: Use Posgres FEBE

(Slow + painful 🤮)

client

Server

Client: Prepare query:"SELECT * FROM foo WHERE a = $1;" name:""

Client: Bind name:"" parameters:[{format:"text", value:"42"}]

… (steps elided) …

Server: RowDescription fields:[{name:"a", type:"int", format:"text"}, …]

Server: DataRow fields:[{data:"42"}, {data: "Hunter Valley"} …]

Server: DataRow fields:[{data:"12"}, {data: "Merrimack Valley"} …]

… (lots ros for each data)

Server: DataRow fields:[{data:"321"}, {data: "Charles River Watershed"} …]

… (steps elided) …

JDBC Interface

3. Data is sent row by row

Postgres

JDBC

Driver

DataRow {..}

DataRow {..}

DataRow {..}

DataRow {..}

1. Data is produced column by column in RecordBatches

PG febe adapter

2. Adapter converts to the postgres FEBE protocol

It also turns out clients using the postgres driver tend to try and query the postgres metadata tables 😱

| © Copyright 2024, InfluxData

44

45 of 55

Conclusion

| © Copyright 2024, InfluxData

45

46 of 55

Conclusion

  • Building Databases from scratch is hard (and expensive 💸)
  • You don’t have to anymore 🎉
  • We built InfluxDB 3.0 using Apache Flight, DataFusion, Arrow, and Parquet, and it was awesome
  • ⭐⭐⭐⭐⭐: Highly recommended for your next projects

| © Copyright 2024, InfluxData

46

47 of 55

Related Work

The Composable Data Management System Manifesto (VLDB 2023)

Velox: Meta’s Unified Execution Engine

(VLDB 2022)

The Modern Data Architecture: The Deconstructed Database (USENIX ;login: Winter 2018)

A Deep Dive into Common Open Formats for Analytical DBMSs (VLDB 2023)

Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine

(To Appear SIGMOD 2024)

| © Copyright 2024, InfluxData

47

48 of 55

Try It Yourself

https://www.influxdata.com

https://github.com/InfluxCommunity

| © Copyright 2024, InfluxData

48

49 of 55

T H A N K Y O U

| © Copyright 2024, InfluxData

49

50 of 55

Backup Slides

| © Copyright 2024, InfluxData

50

51 of 55

Thank you!

Questions / Discussion

Find out more:

| © Copyright 2024, InfluxData

51

52 of 55

Defragmenting Data Access Across Systems

| © Copyright 2024, InfluxData

52

53 of 55

Integration: Arrow Language Implementations

53

Java

C++

JavaScript

Go

Rust

C

Ruby

Python

Julia

R

C#

Matlab

| © Copyright 2024, InfluxData

53

54 of 55

Why Arrow Internally (and not just at interface)?

Option 1: Use Arrow Internally (DataFusion, pola.rs, Acero)

Pros: Fast interchange, reuse Arrow libraries

Cons: Constrained(*) to Arrow

Option 2: Use specialized structures internally, convert to Arrow at edges (Velox, DuckDB)

Pros: Can use specialized structures

Cons: Maintain specialized code

Scan

Filter

Agg

Scan

Filter

?

Agg

?

?

Convert to arrow

| © Copyright 2024, InfluxData

54

55 of 55

Why Arrow Internally (and not just at interface)?

So far results are encouraging

Theory: Using Arrow is “good enough” compared to specialized structures

Pooled open source development → invest heavily in optimized Arrow kernels

Good: Sorting, Filtering, Projection, Parquet

Could Improve: Grouping, Joining

| © Copyright 2024, InfluxData

55