Building InfluxDB 3.0
With Apache Arrow, DataFusion, Flight, Parquet
Andrew Lamb | Staff Engineer, InfluxData
| © Copyright 2024, InfluxData
1
Andrew Lamb
Staff Engineer InfluxData
> 20 21 😱years in enterprise software development
Oracle: Database (2 years)
DataPower: XSLT compiler (2 years)
Vertica: DB / Query Optimizer (6 years)
Nutonian/DataRobot: ML Startups (7 years)
InfluxData: InfluxDB 3.0, Arrow, DataFusion (4 years)
i n f l u x d a t a . c o m
| © Copyright 2024, InfluxData
2
Goals
Convince you, via example, that:
Talk outline:
| © Copyright 2024, InfluxData
3
Thesis: Long Term Trends in Databases
Time
Adoption
Ease of Building
3.0
| © Copyright 2024, InfluxData
4
“One Size Fits All”
An Idea Whose Time Has Come and Gone
Stonebraker & Çetintemel (2005)
| © Copyright 2024, InfluxData
5
Case Study: In which InfluxData decides to build a new time series database.
| © Copyright 2024, InfluxData
6
What / Why of Time Series Databases
Specialized for storing data with times (obviously).
Key Properties:
Examples:
Open → InfluxDB, Timescale, Graphite Whisper, VictoriaMetrics
Closed → Facebook Gorilla, Google Monarch, AWS Timestream, DataDog Husky
| © Copyright 2024, InfluxData
7
InfluxDB 3.0 Requirements
| © Copyright 2024, InfluxData
8
So, let’s build a new Database
(How hard could that be, really?)
| © Copyright 2024, InfluxData
9
It is hard (expensive) to build a new databases
Database company money raised
Source: https://www.crunchbase.com
I did this at Vertica too
| © Copyright 2024, InfluxData
10
“We can do it with the Apache Arrow Ecosystem”
Evan (CEO)
Paul (CTO)
| © Copyright 2024, InfluxData
11
Benefits of building on FDAP foundation
⇒ Innovation on Time Series, FDAP provides standard (lengthy to build) pieces
match tool_needed_for_database {
File format (persistence) => Parquet,
Columnar memory representation => Arrow Arrays,
Operations (e.g. multiply, avg) => Compute Kernels,
SQL + extensible query engine => Arrow DataFusion,
Network transfer => Arrow Flight,
JDBC/ODBC driver => Arrow FlightSQL,
}
| © Copyright 2024, InfluxData
12
Let’s do it
| © Copyright 2024, InfluxData
13
InfluxDB 3.0 Architecture
| © Copyright 2024, InfluxData
14
InfluxDB 3.0: Fast, Real Time & Cost Effective
| © Copyright 2024, InfluxData
15
“Community over Code” - The Apache Way
Non profit governance of open source communities
| © Copyright 2024, InfluxData
16
Apache: Benefits for InfluxDB 3.0
| © Copyright 2024, InfluxData
17
| © Copyright 2024, InfluxData
18
Apache Parquet: Benefits for InfluxDB
⇒ Avoid Engineering overhead of building a custom file format + ecosystem
Compression: Works well with wide data variety (including time series)
Performance: Techniques such as projection, filter pushdown, late materialization
Interoperability: “Defacto” interchange format for analytics, immediate compatibility with 1000s of tools (Apache Iceberg makes it even easier)
| © Copyright 2024, InfluxData
19
TSM (time series) vs Parquet Compression
Entire dataset (Parquet/TSM: 20%)
Format | File Count | Size (GB) |
TSM | 483 | 591 |
TSM (gzip) | 483 | 97 |
Parquet | 246,140 (4K - 2.2GB) | 118 |
Format | File Count | Size (GB) |
TSM | 1 | .110 |
Parquet | 54 (4K - 7M) | .020 |
File 1 (Parquet/TSM: 18%)
Format | File Count | Size (GB) |
TSM | 1 | 2 |
Parquet | 974 (12K - 53M) | .226 |
File 2 (Parquet/TSM 11%)
| © Copyright 2024, InfluxData
20
Parquet Organization
A | B | C |
15 | Foo | 1/1/2023 |
… | ||
11 | Bar | ..1/5/2023 |
50 | Baz | 1/1/2023 |
… | ||
32 | Blarg | 1/6/2023 |
(“PAX” in DB literature)
| © Copyright 2024, InfluxData
21
Parquet Structure + Metadata
…
…
…
Metadata footer
Highly encoded / compressed pages
Footer contains location of pages, and statistics such as min/max/count/nullcount.
(“Zone Maps”, “Small Materialized Aggregates” in DB literature)
| © Copyright 2024, InfluxData
22
Parquet Projection + Filter Pushdown
…
…
…
Metadata footer
Highly encoded / compressed pages
Metadata + query to prune (skip) pages that aren’t needed
SELECT A
...
WHERE C > 25
1. Consult metadata
2. Only read/decode necessary pages
| © Copyright 2024, InfluxData
23
How does InfluxDB 3.0 use Parquet?
Durable Store: All data durably persisted to object store as parquet;
Query: Read from parquet files + latest unpersisted ingester data
Object Store
Ingester
2. Periodically writes data buffer as sorted parquet files to object store
Ingester
…
write path
read path
Querier
4. Reads (and caches) parquet data from object store to answers queries
SELECT … FROM ...
3. User query
1. Incoming line protocol
weather,location=us-east temperature=82,humidity=67 1465839830100400200�weather,location=us-midwest temperature=82,humidity=65 1465839830100400200�weather,location=us-west temperature=70,humidity=54 1465839830100400200�weather,location=us-east temperature=83,humidity=69 1465839830200400200�weather,location=us-midwest temperature=87,humidity=78 1465839830200400200�weather,location=us-west temperature=72,humidity=56 1465839830200400200�weather,location=us-east temperature=84,humidity=67 1465839830300400200�weather,location=us-midwest temperature=90,humidity=82 1465839830400400200�weather,location=us-west temperature=71,humidity=57 1465839830400400200
weather,location=us-east temperature=82,humidity=67 1465839830100400200�weather,location=us-midwest temperature=82,humidity=65 1465839830100400200�weather,location=us-west temperature=70,humidity=54 1465839830100400200�weather,location=us-east temperature=83,humidity=69 1465839830200400200�weather,location=us-midwest temperature=87,humidity=78 1465839830200400200�weather,location=us-west temperature=72,humidity=56 1465839830200400200�weather,location=us-east temperature=84,humidity=67 1465839830300400200�weather,location=us-midwest temperature=90,humidity=82 1465839830400400200�weather,location=us-west temperature=71,humidity=57 1465839830400400200
| © Copyright 2024, InfluxData
24
In memory format for fast vectorized processing
| © Copyright 2024, InfluxData
25
Arrow: Benefits for InfluxDB
⇒ Best practice for storing columnar data in memory
Type System: Full type support (integers, strings, timestamps, etc)
Null Support: Standard null bitmask representation + semantics
Efficient Encodings: Dictionary encoding for Strings
Natural Integration: DataFusion, parquet libraries, Arrow Flight
Optimized Compute Kernels: Fast vectorized kernels are well understood, but time consuming to implement, test and maintain
Note: Arrow has (many) more features than this
| © Copyright 2024, InfluxData
26
Arrow Array: Int64Array
Pretty much what you will find in every vectorized column store engine
15 |
1743 |
NULL |
432 |
.. |
9 |
322 |
8 |
Logically 1024
8 byte integers
Buffer (aligned)
8192 bytes
15
0
Byte Offset
1743
??
432
9
322
8
…
8
16
24
8168
8176
8184
…
Arrow Array
1
1
1
0
1
1
1
Validity (bitmask)
128 bytes
Bit Offset
…
0
1
2
3
127
126
125
| © Copyright 2024, InfluxData
27
Compute Kernels
let output = gt(
&left,
&right
);
+
10
20
17
5
23
5
9
12
4
5
76
2
3
5
2
33
2
1
6
7
8
2
7
2
5
6
7
8
left
right
output
1
0
1
1
1
0
1
1
0
1
1
0
0
0
>
>
>
>
The gt (greater than) kernel computes an output BooleanArray where each element is left > right
Kernels handle nulls (validity masks), optimizations for different data sizes, etc.
~50 different kernels, full list: docs.rs page
| © Copyright 2024, InfluxData
28
How does InfluxDB 3.0 use Arrow: Ingester Buffers
Indirectly: via DataFusion and Flight.
Directly: Ingest path, which parses input line protocol → Arrow Arrays
Ingester
4. If queried prior to writing parquet buffer is snapshotted, turned into RecordBatch sent to querier via ArrowFlight
Querier
2. LP is parsed and appended to an in memory buffer (mutable batch)
open buffer
3. Periodically writes sorted data on object store using datafusion plan
Snapshots
(Record Batches)
1. Incoming Line Protocol
weather,location=us-east temperature=82,humidity=67 1465839830100400200�weather,location=us-midwest temperature=82,humidity=65 1465839830100400200�weather,location=us-west temperature=70,humidity=54 1465839830100400200�weather,location=us-east temperature=83,humidity=69 1465839830200400200�weather,location=us-midwest temperature=87,humidity=78 1465839830200400200�weather,location=us-west temperature=72,humidity=56 1465839830200400200�weather,location=us-east temperature=84,humidity=67 1465839830300400200�weather,location=us-midwest temperature=90,humidity=82 1465839830400400200�weather,location=us-west temperature=71,humidity=57 1465839830400400200
Object Store
| © Copyright 2024, InfluxData
29
Highly customizable, fast query engine that uses Arrow
| © Copyright 2024, InfluxData
30
Arrow DataFusion: Benefits for InfluxDB
⇒ Fast full featured, extensible query engine
All the OLAP buzzwords: vectorized, columnar, multi-core, streaming, out of core, …
“Full” SQL: JOINs, date/time/timestamp functions, structured data, …
Customizable: Easily extend time series specific functions (e.g. date_bin gapfill, InfluxQL, …)
| © Copyright 2024, InfluxData
31
DataFusion: Input / Output
Data Batches
SQL Query
SELECT status, COUNT(1)
FROM http_api_requests_total
WHERE path = '/api/v2/write'
GROUP BY status;
Data Batches
DataFrame
ctx.read_table("http")?
.filter(...)?
.aggregate(..)?;
Catalog information:
tables, schemas, etc
| © Copyright 2024, InfluxData
32
DataFusion: Totally Customizable Architecture
SQL
Query FrontEnds
DataFrame
LogicalPlans
ExecutionPlan
Plan Representations
(DataFlow Graphs)
Expression Eval
Optimizations / Transformations
Optimizations / Transformations
HashAggregate
Sort
…
Join
Parquet
CSV
DataFusion
Extend ✅
Extend ✅
Extend ✅
Extend ✅
Extend ✅
Extend ✅
Extend ✅
Extend ✅
Optimized Execution Operators
(Arrow Based)
Catalog / Data Source
| © Copyright 2024, InfluxData
33
How InfluxDB 3.0 uses DataFusion
Optimization
(storage pruning, pushdown, etc)
Physical Planning
Execution
DataFusion
LogicalPlan
DataFusion
ExecutionPlan
DataFusion
Streams
gRPC output
Arrow Flight
Client Specific
Output formats
Arrow
Record Batches
ParquetWriter
SeriesFrame
...
FlightData
Write to Parquet files
Storage gRPC Frontend
SQL Frontend
(from DataFusion)
Client / Language Specific Frontends
read_group(..)
SELECT …
FROM …
Reorg Frontend
compact_plan(..)
InfluxQL Frontend
SELECT … FROM … GROUP BY *
| © Copyright 2024, InfluxData
34
Flight
Efficiently send columnar data (as Arrow Arrays) over the network
| © Copyright 2024, InfluxData
35
Arrow Flight: Benefits for InfluxDB
⇒ Efficient network transfer; wide client support
Network Efficient: columnar format, high bandwidth transfer
CPU Efficient: “zero copy” Serialization / Deserialization
Pre Existing Clients: Minimal effort to make InfluxDB 3.0 clients (just use Arrow Flight clients)
| © Copyright 2024, InfluxData
36
How does InfluxDB 3.0 use Flight? Native Query Protocol
Querier
write path
read path
SELECT … FROM ...
1. User query via Flight
Ingester
2. Querier combines data from ingesters and parquet into response
3. Response via Flight
| © Copyright 2024, InfluxData
37
How does InfluxDB 3.0 use Flight?
Ingester ←→ Querier
Ingester
Querier
SELECT … FROM ...
write path
read path
1. User query
2. The querier fetches unpersisted (not yet written to parquet) data returned from the ingester, as RecordBatches via Flight
* we are working on in an optimized flight-like protocol to reduce latency
| © Copyright 2024, InfluxData
38
FlightSQL
Execute SQL queries and return results with a standard (non custom) client
| © Copyright 2024, InfluxData
39
Arrow FlightSQL: Benefits for InfluxDB
⇒ Access to SQL ecosystem, without implementing our own drivers/connectors
Prepackaged Client Libraries: JDBC/ODBC/Ecosystem drivers
Pre-packaged Integrations: Many systems already read from FlightSQL
Community Leverage: E.g. FlightSQL Grafana plugin, others contribute
| © Copyright 2024, InfluxData
40
Apache Arrow FlightSQL
| © Copyright 2024, InfluxData
41
How InfluxDB 3.0 uses FlightSQL
✅ Access to the ecosystem without having to implement JDBC, ODBC, … 😅
client
arrow-flight JDBC
Driver
InfluxDB 3.0
Data is sent via Arrow FlightSQL
(columnar)
arrow-flight JDBC driver implements JDBC API
JDBC Interface
| © Copyright 2024, InfluxData
42
Before Flight SQL
Server
Database specific protocol (e.g. Postgres FEBE)
Caveat: Only show JDBC and ODBC drivers. Also common is a native implementation in each language ecosystem - e.g like the Python DB API 20 and psycopg2 etc. See Apache Arrow Database Connectivity (ADBC) for more details
client
JDBC
Driver
JDBC Interface
client
JDBC
Driver
JDBC Interface
client
ODBC
Driver
ODBC Interface
| © Copyright 2024, InfluxData
43
Alternate Strategy: Use Posgres FEBE
(Slow + painful 🤮)
client
Server
Client: Prepare query:"SELECT * FROM foo WHERE a = $1;" name:""
Client: Bind name:"" parameters:[{format:"text", value:"42"}]
… (steps elided) …
Server: RowDescription fields:[{name:"a", type:"int", format:"text"}, …]
Server: DataRow fields:[{data:"42"}, {data: "Hunter Valley"} …]
Server: DataRow fields:[{data:"12"}, {data: "Merrimack Valley"} …]
… (lots ros for each data)
Server: DataRow fields:[{data:"321"}, {data: "Charles River Watershed"} …]
… (steps elided) …
JDBC Interface
3. Data is sent row by row
Postgres
JDBC
Driver
DataRow {..}
DataRow {..}
DataRow {..}
DataRow {..}
…
1. Data is produced column by column in RecordBatches
PG febe adapter
2. Adapter converts to the postgres FEBE protocol
It also turns out clients using the postgres driver tend to try and query the postgres metadata tables 😱
| © Copyright 2024, InfluxData
44
Conclusion
| © Copyright 2024, InfluxData
45
Conclusion
| © Copyright 2024, InfluxData
46
Related Work
The Composable Data Management System Manifesto (VLDB 2023)
Velox: Meta’s Unified Execution Engine
(VLDB 2022)
The Modern Data Architecture: The Deconstructed Database (USENIX ;login: Winter 2018)
A Deep Dive into Common Open Formats for Analytical DBMSs (VLDB 2023)
Apache Arrow DataFusion: A Fast, Embeddable, Modular Analytic Query Engine
(To Appear SIGMOD 2024)
| © Copyright 2024, InfluxData
47
Try It Yourself
https://www.influxdata.com
https://github.com/InfluxCommunity
| © Copyright 2024, InfluxData
48
T H A N K Y O U
| © Copyright 2024, InfluxData
49
Backup Slides
| © Copyright 2024, InfluxData
50
Thank you!
Questions / Discussion
Find out more:
| © Copyright 2024, InfluxData
51
Defragmenting Data Access Across Systems
| © Copyright 2024, InfluxData
52
Integration: Arrow Language Implementations
53
Java
C++
JavaScript
Go
Rust
C
Ruby
Python
Julia
R
C#
Matlab
| © Copyright 2024, InfluxData
53
Why Arrow Internally (and not just at interface)?
Option 1: Use Arrow Internally (DataFusion, pola.rs, Acero)
Pros: Fast interchange, reuse Arrow libraries
Cons: Constrained(*) to Arrow
Option 2: Use specialized structures internally, convert to Arrow at edges (Velox, DuckDB)
Pros: Can use specialized structures
Cons: Maintain specialized code
Scan
Filter
Agg
Scan
Filter
?
Agg
?
?
Convert to arrow
| © Copyright 2024, InfluxData
54
Why Arrow Internally (and not just at interface)?
So far results are encouraging
Theory: Using Arrow is “good enough” compared to specialized structures
Pooled open source development → invest heavily in optimized Arrow kernels
Good: Sorting, Filtering, Projection, Parquet
Could Improve: Grouping, Joining
| © Copyright 2024, InfluxData
55