Substrait
Engine & Language Independent Data Compute Instructions
Who?
OSS
Commercial
Warehouse, Lakehouse, soon we’ll see the Fairhouse
Fairhouse
Lakehouse
Query Engines
APIs
Tables & Catalogs
Cloud Warehouse
SQL API
Processing Engine
Storage (internal)
Spark API/SQL
Spark Engine
2015
2020
Delta Lake
Apache Hudi
Compute Engines
Spark API/SQL
Spark Engine
Snowflake SQL
Snowflake Engine
Redshift SQL
Dremio Engine
Velox Engine
Arrow Engine
Pandas
Ibis
Tables & Catalogs
Apache Iceberg
Delta Lake
Hudi
Tabular
AWS Lake Formation
2025
Snowflake, Redshift
Dremio SQL
Dremio Engine
Snowflake SQL
Snowflake Engine
Apache Iceberg
Databricks, Dremio, Starburst
Warehouse Appliance
SQL API
Processing Engine
Storage (internal)
Teradata, Netezza
2000
Best-of-breed Decomposition Requires Components
API
Engine
API
Engine
Compiler
Storage
Storage
actually more like this…
Computation
Data
Data
Instruction
Instruction
Instruction
Substrait
How to collaborate on these layers?
API
Engine
Compiler
Storage
Computation
Data
Data
Instruction
Instruction
Instruction
The Power of Intermediate Representations
| | JVM Bytecode | LLVM IR | Substrait |
| Abstraction level | Low | Low | High |
| Front-end innovations | Scala, Clojure, Kotlin | Rust, Swift | New languages, abstractions |
| Back-end innovations | Dalvik, Graal | WASM | Computational storage, hardware accelerators, specialized engines |
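The decoupling that an intermediate representation buys can be sketched in a few lines. This is a toy illustration, not Substrait itself: the `Filter` class, both front-ends, and both back-ends are hypothetical names invented for this example. The point is only that any front-end that emits the shared representation can run on any back-end that consumes it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    """Hypothetical one-node IR: keep rows where `column` > `min_value`."""
    column: str
    min_value: int


# --- Two hypothetical front-ends that emit the same IR ---

def from_sql_like(text: str) -> Filter:
    # Parses only the toy form "<column> > <integer>".
    column, _, value = text.partition(">")
    return Filter(column.strip(), int(value))


def from_dataframe_api(column: str, gt: int) -> Filter:
    return Filter(column, gt)


# --- Two hypothetical back-ends that consume the same IR ---

def run_row_engine(plan: Filter, rows: list) -> list:
    # Row-at-a-time evaluation over a list of dicts.
    return [row for row in rows if row[plan.column] > plan.min_value]


def run_columnar_engine(plan: Filter, columns: dict) -> dict:
    # Column-at-a-time evaluation over a dict of equal-length lists.
    keep = [i for i, v in enumerate(columns[plan.column]) if v > plan.min_value]
    return {name: [values[i] for i in keep] for name, values in columns.items()}


# Either front-end can feed either back-end.
plan = from_sql_like("x > 2")
row_result = run_row_engine(plan, [{"x": 1}, {"x": 3}])
col_result = run_columnar_engine(plan, {"x": [1, 3], "y": ["a", "b"]})
```

With M front-ends and N back-ends, each side integrates once against the shared representation instead of requiring M × N pairwise translators.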
Substrait: Cross-Language Serialization for Relational Algebra
Status
Purpose
Why
Theoretical Integrations
substrait
C++ Kernel
ibis
python
r
babel sql
views
duck sql
scala
Principles
Primitives
Plan
Relations
Types
i8
i32
fp32
struct
Expressions
Functions
add
avg
rank
sum
join
agg
filter
read
case
field
literal
orlist
Serialization
protobuf
text
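How the primitives above compose can be sketched with plain Python dictionaries. This is a simplified stand-in under stated assumptions, not the Substrait protobuf schema: the helper functions, the dictionary keys, and the `gt` comparison function are hypothetical, and JSON is used here only as an example of a text serialization. The relation names (`read`, `filter`, `agg`), expression kinds (`field`, `literal`), the `sum` function, and the type names (`i32`, `fp32`) are taken from the slide.

```python
import json


# --- Expressions (hypothetical encodings) ---

def field(name):
    return {"field": name}


def literal(value, type_name):
    return {"literal": value, "type": type_name}


def call(function, *args):
    return {"function": function, "args": list(args)}


# --- Relations (hypothetical encodings) ---

def read(table, schema):
    return {"rel": "read", "table": table, "schema": schema}


def filter_(input_rel, condition):
    return {"rel": "filter", "input": input_rel, "condition": condition}


def agg(input_rel, measures):
    return {"rel": "agg", "input": input_rel, "measures": measures}


# A plan is a tree of relations whose leaves read typed data:
# read(orders) -> filter(qty > 10) -> agg(sum(price))
plan = agg(
    filter_(
        read("orders", {"qty": "i32", "price": "fp32"}),
        call("gt", field("qty"), literal(10, "i32")),  # "gt" is an assumed name
    ),
    [call("sum", field("price"))],
)

# Serialize to text and read it back, as a consumer in another language would.
text = json.dumps(plan, sort_keys=True)
round_tripped = json.loads(text)
```

Because the plan is plain data rather than code, a producer and a consumer only need to agree on the encoding, not on a programming language or engine.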
Components
Extensibility with Discipline
No Formal Concept of "logical" vs "physical"
Project/User Definitions are inconsistent.
There are just things that are closer to and further from implementation
More General
More specific
Gluten
Project Governance
Actual Implementations
substrait
C++ Kernel
ibis
python
r
babel sql
views
duck sql
scala
8: https://bit.ly/3P2egYE
Join the Community