Lance: A New Columnar Data Format
Motivation
Introducing Lance, a new OSS columnar format
Data layout
Logical layout
Physical layout
Sample comparison with Parquet
Performance benchmarks
Experiment Setup
Point query
Batch scan
Larger-than-Memory Analytics via DuckDB
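import duckdb
import lance

# Open the Lance dataset at the given URI (here an S3 path).
uri = "s3://bucket/path/to/oxford_pet.lance"
pets = lance.dataset(uri)

# DuckDB resolves the table name `pets` to the local Python variable above.
duckdb.query(
    "SELECT label, count(1) FROM pets GROUP BY label"
).to_arrow_table()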
Roadmap
Appendix
Set the baseline