Reducing query latency in DataFusion via a caching object store layer
Artjoms Iškovs, Principal Engineer, 27th September 2024
© E D B 2 0 2 4 — A L L R I G H T S R E S E R V E D .
Before EDB: Splitgraph and Seafowl
At EDB: Postgres Lakehouse + HTAP
Caching logic
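The general idea of a caching layer in front of an object store can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation presented in the talk: it assumes byte-range reads are rounded out to fixed-size chunks, that chunks are kept in memory, and that eviction is LRU. `ChunkCache`, `fetch`, `chunk_size`, `max_chunks` and `fake_fetch` are hypothetical names chosen for the example.

```python
from collections import OrderedDict

class ChunkCache:
    """Read-through cache serving byte-range reads from fixed-size chunks.

    Hypothetical sketch: chunking, in-memory storage and LRU eviction are
    assumptions, not details taken from the slides.
    """

    def __init__(self, fetch, chunk_size=4, max_chunks=8):
        self.fetch = fetch            # fetch(path, start, end) -> bytes for [start, end)
        self.chunk_size = chunk_size
        self.max_chunks = max_chunks
        self.chunks = OrderedDict()   # (path, chunk_index) -> bytes, oldest first

    def _chunk(self, path, index):
        key = (path, index)
        if key in self.chunks:
            # Cache hit: mark as most recently used, no upstream request.
            self.chunks.move_to_end(key)
            return self.chunks[key]
        # Cache miss: fetch the whole chunk from the underlying store.
        start = index * self.chunk_size
        data = self.fetch(path, start, start + self.chunk_size)
        self.chunks[key] = data
        if len(self.chunks) > self.max_chunks:
            self.chunks.popitem(last=False)  # evict the least recently used chunk
        return data

    def get_range(self, path, start, end):
        """Return bytes [start, end) of `path`, assembled from cached chunks."""
        if end <= start:
            return b""
        first = start // self.chunk_size
        last = (end - 1) // self.chunk_size
        buf = b"".join(self._chunk(path, i) for i in range(first, last + 1))
        offset = start - first * self.chunk_size
        return buf[offset:offset + (end - start)]

# Stand-in for a remote object store: records every upstream request.
blob = b"0123456789abcdef"
calls = []

def fake_fetch(path, start, end):
    calls.append((path, start, end))
    return blob[start:end]

cache = ChunkCache(fake_fetch, chunk_size=4, max_chunks=2)
first_read = cache.get_range("f", 2, 7)    # misses chunks 0 and 1
second_read = cache.get_range("f", 5, 8)   # served entirely from chunk 1
```

In this sketch the second read issues no upstream request because the range it needs falls inside a chunk that the first read already pulled in. Overlapping range reads over the same files are where a layer like this would be expected to reduce latency.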
Benchmarking
22 TPC-H queries, SF10 (3GiB in Delta)
Demo
TPC-H Q6
SELECT
    sum(l_extendedprice * l_discount) AS revenue
FROM
    lineitem
WHERE
    l_shipdate >= CAST('1994-01-01' AS date)
    AND l_shipdate < CAST('1995-01-01' AS date)
    AND l_discount BETWEEN 0.05 AND 0.07
    AND l_quantity < 24;
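To make the semantics of Q6 concrete, the following sketch evaluates the same filter and aggregate over a few in-memory rows. It is only an illustration: the sample rows are invented for the example, `q6_revenue` is a hypothetical helper, and the sketch returns 0 for an empty match where SQL `sum` would return NULL.

```python
from datetime import date
from decimal import Decimal

def q6_revenue(rows):
    """Sum l_extendedprice * l_discount over rows matching the Q6 predicate."""
    total = Decimal("0")
    for r in rows:
        if (
            date(1994, 1, 1) <= r["l_shipdate"] < date(1995, 1, 1)
            and Decimal("0.05") <= r["l_discount"] <= Decimal("0.07")  # BETWEEN is inclusive
            and r["l_quantity"] < 24
        ):
            total += r["l_extendedprice"] * r["l_discount"]
    return total

def row(shipdate, discount, quantity, price):
    return {
        "l_shipdate": shipdate,
        "l_discount": Decimal(discount),
        "l_quantity": quantity,
        "l_extendedprice": Decimal(price),
    }

# Invented sample data, not TPC-H output.
rows = [
    row(date(1994, 3, 15), "0.06", 10, "1000.00"),  # matches: contributes 60
    row(date(1995, 1, 1), "0.06", 10, "500.00"),    # ship date outside the range
    row(date(1994, 6, 1), "0.08", 5, "200.00"),     # discount outside the range
    row(date(1994, 12, 31), "0.05", 24, "300.00"),  # quantity is not < 24
    row(date(1994, 12, 31), "0.07", 23, "100.00"),  # matches: contributes 7
]

revenue = q6_revenue(rows)
```

The query reads only four columns of `lineitem` and filters on three of them, so an engine reading columnar files only needs the byte ranges that hold those columns.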
Q&A
Backup slides
Future work