2 of 32

Opening remarks

Apache Pinot Incubation

2022

2021

2020

2019

2018

800 Members

150 Contributors

500k Lines changed

100 Members

3500 Members

270 Contributors

10k Commits

100 Deployments

Presenter: Mayank Shrivastava

2000 Members

3 of 32

Kicking off the roadmap show-and-tell

Poll.ly
60+ participants
35 suggestions
High level themes:

Multi-stage engine
SQL compatibility
Pluggability
Ingestion enhancements
Performance enhancements
Operational Improvements

Presenter: Neha Pawar

4 of 32

Multi-stage Engine

5 of 32

V2 Engine Improvements

Framework Enhancement

Pluggable planner extensions
Unified runtime operators
and more …

Query Planner Improvement

Support query hints
Enhance plan optimizer with advance rules/rule-groups
Pipeline Scheduling

Partition Multi-threading

Execute query stages in multi-thread fashion
Utilize table-level partition to push-down operators

Presenter: Rong Rong

6 of 32

Co-located Joins

Colocated What?

Auto-detect when shuffles can be skipped.
Run large complex plans without shuffles. (>30 stages)

Colocated Why?

Fast: Skip Serde, Fan Out, etc.
Low Overhead ⇒ Do More with Less.
Not Just Joins: e.g. pair with with dynamic filtering or broker segment pruning.

Colocated When?

Q1’23: Partial Rollout: 1 Use-Case
April: Complete Rollout
Q2’23: Private Beta

~30 TB

Data

2M+ MPS

30+ Stages

P90

< 5s

Presenter: Ankit Sultana

7 of 32

Local Joins

What?

Push co-located join down to the project layer
Select/Aggregate/Group-by on the joined results

SELECT/AGGREGATE/GROUP-BY

TRANSFORM/LOCAL-JOIN

PROJECTION

FILTER

Why?

Push down filters
Avoid projecting all documents
Utilize the V1 operators for fast aggregation/group-by
Achieve segment level parallelism

SELECT …

FROM T1 JOIN T2 ON T1.key = T2.key

WHERE T1.a = 123 AND T1.b > T2.c

When?

Q1’23: POC
Q2’23: Production

Presenter: Jackie Jiang

8 of 32

SQL Compliance

9 of 32

Postgres SQL compliance

Sql type compliance

Null support

Design Doc (WIP)
No more NPE
Smooth null representation
ETA: 2023 Q2

Type safe function invocation

Data precision
Flexible function invocation
ETA: 2023 Q3

Better usability with sql standard

Presenter: Yao Liu

10 of 32

Window Functions

Advanced SQL functionality
Leverages the multi-stage query engine
OSS Issue: #7213
Design Document (Phase 1)
Implement Window Functions in multiple phases:

Phase 1: Basic aggregate window functions[IN PROGRESS]ETA: 2023 Q2
Phase 2: Support for non-aggregate window functions such as rank based functions
Phase 3: Support for sliding window aggregations (custom frames)
Phase 4: Support for multiple window groups (multiple OVER() clauses with different PARTITION BY, ORDER BY and frame clauses)

Presenter: Sonam Mandal

11 of 32

Pagination

Paginate the response over multiple smaller batches as opposed to fetching everything at one go in a single call (current state) and having to absorb the entire response in memory in their code.
The proposed design is already out for review.
Implementation in progress. More PRs to come in the following weeks.
ETA: early Q3 of 2023
Potential future work: async query execution for pagination queries, cursor syntax support, etc.

Presenter: Jialiang Li

12 of 32

Pluggability

13 of 32

Index SPI

Increase velocity:

Simplify the creation of new index types.

Even for external contributors!

New configuration mechanism

Optional (don’t break backward compatibility).
Same for all single column indexes.
More expressive (any JSON value).

Only affects single column indexes.

Does not affect Startree index.

GitHub Issue #10183
PEP document
Target: Next Pinot release

Presenter: Gonzalo Ortiz

15 of 32

Spark Connector

We already have Spark Connector for reads
Support for Spark3 coming (days just landed!)
Support for Pinot Query Options coming (days)

maxExecutionThreads, enableNullHandling, skipUpsert etc.

Logging and monitoring improvements (weeks)
Support for Aggregation Pushdown coming (1-2 months)

count, min, max, avg, …

Experimenting with write support using Spark3 BatchWrite Interface (can be done later this year)

Presenter: Caner Balci

16 of 32

Pauseless consumption

Motivation

Pinot does not consume events from realtime stream while completing segments
Sometimes segment completion could take a long time
There are some use cases that are sensitive to such consumption pauses

Approach

Start consumption from the same partition in a new “child” segment
Include the child segment in all queries that need the “parent” segment.
After completion, take a short pause to copy data to the new segment.

Status/Challenges

Sizing the resources to allocate for the “child” segment
Handling corner cases like partition moves, failure conditions during completion, etc.
Verifying whether this solves/helps latency sensitive use cases

Presenter: Sajjad Moradi

17 of 32

Record Deletion in upserts

Problem: Explicitly delete a record from an upsert table (#10452)

Use Case Examples:

CDC data (eg. Debezium format) requires handling DELETE events
DELETE FROM support

Approach ((Design Doc)

Use explicit field in input as delete marker
Soft delete of record in the segment by maintaining metadata, alongside upsert metadata

Presenter: Navina Ramesh

18 of 32

TTL for Upserts

Context

Heap data structures for upsert
TTL for reducing memory footprint
ValidDocIds Snapshot for metadata recovery
Use cases.
Link to Github and design docs
PoC Q1 2023, Productionize in Q2 2023

Presenter: Qiaochu Liu

19 of 32

Performance Enhancements

20 of 32

Group by improvements

High latency and low resource utilization for large scale group by

Reduce contention
Improve parallelism
Better usage of memory/CPU
ETA: 2023 Q3

Presenter: Yao Liu

21 of 32

Offheap distinct(count)

Current distinct(count) functions create in-memory sets.

Increased chances of OOM
Incurs gc pressure
Cannot handle high cardinality
Hard to utilize disk for spilling

Off-heap (direct buffer) hash table based solution can help here.
Can be extended by supporting spilling over to disk
Off heap hash-table can potentially be extended to group-by queries.

Presenter: Jia Guo

22 of 32

Partitioned distinct(count)

%3 = 0

3, 3, 12, 15, 18, 15, 21

%3 = 1

%3 = 2

5 + 4 + 3 = 12

1, 4, 4, 7, 7, 7, 13

5, 5, 5, 8, 11, 11

Send disjoint count/sets instead of overlapping ones

Reduce the set ser/de, transmission, and merge time/memory footprint
This logic can be applied at different levels

Presenter: Jia Guo

23 of 32

Arg_Min/Max

Show me the avg salary and the people with highest salary, group by their positions

Position	Avg(salary)	Max(Salary)	ArgMax(Salary,FirstName)	ArgMax(Salary, ID)
Mechanical Engineer	7000	18000	Henry	12345
			Linda	12310
Professor	7200	16000	Juliette	13451

OSS Issue: #6707
Will be implemented for V1 engine first and extensible for V2
Aggregation function implementation is more efficient than cte join query
ETA 2023 Q2

Presenter: Jia Guo

24 of 32

Continuing Resiliency Work (Workload Management, Scheduling)

Problem Statement

Servers process queries in a FIFO fashion
Queries can hog resources unfairly leading to starvation

Mitigation

Smart scheduling of queries based on priority, query cost, available resources
Query pre-emption (yield/pause) for long-running queries to avoid starvation
CPU/Memory limits for queries that can adaptively grow depending on available resources.
Control parallelization when server is loaded
QoS queuing and early rejection of queries.

Presenter: Vivek Iyer

25 of 32

CLP and log compression

Log search @Uber Scale = Compression + Fast Search

CLP (by YScope) used for Spark Log compression

169X compression over text
Impossible → easy

Design doc

the PR for data encoder
New UDFs and indexing requests
Text search over compressed data

PoC built and feature complete in H2 2023

Presenter: Ting Chen

26 of 32

Operational Improvements

27 of 32

Groovy function registry

Pinot supports running Groovy scripts for UDFs; however, it’s disabled by default for security concerns.
We would like to improve this by allowing only authorized users to register groovy functions and other users to have the access to use those registered groovy functions.
Issue: #9365

Function Registry

function x() -> Groovy(...)�function y() -> Groovy(...)�function z() -> Groovy(...)�…

Authorized users register functions�(API endpoint should be authorized based on ACL)

"ingestionConfig": {

"filterConfig": {

"filterFunction": "x()"

},� "transformConfigs": [{

"columnName": "colA",

"transformFunction": “y()"

},� ...

}

select z() from T...

Use those registered functions for �ingestion config

Use those registered functions for queries

Presenter: Haitao Zhang

28 of 32

Pauseless consumption

Motivation

Pinot does not consume events from realtime stream while completing segments
Sometimes segment completion could take a long time
There are some use cases that are sensitive to such consumption pauses

Approach

Start consumption from the same partition in a new “child” segment
Include the child segment in all queries that need the “parent” segment.
After completion, take a short pause to copy data to the new segment.

Status/Challenges

Sizing the resources to allocate for the “child” segment
Handling corner cases like partition moves, failure conditions during completion, etc.
Verifying whether this solves/helps latency sensitive use cases

Presenter: Sajjad Moradi

29 of 32

V2 Engine Improvements

Framework Enhancement

Pluggable planner extensions
Unified runtime operators
and more …

Query Planner Improvement

Support query hints
Enhance plan optimizer with advance rules/rule-groups
Pipeline Scheduling

Partition Multi-threading

Execute query stages in multi-thread fashion
Utilize table-level partition to push-down operators

Presenter: Rong Rong

30 of 32

Some other interesting features in the poll

System metadata tables
Java 17 Upgrade
Server-level ingestion throttling
Compaction for upserts
Tenant-level rebalancing of pinot components

Presenter: Neha Pawar

31 of 32

Content

A Journey into Apache Pinot and RTA
RTA for IoT
Pause/Resume ingestion
Configurable Time Boundary
Multi Volume support
Consumer Record Lag API
Configuring segment threshold
Geospatial objects
Geospatial indexing
Force Commit
JSON Indexes
Backfill on Real-Time Tables
StarTree Index��

Advocacy Updates

Conferences

Atlanta DevCon
Global AI
DevNexus
DevSum
Big Data Technology Warsaw��

Presenter: Mark Needham

1 of 32

2 of 32

3 of 32

4 of 32

5 of 32

6 of 32

7 of 32

8 of 32

9 of 32

10 of 32

11 of 32

12 of 32

13 of 32

14 of 32

15 of 32

16 of 32

17 of 32

18 of 32

19 of 32

20 of 32

21 of 32

22 of 32

23 of 32

24 of 32

25 of 32

26 of 32

27 of 32

28 of 32

29 of 32

30 of 32

31 of 32

32 of 32