1 of 32

Apache Pinot Roadmap 2023

Brought to you by the Apache Pinot Community

2 of 32

Opening remarks

Apache Pinot Incubation

Community growth timeline (figure), 2018 to 2022:

  • Members: 100, 800, 2000, 3500
  • Contributors: 150, 270
  • 500k Lines changed
  • 10k Commits
  • 100 Deployments

Presenter: Mayank Shrivastava

3 of 32

Kicking off the roadmap show-and-tell

  • Poll.ly
  • 60+ participants
  • 35 suggestions
  • High level themes:
    • Multi-stage engine
    • SQL compatibility
    • Pluggability
    • Ingestion enhancements
    • Performance enhancements
    • Operational Improvements

Presenter: Neha Pawar

4 of 32

Multi-stage Engine

5 of 32

V2 Engine Improvements

Framework Enhancement

  • Pluggable planner extensions
  • Unified runtime operators
  • and more …

Query Planner Improvement

  • Support query hints
  • Enhance plan optimizer with advanced rules/rule-groups
  • Pipeline Scheduling

Partition Multi-threading

  • Execute query stages in a multi-threaded fashion
  • Use table-level partitioning to push down operators

Presenter: Rong Rong

6 of 32

Co-located Joins

Colocated What?

  • Auto-detect when shuffles can be skipped.
  • Run large complex plans without shuffles. (>30 stages)

Colocated Why?

  • Fast: Skip Serde, Fan Out, etc.
  • Low Overhead ⇒ Do More with Less.
  • Not Just Joins: e.g. pair with dynamic filtering or broker segment pruning.

Colocated When?

  • Q1’23: Partial Rollout: 1 Use-Case
  • April: Complete Rollout
  • Q2’23: Private Beta

~30 TB Data, 2M+ MPS, 30+ Stages, P90 < 5s

Presenter: Ankit Sultana

7 of 32

Local Joins

What?

  • Push co-located join down to the project layer
  • Select/Aggregate/Group-by on the joined results

Operator stack (top to bottom): SELECT/AGGREGATE/GROUP-BY, TRANSFORM/LOCAL-JOIN, PROJECTION, FILTER

Why?

  • Push down filters
  • Avoid projecting all documents
  • Utilize the V1 operators for fast aggregation/group-by
  • Achieve segment level parallelism

  SELECT …
  FROM T1 JOIN T2 ON T1.key = T2.key
  WHERE T1.a = 123 AND T1.b > T2.c
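
The benefit of pushing the filter below the join can be sketched in Python. This is a minimal illustration with hypothetical in-memory rows standing in for co-located segments of T1 and T2; the function and variable names are assumptions for the example, not Pinot operators.

```python
from collections import defaultdict


def local_join(t1_rows, t2_rows):
    """Filter T1 first, then hash-join the surviving rows with T2 on `key`."""
    # Push-down: T1.a = 123 only touches T1 columns, so apply it before the join.
    t1_filtered = [row for row in t1_rows if row["a"] == 123]

    # Build a hash table on the T2 side, keyed by the join key.
    t2_by_key = defaultdict(list)
    for row in t2_rows:
        t2_by_key[row["key"]].append(row)

    # Probe with the filtered T1 rows, then apply the cross-table
    # predicate T1.b > T2.c, which needs columns from both sides.
    joined = []
    for left in t1_filtered:
        for right in t2_by_key.get(left["key"], []):
            if left["b"] > right["c"]:
                joined.append((left["key"], left["b"], right["c"]))
    return joined


# Hypothetical rows, for illustration only.
t1 = [
    {"key": 1, "a": 123, "b": 10},
    {"key": 2, "a": 999, "b": 50},
    {"key": 3, "a": 123, "b": 1},
]
t2 = [
    {"key": 1, "c": 5},
    {"key": 2, "c": 5},
    {"key": 3, "c": 7},
]
result = local_join(t1, t2)
```

Only the row with key 1 survives both predicates; the row with key 2 is dropped before the join is ever probed.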

When?

  • Q1’23: POC
  • Q2’23: Production

Presenter: Jackie Jiang

8 of 32

SQL Compliance

9 of 32

Postgres SQL compliance

SQL type compliance

  • Null support
    • Design Doc (WIP)
    • No more NPE
    • Smooth null representation
    • ETA: 2023 Q2
  • Type safe function invocation
    • Data precision
    • Flexible function invocation
    • ETA: 2023 Q3

Better usability with the SQL standard

Presenter: Yao Liu

10 of 32

Window Functions

  • Advanced SQL functionality
  • Leverages the multi-stage query engine
  • OSS Issue: #7213
  • Design Document (Phase 1)
  • Implement Window Functions in multiple phases:
    • Phase 1: Basic aggregate window functions [IN PROGRESS], ETA: 2023 Q2
    • Phase 2: Support for non-aggregate window functions such as rank-based functions
    • Phase 3: Support for sliding window aggregations (custom frames)
    • Phase 4: Support for multiple window groups (multiple OVER() clauses with different PARTITION BY, ORDER BY and frame clauses)
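
As an illustration of what a Phase 1 aggregate window function computes, here is a minimal Python sketch of the semantics of SUM(x) OVER (PARTITION BY p). It is not the multi-stage engine's implementation; the function name and row layout are assumptions made for the example.

```python
from collections import defaultdict


def sum_over_partition(rows, partition_key, value_key):
    """Attach the per-partition sum to every row, like SUM(v) OVER (PARTITION BY p)."""
    # First pass: accumulate one total per partition.
    totals = defaultdict(int)
    for row in rows:
        totals[row[partition_key]] += row[value_key]

    # Second pass: unlike GROUP BY, every input row is kept in the output.
    return [{**row, "sum_over": totals[row[partition_key]]} for row in rows]


# Hypothetical rows, for illustration only.
rows = [
    {"team": "a", "x": 1},
    {"team": "b", "x": 10},
    {"team": "a", "x": 2},
]
windowed = sum_over_partition(rows, "team", "x")
```

The output has the same number of rows as the input, which is the key difference from a plain aggregation.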

Presenter: Sonam Mandal

11 of 32

Pagination

  • Paginate the response into multiple smaller batches instead of fetching everything in a single call (the current state), which forces clients to hold the entire response in memory.
  • The proposed design is already out for review.
  • Implementation in progress. More PRs to come in the following weeks.
  • ETA: early Q3 of 2023
  • Potential future work: async query execution for pagination queries, cursor syntax support, etc.
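
From the client side, batch-wise fetching might look like the following Python sketch. `fetch_page` and its offset/limit signature are hypothetical stand-ins, not the proposed Pinot API, and the design under review may differ.

```python
def paginate(fetch_page, page_size):
    """Yield result batches one at a time instead of materializing the full response."""
    offset = 0
    while True:
        batch = fetch_page(offset, page_size)
        if not batch:
            return
        yield batch
        # A short batch means the result set is exhausted.
        if len(batch) < page_size:
            return
        offset += len(batch)


# In-memory stand-in for a query result; a real client would call the server here.
RESULT = list(range(10))


def fake_fetch(offset, limit):
    return RESULT[offset:offset + limit]


pages = list(paginate(fake_fetch, 4))
```

Because `paginate` is a generator, the caller holds at most one batch in memory at a time.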

Presenter: Jialiang Li

12 of 32

Pluggability

13 of 32

Index SPI

  • Increase velocity:
    • Simplify the creation of new index types.
      • Even for external contributors!
  • New configuration mechanism
    • Optional (don’t break backward compatibility).
    • Same for all single column indexes.
    • More expressive (any JSON value).
  • Only affects single column indexes.
    • Does not affect StarTree index.

  • GitHub Issue #10183
  • PEP document
  • Target: Next Pinot release

Presenter: Gonzalo Ortiz

14 of 32

Ingestion

15 of 32

Spark Connector

  • We already have Spark Connector for reads
  • Support for Spark3 (just landed!)
  • Support for Pinot Query Options coming (days)
    • maxExecutionThreads, enableNullHandling, skipUpsert etc.
  • Logging and monitoring improvements (weeks)
  • Support for Aggregation Pushdown coming (1-2 months)
    • count, min, max, avg, …
  • Experimenting with write support using Spark3 BatchWrite Interface (can be done later this year)

Presenter: Caner Balci

16 of 32

Pauseless consumption

Motivation

  • Pinot does not consume events from realtime stream while completing segments
  • Sometimes segment completion could take a long time
  • There are some use cases that are sensitive to such consumption pauses

Approach

  • Start consumption from the same partition in a new “child” segment
  • Include the child segment in all queries that need the “parent” segment.
  • After completion, take a short pause to copy data to the new segment.

Status/Challenges

  • Sizing the resources to allocate for the “child” segment
  • Handling corner cases like partition moves, failure conditions during completion, etc.
  • Verifying whether this solves/helps latency sensitive use cases

Presenter: Sajjad Moradi

17 of 32

Record Deletion in upserts

Problem: Explicitly delete a record from an upsert table (#10452)

Use Case Examples:

  • CDC data (e.g. Debezium format) requires handling DELETE events
  • DELETE FROM support

Approach (Design Doc)

  • Use explicit field in input as delete marker
  • Soft delete of record in the segment by maintaining metadata, alongside upsert metadata
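
The two bullets above can be sketched in Python as a toy upsert table. This is a simplified model under stated assumptions: the `deleted` field name, the `pk` primary-key column, and the class itself are hypothetical and do not reflect Pinot's actual metadata structures.

```python
class UpsertTable:
    """Toy upsert table with soft deletes driven by a marker field in the input."""

    def __init__(self, delete_field="deleted"):
        self.delete_field = delete_field  # hypothetical name of the delete marker
        self.docs = []                    # append-only storage, never rewritten
        self.latest_doc = {}              # upsert metadata: primary key -> latest doc id
        self.deleted_keys = set()         # delete metadata kept alongside upsert metadata

    def ingest(self, record):
        doc_id = len(self.docs)
        self.docs.append(record)
        pk = record["pk"]
        self.latest_doc[pk] = doc_id
        if record.get(self.delete_field):
            # Soft delete: the record stays in storage but is hidden from queries.
            self.deleted_keys.add(pk)
        else:
            # A later non-delete record for the same key makes it visible again.
            self.deleted_keys.discard(pk)

    def query(self):
        return [
            self.docs[doc_id]
            for pk, doc_id in self.latest_doc.items()
            if pk not in self.deleted_keys
        ]


table = UpsertTable()
table.ingest({"pk": 1, "v": "a"})
table.ingest({"pk": 2, "v": "b"})
table.ingest({"pk": 1, "v": "c"})         # upsert of key 1
table.ingest({"pk": 2, "deleted": True})  # delete event for key 2
visible = table.query()
```

Nothing is physically removed: all four ingested records remain in storage, and only the metadata decides what a query sees.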

Presenter: Navina Ramesh

18 of 32

TTL for Upserts

Context

  • Heap data structures for upsert
  • TTL for reducing memory footprint
  • ValidDocIds Snapshot for metadata recovery
  • Use cases.
  • Link to GitHub and design docs
  • PoC Q1 2023, Productionize in Q2 2023

Presenter: Qiaochu Liu

19 of 32

Performance Enhancements

20 of 32

Group by improvements

High latency and low resource utilization for large scale group by

  • Reduce contention
  • Improve parallelism
  • Better usage of memory/CPU
  • ETA: 2023 Q3

Presenter: Yao Liu

21 of 32

Offheap distinct(count)

  • Current distinct(count) functions create in-memory sets.
    • Increased chances of OOM
    • Incurs GC pressure
    • Cannot handle high cardinality
    • Hard to utilize disk for spilling
  • An off-heap (direct buffer) hash-table-based solution can help here.
  • Can be extended by supporting spilling over to disk
  • The off-heap hash table can potentially be extended to group-by queries.

Presenter: Jia Guo

22 of 32

Partitioned distinct(count)

  %3 = 0:  3, 3, 12, 15, 18, 15, 21   (5 distinct values)
  %3 = 1:  1, 4, 4, 7, 7, 7, 13       (4 distinct values)
  %3 = 2:  5, 5, 5, 8, 11, 11         (3 distinct values)

  Total distinct count: 5 + 4 + 3 = 12

  • Send disjoint count/sets instead of overlapping ones
  • Reduce the set ser/de, transmission, and merge time/memory footprint
  • This logic can be applied at different levels
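
The idea can be sketched in Python using the slide's own numbers. Because values are partitioned by `value % 3`, no value can appear in two partitions, so the per-partition distinct counts can simply be added. The function name is an assumption for the example.

```python
def partitioned_distinct_count(values, num_partitions):
    """Count distinct values per partition; partitions are disjoint, so counts add up."""
    partitions = [set() for _ in range(num_partitions)]
    for value in values:
        # The same value always lands in the same partition.
        partitions[value % num_partitions].add(value)
    counts = [len(partition) for partition in partitions]
    # Only the small per-partition counts need to be sent and merged,
    # not the sets themselves.
    return counts, sum(counts)


values = [
    3, 3, 12, 15, 18, 15, 21,  # value % 3 == 0
    1, 4, 4, 7, 7, 7, 13,      # value % 3 == 1
    5, 5, 5, 8, 11, 11,        # value % 3 == 2
]
counts, total = partitioned_distinct_count(values, 3)
```

The sum of the per-partition counts matches the distinct count of the full set, without ever merging overlapping sets.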

Presenter: Jia Guo

23 of 32

Arg_Min/Max

  • Example: show the average salary and the people with the highest salary, grouped by position

  Position             Avg(salary)  Max(Salary)  ArgMax(Salary, FirstName)  ArgMax(Salary, ID)
  Mechanical Engineer  7000         18000        Henry                      12345
                                                 Linda                      12310
  Professor            7200         16000        Juliette                   13451

  • OSS Issue: #6707
  • Will be implemented for the V1 engine first and be extensible to V2
  • An aggregation function implementation is more efficient than a CTE join query
  • ETA 2023 Q2
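
A minimal Python sketch of the ArgMax semantics, including the case where several rows tie for the maximum. This is an illustration only, not the planned Pinot aggregation function; the rows below are hypothetical, and the names Omar and Ravi are invented for the example.

```python
def arg_max_by_group(rows, group_key, measure, projection):
    """For each group, return the max of `measure` and the `projection` values at that max."""
    best = {}  # group -> (max measure seen so far, projected values at that max)
    for row in rows:
        group, value, projected = row[group_key], row[measure], row[projection]
        if group not in best or value > best[group][0]:
            # New maximum: discard earlier candidates.
            best[group] = (value, [projected])
        elif value == best[group][0]:
            # Tie with the current maximum: keep both.
            best[group][1].append(projected)
    return best


# Hypothetical rows, for illustration only.
rows = [
    {"position": "Mechanical Engineer", "first_name": "Henry", "salary": 18000},
    {"position": "Mechanical Engineer", "first_name": "Omar", "salary": 5000},
    {"position": "Mechanical Engineer", "first_name": "Linda", "salary": 18000},
    {"position": "Professor", "first_name": "Juliette", "salary": 16000},
    {"position": "Professor", "first_name": "Ravi", "salary": 9000},
]
top_earners = arg_max_by_group(rows, "position", "salary", "first_name")
```

The result is computed in a single pass over the rows, whereas the join-based formulation first computes the maximum per group and then joins it back to the table.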

Presenter: Jia Guo

24 of 32

Continuing Resiliency Work (Workload Management, Scheduling)

Problem Statement

  • Servers process queries in a FIFO fashion
  • Queries can hog resources unfairly leading to starvation

Mitigation

  • Smart scheduling of queries based on priority, query cost, available resources
  • Query pre-emption (yield/pause) for long-running queries to avoid starvation
  • CPU/Memory limits for queries that can adaptively grow depending on available resources.
  • Control parallelization when server is loaded
  • QoS queuing and early rejection of queries.
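
Priority-based scheduling with early rejection can be sketched in Python as follows. This is a simplified model, not Pinot's scheduler: the class, the ordering rule (higher priority first, then lower estimated cost, then arrival order), and the queue-size limit are all assumptions for the example.

```python
import heapq
import itertools


class QueryScheduler:
    """Toy scheduler: bounded queue ordered by priority, then by estimated cost."""

    def __init__(self, max_queue_size):
        self.max_queue_size = max_queue_size
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, query_id, priority, estimated_cost):
        # Early rejection: refuse new work instead of queuing without bound.
        if len(self._heap) >= self.max_queue_size:
            return False
        # heapq is a min-heap, so negate the priority to pop high priority first.
        entry = (-priority, estimated_cost, next(self._arrival), query_id)
        heapq.heappush(self._heap, entry)
        return True

    def next_query(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[3]


scheduler = QueryScheduler(max_queue_size=3)
accepted = [
    scheduler.submit("big-scan", priority=1, estimated_cost=1000),
    scheduler.submit("dashboard", priority=5, estimated_cost=10),
    scheduler.submit("adhoc", priority=1, estimated_cost=50),
    scheduler.submit("overflow", priority=9, estimated_cost=1),
]
order = [scheduler.next_query() for _ in range(3)]
```

Unlike FIFO, the expensive "big-scan" query that arrived first runs last, so it cannot starve the cheaper queries behind it.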

Presenter: Vivek Iyer

25 of 32

CLP and log compression

  • Log search @Uber Scale = Compression + Fast Search

  • CLP (by YScope) used for Spark Log compression
    • 169X compression over text
    • Impossible → easy

  • Design doc
    • the PR for data encoder
    • New UDFs and indexing requests
    • Text search over compressed data

  • PoC built and feature complete in H2 2023

Presenter: Ting Chen

26 of 32

Operational Improvements

27 of 32

Groovy function registry

  • Pinot supports running Groovy scripts for UDFs; however, it is disabled by default due to security concerns.
  • We would like to improve this by allowing only authorized users to register Groovy functions, while letting other users call the registered functions.
  • Issue: #9365
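
The register-then-use split can be sketched in Python. This is a hypothetical model of the idea, not the design tracked in the issue: plain Python callables stand in for Groovy scripts, and the class, method names, and user names are assumptions for the example.

```python
class FunctionRegistry:
    """Toy registry: only authorized users may register, anyone may call."""

    def __init__(self, authorized_users):
        self._authorized = set(authorized_users)  # stand-in for an ACL check
        self._functions = {}

    def register(self, user, name, function):
        if user not in self._authorized:
            raise PermissionError(f"{user} may not register functions")
        self._functions[name] = function

    def call(self, name, *args):
        # Callers never submit script text, only the name of a vetted function.
        return self._functions[name](*args)


registry = FunctionRegistry(authorized_users={"admin"})
registry.register("admin", "y", lambda value: value.upper())
transformed = registry.call("y", "abc")
```

The security boundary sits at registration time: arbitrary script text is only ever accepted from authorized users.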

Function Registry

  function x() -> Groovy(...)
  function y() -> Groovy(...)
  function z() -> Groovy(...)
  …

Authorized users register functions (API endpoint should be authorized based on ACL)

Use those registered functions for ingestion config

  "ingestionConfig": {
    "filterConfig": {
      "filterFunction": "x()"
    },
    "transformConfigs": [{
      "columnName": "colA",
      "transformFunction": "y()"
    },
    ...
    ]
  }

Use those registered functions for queries

  select z() from T...

Presenter: Haitao Zhang

30 of 32

Some other interesting features in the poll

  • System metadata tables
  • Java 17 Upgrade
  • Server-level ingestion throttling
  • Compaction for upserts
  • Tenant-level rebalancing of pinot components

Presenter: Neha Pawar

31 of 32

Advocacy Updates

Content

  • A Journey into Apache Pinot and RTA
  • RTA for IoT
  • Pause/Resume ingestion
  • Configurable Time Boundary
  • Multi Volume support
  • Consumer Record Lag API
  • Configuring segment threshold
  • Geospatial objects
  • Geospatial indexing
  • Force Commit
  • JSON Indexes
  • Backfill on Real-Time Tables
  • StarTree Index

Conferences

  • Atlanta DevCon
  • Global AI
  • DevNexus
  • DevSum
  • Big Data Technology Warsaw

Presenter: Mark Needham

32 of 32

Q&A Closing remarks

Guidelines

  • Summarize
  • Take a picture
  • Open up for Q&A