1 of 22

Nebula & Columns

Agenda:

Nebula Intro
Live Demo?
Architecture & Design
Discussions
Columns Intro

2 of 22

Brief background

Shawn Cao - Facebook (2014-2019): Hive, Spark, Presto -> file formats, compute/IO efficiency

Nebula started as storage optimized for fast computation.

-> Scuba

-> Cubrick

Embracing industry standards:

(Engine)

Schema: Hadoop/Hive environment
Storage: HDFS, AWS S3, GCP
Real-time: Kafka / Pub/Sub / Rockset
API: HTTP/JSON

Web UI/API

Adaptive visualization
JavaScript IDE (embedded JS engine)

3 of 22

Live Demo - scenarios

HDFS/S3/Kafka: real-time & semi-real-time

Aggregation by experiment, with filters.
Statistics: histogram, cardinality, percentile.
Metric groups.
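The filter, group-by-experiment, and statistics flow above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Nebula's query engine; the row layout, the `latency_ms >= 60` filter, the 100 ms histogram bucket, and the function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical event rows; the column names are illustrative, not Nebula's schema.
ROWS = [
    {"experiment": "A", "user": "u1", "latency_ms": 120},
    {"experiment": "A", "user": "u2", "latency_ms": 80},
    {"experiment": "A", "user": "u1", "latency_ms": 200},
    {"experiment": "A", "user": "u3", "latency_ms": 50},
    {"experiment": "B", "user": "u1", "latency_ms": 300},
    {"experiment": "B", "user": "u4", "latency_ms": 90},
]


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted, non-empty list (0 < p <= 100)."""
    rank = math.ceil(p / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def aggregate_by_experiment(rows, predicate, bucket_ms=100):
    """Filter rows, group them by experiment, and compute per-group statistics."""
    groups = defaultdict(list)
    for row in rows:
        if predicate(row):
            groups[row["experiment"]].append(row)

    stats = {}
    for experiment, group in groups.items():
        latencies = sorted(r["latency_ms"] for r in group)
        stats[experiment] = {
            "count": len(group),
            # Exact distinct count; large-scale engines typically use a sketch such as HyperLogLog.
            "users": len({r["user"] for r in group}),
            "p50": percentile(latencies, 50),
            "p90": percentile(latencies, 90),
            # Fixed-width histogram: bucket start -> number of rows in that bucket.
            "histogram": dict(Counter((v // bucket_ms) * bucket_ms for v in latencies)),
        }
    return stats


result = aggregate_by_experiment(ROWS, lambda r: r["latency_ms"] >= 60)
print(result)
```

The filter runs before grouping, so the 50 ms row in experiment A never reaches the statistics.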

Special visuals

Real-time call stack analysis (FB Strobelight).

Code pad/Advanced analysis:

JS UDF (server side), pivot (client side).

On-Demand:

Integration with Presto/Spark.

4 of 22

Birdeye - application

5 of 22

Birdeye - architecture

6 of 22

Quick Anatomy - Main Parts

7 of 22

Some screenshots

13 of 22

Dive a bit deeper

14 of 22

Nebula Node

Node

Store

Data Block
Columnar Store
Metadata
Encoding / Compression

RLE, Delta, LZ4, Dictionary, Partition, Min/Max/Histogram/Bloom/Index
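Several of the encodings listed above are simple enough to show directly. The Python below is a minimal sketch of run-length, delta, and dictionary encoding plus min/max block pruning; the function names and sample data are hypothetical and not taken from Nebula's code, and LZ4, Bloom filters, and indexes are left out.

```python
from itertools import accumulate

# Hypothetical helpers illustrating common columnar encodings; not Nebula's code.

def rle_encode(values):
    """Run-length encoding: collapse consecutive repeats into (value, run_length)."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]


def delta_encode(values):
    """Delta encoding: keep the first value, then store differences between neighbors."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    # A running sum restores the original sequence.
    return list(accumulate(deltas))


def dict_encode(values):
    """Dictionary encoding: store each distinct value once, plus small integer codes."""
    dictionary = {}
    codes = [dictionary.setdefault(v, len(dictionary)) for v in values]
    return list(dictionary), codes


def block_may_match(block_min, block_max, lo, hi):
    """Min/max metadata: a block can be skipped when [lo, hi] cannot overlap its range."""
    return block_max >= lo and block_min <= hi


countries = ["us", "us", "us", "ca", "ca", "us"]
timestamps = [1000, 1005, 1008, 1020]
print(rle_encode(countries))    # [('us', 3), ('ca', 2), ('us', 1)]
print(delta_encode(timestamps))  # [1000, 5, 3, 12]
print(dict_encode(["ios", "android", "ios", "web"]))  # (['ios', 'android', 'web'], [0, 1, 0, 2])
```

RLE pays off on sorted or low-cardinality columns, delta on slowly increasing values such as timestamps, and dictionary codes on repeated strings; the min/max check lets a query skip whole blocks without decoding them.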

Node

Compute

Thread / Core
Task-based
Task Priority
Async / Sync

Ingestion, Expire, Query, Poll, Health, Backup, Serde
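Task-based execution with priorities is commonly built on a priority queue. The sketch below assumes a heap ordered by (priority, submission order) and drains it synchronously on one thread; the priority values, task kinds, and class names are hypothetical, and a real node would run such a loop per worker thread or core and could also dispatch tasks asynchronously.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical priorities (lower runs first); Nebula's actual ordering may differ.
PRIORITY = {"query": 0, "ingestion": 1, "expire": 2, "backup": 3}


@dataclass(order=True)
class Task:
    priority: int
    seq: int  # submission order, used as a FIFO tie-breaker within one priority
    kind: str = field(compare=False)
    fn: Callable[[], Any] = field(compare=False)


class TaskQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def submit(self, kind, fn):
        heapq.heappush(self._heap, Task(PRIORITY[kind], next(self._seq), kind, fn))

    def run_pending(self):
        """Drain the queue on the calling thread, highest priority first."""
        results = []
        while self._heap:
            task = heapq.heappop(self._heap)
            results.append((task.kind, task.fn()))
        return results


queue = TaskQueue()
queue.submit("backup", lambda: "snapshot written")
queue.submit("ingestion", lambda: "batch 1 loaded")
queue.submit("query", lambda: "query answered")
queue.submit("ingestion", lambda: "batch 2 loaded")
for kind, outcome in queue.run_pending():
    print(kind, "->", outcome)
```

The sequence counter keeps equal-priority tasks in submission order, so two ingestion batches still run in the order they arrived even though a later query jumps ahead of both.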

Node

Scale*

Nodes come & go
Fault tolerance
Redundancy
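One way to combine nodes coming and going with redundancy is to place each data block on several nodes with a stable hashing scheme. The slides do not say how Nebula places replicas; the sketch below swaps in rendezvous (highest-random-weight) hashing as an assumed technique, and the node names, block ids, and replica count of 2 are hypothetical.

```python
import hashlib

# Assumed placement scheme: rendezvous (highest-random-weight) hashing.

def _weight(node, block_id):
    """Stable pseudo-random weight for a (node, block) pair."""
    digest = hashlib.sha256(f"{node}/{block_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def replica_nodes(block_id, nodes, replicas=2):
    """Rank nodes by weight for this block and keep the top `replicas`."""
    ranked = sorted(nodes, key=lambda n: _weight(n, block_id), reverse=True)
    return ranked[:replicas]


nodes = ["node-1", "node-2", "node-3", "node-4"]
for block in ["block-a", "block-b", "block-c"]:
    before = replica_nodes(block, nodes)
    # Simulate the node holding the first replica leaving the cluster.
    survivors = [n for n in nodes if n != before[0]]
    after = replica_nodes(block, survivors)
    print(block, before, "->", after)
```

Because removing a node does not change the relative ranking of the others, the surviving replica stays in place and only the lost copy moves; blocks with no replica on the departed node are not touched at all.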

15 of 22

Nebula Server

Server

Query

Query planning
Access control
Node projection
Final Merge
Data Serving
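The split between per-node work and the server's final merge can be illustrated with an average, which cannot be merged by averaging averages. The sketch below assumes each node returns (sum, count) per group and the server finishes AVG once; the function names, columns, and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch of node-side partial aggregation plus a server-side final merge.

def node_partial(rows, key, column):
    """On each node: read only the grouping key and value column, aggregate to (sum, count)."""
    partial = defaultdict(lambda: [0, 0])
    for row in rows:
        acc = partial[row[key]]
        acc[0] += row[column]
        acc[1] += 1
    return {k: tuple(v) for k, v in partial.items()}


def final_merge(partials):
    """On the server: add up partial sums and counts, then finish AVG once."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for k, (total, count) in partial.items():
            merged[k][0] += total
            merged[k][1] += count
    return {k: {"sum": t, "count": c, "avg": t / c} for k, (t, c) in merged.items()}


node1 = [{"country": "us", "ms": 10}, {"country": "us", "ms": 20}, {"country": "ca", "ms": 5}]
node2 = [{"country": "us", "ms": 30}, {"country": "ca", "ms": 15}]
partials = [node_partial(rows, "country", "ms") for rows in (node1, node2)]
print(final_merge(partials))
```

Averaging the per-node averages for `us` would give (15 + 30) / 2 = 22.5, while merging sums and counts gives the correct 60 / 3 = 20.0.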

Server

Interface

Protobuf / FlatBuffers
gRPC
DSL
API
Clients

Server

Scale*

Workload balance
Keep tracking
Task manager
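Workload balancing and node tracking on the server can be sketched as a heartbeat table plus least-loaded assignment. Everything here is an assumption for illustration: the class names, the 10-second timeout, and choosing a node by its reported active task count.

```python
from dataclasses import dataclass

# Hypothetical server-side bookkeeping; names and the timeout are illustrative.

@dataclass
class NodeState:
    last_heartbeat: float
    active_tasks: int = 0


class NodeTracker:
    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s
        self.nodes = {}

    def heartbeat(self, node, now, active_tasks):
        """Record that a node is alive and how busy it reports itself to be."""
        self.nodes[node] = NodeState(now, active_tasks)

    def live_nodes(self, now):
        """Nodes whose last heartbeat is within the timeout."""
        return [n for n, s in self.nodes.items() if now - s.last_heartbeat <= self.timeout_s]

    def assign(self, now):
        """Send the next task to the least-loaded live node (ties broken by name)."""
        live = self.live_nodes(now)
        if not live:
            raise RuntimeError("no live nodes")
        node = min(live, key=lambda n: (self.nodes[n].active_tasks, n))
        self.nodes[node].active_tasks += 1
        return node


tracker = NodeTracker(timeout_s=10.0)
tracker.heartbeat("n1", now=100.0, active_tasks=3)
tracker.heartbeat("n2", now=100.0, active_tasks=1)
tracker.heartbeat("n3", now=85.0, active_tasks=0)  # stale: 15 s old at t=100
print([tracker.assign(now=100.0) for _ in range(3)])
```

The idle but stale node `n3` is never chosen; tasks go to `n2` until its load catches up with `n1`.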

16 of 22

Nebula Web

Web

API

REST API
Cluster Info
Authentication
On-demand loading

Web

UI

Visualization
Client SDK
IDE
Query Composer

Web

Scale*

Independent Unit

17 of 22

Q & A

https://github.com/varchar-io/nebula

18 of 22

Compare

Scenarios                     Presto/Spark  Flink  Druid  ODS  Nebula
Batch processing              x             -      -      -    -
Real-time processing          -             x      -      -    -
Real-time query               -             x      x      x    x
Counter value time series     -             -      x      -    -
Cube slice/dice               -             -      x      -    x
Full schema (Hive)            x             -      -      -    x
SQL interface                 x             x      x      -    (optional)
With storage                  -             -      x      x    -
Needs separate ingestion      -             -      x      x    -
UI                            -             -      -      x    x
Instant UDF                   -             -      -      -    x
Sub-second performance        -             -      x      x    x
Live debugger                 -             -      -      x    x
Analytics data serving        -             -      x      -    -
Real join                     x             x      -      -    -
Map join                      x             x      x      -    x
Ad-hoc data analytics         x             -      -      -    x
DataHub integration           x             -      x      -    x
Column-level access control   (monarch)     -      -      -    x
Easy to extend                -             -      -      x    x

19 of 22

Columns AI

An end-user cloud service focused on real-time analytics, automation, and storytelling.

20 of 22

Live Intelligence

Built for the whole real-time metrics life cycle, including:

Explore streaming data and make sense of it.
Detect anomalies and wire them up with your normal workspace.
Work together on alerts and issues.
Share the story of any data exploration or issue diagnosis.

Basic example: https://columns.ai/story/729a5f2b-3852-428c-aecc-d0f1d419c075

21 of 22

Data Story & Data App

Transition from data exploration to data explanation: build embeddable, up-to-date data stories as a basic content element.

Built for personalized storytelling, with tools for:

Rich editing
Annotation
Visual comparison, highlights
Animation
Voice-over
Etc.

https://columns.ai/discover

#nocode or #lowcode to build interactive data apps.

Interactive: capture inputs
Execute on fresh data
Style it your way

Example:

https://columns.ai/app/view/b61f8c85-b42f-40a5-bfa1-cfc41d5a5404

22 of 22

More …

https://columns.ai

Let’s build it together. Reach out: cao@columns.ai