1 of 22

Nebula & Columns

Agenda:

  1. Nebula Intro
  2. Live Demo?
  3. Architecture & Design
  4. Discussions
  5. Columns Intro

2 of 22

Brief background

Shawn Cao - FB (2014~2019): Hive, Spark, Presto -> file format, compute/IO efficiency

Nebula started as a storage layer optimized for fast computing.

-> Scuba

-> Cubric

Embracing industry standards:

(Engine)

  • Schema: Hadoop/Hive environment
  • Storage: HDFS, AWS/S3, GCP
  • Realtime: Kafka / Pubsub / Rockset
  • API: HTTP/JSON

Web UI/API

  • Adaptive visualization
  • JavaScript IDE (embedded JS engine)

3 of 22

Live Demo - scenarios

HDFS/S3/Kafka: real-time & semi-realtime

  1. Aggregate by experiment, with a filter.
  2. Statistics: Histogram, cardinality, percentile.
  3. Metrics group.

Special visuals

  • Real-time call stack analysis (FB Strobelight).

Code pad/Advanced analysis:

  • JS UDF (server), pivot (client).

On-Demand:

  • Integration with Presto/Spark.

4 of 22

Birdeye - application

5 of 22

Birdeye - architecture

6 of 22

Quick Anatomy - Main Parts

7 of 22

Some screenshots

8 of 22

9 of 22

10 of 22

11 of 22

12 of 22

13 of 22

Dive a bit deeper

14 of 22

Nebula Node

Node

Store

  • Data Block
  • Columnar Store
  • Meta Data
  • Encoding / Compression

RLE, Delta, LZ4, Dictionary, Partition, Min/Max/Histogram/Bloom/Index
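To make the encoding names above concrete, here is a minimal sketch of run-length (RLE) and delta encoding on an integer column. It is an illustration only, not Nebula's actual storage code; the function names are hypothetical.

```python
from typing import List, Tuple


def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((v, 1))  # start a new run
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs back into the original column."""
    out: List[int] = []
    for v, n in runs:
        out.extend([v] * n)
    return out


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store differences between neighbors."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Rebuild the column by taking a running sum of the deltas."""
    out: List[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out
```

RLE pays off on low-cardinality or sorted columns (long runs), while delta encoding suits slowly changing values such as timestamps, where the differences are much smaller than the values themselves.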

Node

Compute

  • Thread / Core
  • Task Based.
  • Task Priority
  • Async / Sync

Ingestion, Expire, Query, Poll, Health, Backup, Serde
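The task-based, priority-driven compute model above can be sketched with a small priority queue. This is a single-threaded illustration under assumed names (`TaskQueue`, `submit`, `run_all`) and assumed priority values; it does not reflect how Nebula actually schedules work across threads and cores.

```python
import heapq
import itertools
from typing import Callable, List, Tuple


class TaskQueue:
    """Runs submitted tasks in priority order (lower number runs first)."""

    def __init__(self) -> None:
        self._heap: list = []
        # Monotonic counter breaks ties so equal priorities run in FIFO order
        # and the heap never has to compare the callables themselves.
        self._counter = itertools.count()

    def submit(self, priority: int, name: str, fn: Callable[[], object]) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), name, fn))

    def run_all(self) -> List[Tuple[str, object]]:
        """Drain the queue, returning (task_name, result) in execution order."""
        results: List[Tuple[str, object]] = []
        while self._heap:
            _, _, name, fn = heapq.heappop(self._heap)
            results.append((name, fn()))
        return results


# Hypothetical priorities: queries first, then ingestion, then expiry.
queue = TaskQueue()
queue.submit(2, "expire", lambda: "expired old blocks")
queue.submit(0, "query", lambda: "query served")
queue.submit(1, "ingestion", lambda: "block ingested")
order = [name for name, _ in queue.run_all()]
```

Here `order` comes out as query, ingestion, expire, regardless of submission order.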

Node

Scale*

  • Come & Go
  • Fault tolerance
  • Redundancy

15 of 22

Nebula Server

Server

Query

  • Query planning
  • Access control
  • Node projection
  • Final Merge
  • Data Serving
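The node projection and final merge steps above follow a scatter-gather pattern: each node aggregates its own data blocks, and the server combines the partial results. A minimal sketch, assuming hypothetical function names and an in-memory row format (this is not Nebula's API):

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]
Partial = Dict[object, Tuple[float, int]]  # group key -> (sum, count)


def partial_aggregate(rows: Iterable[Row], key: str, metric: str) -> Partial:
    """Runs on each node: aggregate local rows into (sum, count) per group."""
    partial: Partial = {}
    for row in rows:
        s, c = partial.get(row[key], (0.0, 0))
        partial[row[key]] = (s + row[metric], c + 1)
    return partial


def final_merge(partials: List[Partial]) -> Dict[object, Dict[str, float]]:
    """Runs on the server: combine node partials, then derive the average."""
    merged: Partial = {}
    for partial in partials:
        for k, (s, c) in partial.items():
            ms, mc = merged.get(k, (0.0, 0))
            merged[k] = (ms + s, mc + c)
    return {k: {"sum": s, "count": c, "avg": s / c} for k, (s, c) in merged.items()}


# Two hypothetical nodes, each holding part of the data.
node_a = [{"exp": "A", "latency": 10}, {"exp": "B", "latency": 30}]
node_b = [{"exp": "A", "latency": 20}]
result = final_merge([
    partial_aggregate(node_a, "exp", "latency"),
    partial_aggregate(node_b, "exp", "latency"),
])
```

Nodes ship (sum, count) rather than a local average because averages of averages are wrong when groups are unevenly split across nodes; the server can only compute a correct average from mergeable partials.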

Server

Interface

  • Protobuf/FlatBuffers
  • gRPC
  • DSL
  • API
  • Clients

Server

Scale*

  • Workload balance
  • Keep tracking
  • Task manager

16 of 22

Nebula Web

Web

API

  • REST API
  • Cluster Info
  • Authentication
  • On-demand Load

Web

UI

  • Visualization
  • Client SDK
  • IDE
  • Query Composer

Web

Scale*

  • Independent Unit

17 of 22

Q & A

https://github.com/varchar-io/nebula

18 of 22

Compare

Compared systems: Presto/Spark, Flink, Druid, ODS, Nebula

Scenarios:

  • Batch Processing: x
  • Realtime Processing: x
  • Realtime Query: x x x x
  • Counter Value Time Series: x
  • Cube slice/dice: x x
  • Full schema (hive): x x
  • SQL Interface: x x x (optional)
  • With Storage: x x
  • Need Separate Ingestion: x x
  • UI: x x
  • Instant UDF: x
  • Sub-Sec Perf: x x x
  • Live Debugger: x x
  • Analytics Data Serve: x
  • Real Join: x x
  • Map Join: x x x x
  • Ad-hoc Data Analytics: x x
  • DataHub Integration: x x x
  • Column-Level Access Control: (monarch) x
  • Easy To Extend: x x

19 of 22

Columns Ai

An end-user cloud service focused on real-time analytics, automation, and storytelling.

20 of 22

Live Intelligence

Built for the whole real-time metrics life cycle, including:

  • Explore streaming data and make sense of it.
  • Detect anomalies and wire them into your normal workspace.
  • Work together on alerts and issues.
  • Share the story of any data exploration or issue diagnosis.

Basic example: https://columns.ai/story/729a5f2b-3852-428c-aecc-d0f1d419c075

21 of 22

Data Story & Data App

Transition from data exploration to data explanation - build embeddable, up-to-date data stories as a basic content element.

Built for personalized storytelling, with tools for:

  • Rich editing
  • Annotation
  • Visual comparison, highlights
  • Animation
  • Voice-over
  • Etc.

https://columns.ai/discover

#nocode or #lowcode to build interactive data apps.

  1. Interactive - capture inputs
  2. Execute on fresh data
  3. Style it your way.

Example:

https://columns.ai/app/view/b61f8c85-b42f-40a5-bfa1-cfc41d5a5404

22 of 22

More …

https://columns.ai

Let’s build it together. Reach out: cao@columns.ai