Introduction to Iceberg / Parquet Variant
July 10 2025: Apache Iceberg Meetup, NYC
Andrew Lamb
Staff Engineer, InfluxData
Andrew Lamb
Enterprise Software: 22 years
Databases: Oracle + Vertica (Optimizer) + InfluxData + Apache DataFusion
ML / Engineering Leadership: Nutonian + DataRobot
Compilers: DataPower/IBM
Staff Engineer
TIL: Variant
Variant: efficient semi-structured data (e.g. JSON) in Iceberg and Parquet
id (integer) | semi_column (variant) |
1 | { "first": "Andrew", "last": "Lamb" } |
2 | "foo" |
3 | { "cities": ["Boston", "NYC"] } |
4 | [1.0, "N/A", 2.3, {"k": "v"}] |
Apache Parquet file with two top-level columns: id and semi_column
Structure varies from row to row
Arbitrary nesting
List elements can be different types
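As a sketch, the table above could be created and populated with Spark 4.0-style SQL (the VARIANT column syntax and the parse_json function follow the Spark 4.0 Variant work; the exact syntax is an assumption):

CREATE TABLE my_table (id INT, semi_column VARIANT);

-- parse_json converts JSON text into a Variant value;
-- note that each row has a completely different structure
INSERT INTO my_table VALUES
  (1, parse_json('{"first": "Andrew", "last": "Lamb"}')),
  (2, parse_json('"foo"')),
  (3, parse_json('{"cities": ["Boston", "NYC"]}')),
  (4, parse_json('[1.0, "N/A", 2.3, {"k": "v"}]'));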
Benefits of Variant
Diagram source: Introducing the Open Variant Data Type in Delta Lake and Apache Spark | Databricks Blog
“Overall, the VARIANT data type streamlines working with semi-structured data in Databricks, enhancing flexibility, performance, and ease of use.
Snowflake has long offered the VARIANT data type, allowing you to store semi-structured data without pre-defining a schema. This eliminates the burden of schema design upfront.
In contrast, for Delta Lake previously we relied on the MAP data type, which requires a defined schema. However, semi-structured data often exhibits schema variations across rows, creating challenges for data engineers. Parsing the data correctly before storage was a necessary but tedious step.”
More about Variant
New Data Type: Parquet (almost finalized), Iceberg v3, and Spark 4.0
Semi-structured: Schema can vary row to row
Superset of JSON: JSON can be losslessly* round tripped through Variant
Efficient Binary Encoding:
⇒ Optimized for the (common) case where most objects have the same structure, similar to document databases
More Details:
* The logical content is captured losslessly, but not physical formatting (e.g. whitespace)
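A sketch of the round trip, using functions from the same Spark 4.0 Variant work (to_json rendering a Variant back as JSON text; the exact output formatting is an assumption):

-- parse_json: JSON text -> Variant; to_json: Variant -> JSON text
SELECT to_json(parse_json('{ "first":  "Andrew",  "last": "Lamb" }'));
-- The same logical content comes back, e.g. {"first":"Andrew","last":"Lamb"},
-- but the original whitespace is not preserved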
Kudos: Open Source Takes a Village 🛖
See [EPIC] [Parquet] Implement Variant type support in Parquet · Issue #6736 for more details
Contributors and affiliations to the Rust Variant implementation (ongoing)
How is Variant different than Struct and List?
struct_col (Struct { name: String }) |
{ "name": "Andrew" } |
{ "name": "Jiaying" } |
{ "name": 42 } ❌ (42 is not a String) |
All struct fields are part of the type and must be known up front
list_col (List(Float64)) |
[1.0, 2.0, 3.0] |
[] |
[4.5] |
["foo"] ❌ (a string is not a Float64) |
All elements of a list must have the same type
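By contrast, the Variant column semi_column from the earlier example accepts rows of completely different structure. A minimal sketch of field extraction, using the variant_get function shown later in this deck (the NULL-on-missing behavior is an assumption):

-- Extract the "first" field from each row's Variant value.
-- Row 1 yields "Andrew"; rows whose value has no "first" field
-- (the bare string, the cities object, the list) yield NULL
SELECT id, variant_get(semi_column, 'first') FROM my_table;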
Variant Encoding
Variants are stored as two separate binary arrays: metadata and value.

JSON encoding:
[
  {
    "first": "Andrew",
    "last": "Lamb"
  },
  {
    "first": "Jiaying",
    "last": "Li"
  }
]

Variant encoding (diagram):
- metadata: a header followed by a dictionary of the distinct field names ("first", "last")
- value: an array header and array length, then for each object an object header and, per field, a field name pointer into the metadata dictionary, a string header, and the string value ("Andrew" / "Lamb" for Object 1, "Jiaying" / "Li" for Object 2)

Because the field names live in the shared metadata dictionary, each object stores only pointers to them rather than repeating the names.
Supported Variant Types

Variant Physical Type | Equivalent Parquet Type |
null | UNKNOWN |
boolean (True) | BOOLEAN |
boolean (False) | BOOLEAN |
int8 | INT(8, signed) |
int16 | INT(16, signed) |
int32 | INT(32, signed) |
int64 | INT(64, signed) |
double | DOUBLE |
decimal4 | DECIMAL(precision, scale) |
decimal8 | DECIMAL(precision, scale) |
decimal16 | DECIMAL(precision, scale) |
date | DATE |
timestamp | TIMESTAMP(isAdjustedToUTC=true, MICROS) |
timestamp without time zone | TIMESTAMP(isAdjustedToUTC=false, MICROS) |
float | FLOAT |
binary | BINARY |
string | STRING |
time without time zone | TIME(isAdjustedToUTC=false, MICROS) |
timestamp with time zone | TIMESTAMP(isAdjustedToUTC=true, NANOS) |
timestamp without time zone | TIMESTAMP(isAdjustedToUTC=false, NANOS) |
uuid | UUID |
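Each Variant value carries one of these physical types. A sketch of inspecting it, assuming the Spark 4.0-style schema_of_variant function (the function name and output format are assumptions):

-- Report the type(s) carried by a Variant value
SELECT schema_of_variant(parse_json('{"first": "Andrew", "age": 42}'));
-- e.g. OBJECT<age: BIGINT, first: STRING>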
Variant Storage in Parquet
Logical Type: VARIANT
Physical Type:
GROUP
{
metadata: binary,
value: binary
}
Non-shredded Variant values are stored as a struct with two fields: metadata and value
JSON encoding:
{
  "first": "Andrew",
  "last": "Lamb",
  "org": "InfluxData"
}
{
  "first": "Paul",
  "last": "Dix",
  "org": "InfluxData"
}
...more here...
{
  "first": "Jiaying",
  "last": "Li",
  "org": "CMU"
}

Parquet storage (2 leaves, diagram): each row's metadata bytes are stored in the .metadata leaf column and its value bytes (Andrew…, Paul…, Jiaying…) in the .value leaf column of the Parquet schema.
Variant Storage in Parquet: Dictionary + RLE
💡Common Case: repeated metadata (when objects have the same field names) can be efficiently stored using standard Parquet Dictionary / RLE encoding
For the same JSON rows as above (Andrew, Paul, …, Jiaying), the diagram shows Parquet dictionary encoding of the .metadata column: the repeated metadata bytes are stored once in the dictionary, and each row's .metadata entry is just an index into it (0, 0, …, 0). The per-row .value bytes (Andrew…, Paul…, Jiaying…) are stored as before.

Two levels of dictionary encoding:
1. The metadata itself is a dictionary encoding of field names (opaque to Parquet's dictionary encoding)
2. Parquet dictionary encoding reduces size even further when many rows use exactly the same metadata bytes (i.e. the corresponding objects have exactly the same set of field names)
Variant representation in Arrow
StructArray + Canonical Extension Type
[Format] Add a Canonical Variant Extension Type
[Format] Consider adding an official variant type to Arrow
StructArray schema (EXTENSION="variant"):
StructArray
{
  metadata: BinaryView,
  value: BinaryView
}

Diagram: the .metadata and .value children are BinaryView arrays whose views (len / offset / prefix) point into shared data buffers (buffers[0]). Rows with identical metadata reference the same bytes, so common metadata is still only stored once.

This is a similarly efficient structure to the Parquet physical layout and can reuse its buffers: no conversion costs.
Variant Query: Filters (no Pushdown)
SELECT variant_column.last
FROM my_table
WHERE variant_column.org = 'CMU';

DataFusion Plan:
Scan
  Table: my_table
  Projection: variant_column
Filter
  variant_get(variant_column, 'org') = 'CMU'
Projection
  Expr: variant_get(variant_column, 'last')

1. The scan reads the entire variant value and passes it along 🐢
2. The filter is applied row by row: the "org" field is extracted and tested 🐢
3. The projection extracts the "last" field, row by row 🐢
Note: new variant_get kernel for extracting variant fields
(Example rows as above: { "first": "Andrew", …, "org": "InfluxData" } … { "first": "Jiaying", …, "org": "CMU" })
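Conceptually, the dotted field access in the SQL above desugars into explicit variant_get calls, as the plan shows; the equivalent query:

SELECT variant_get(variant_column, 'last')
FROM my_table
WHERE variant_get(variant_column, 'org') = 'CMU';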
Variant Query: Filters (Pushdown, no shredding)
SELECT variant_column.last
FROM my_table
WHERE variant_column.org = 'CMU';

DataFusion Plan:
Scan
  Table: my_table
  Filter: variant_get(variant_column, 'org') = 'CMU'
  Projection: variant_get(variant_column, 'last') as last_col
Projection
  Expr: last_col

The filter and projection are pushed into the scan, so only the needed data is copied.
Still need to process each row 🐢
(Example rows as above)
Enter: Variant "Shredding"
Problem: with Variant you don't know (all of) the columns up front 🤔, or there are too many of them (e.g. 100,000)
How do you still get the benefits of columnar storage?
Solution ⇒ Variant "Shredding"
Variant Shredding in Parquet

Parquet schema:
GROUP
{
  metadata: binary,
  value: binary,
  typed_value: GROUP {
    last: string
    org: string
  }
}

JSON encoding:
[
  { "first": "Andrew", "last": "Lamb", "org": "InfluxData" },
  { "first": "Paul", "last": "Dix", "org": "InfluxData" },
  ...more here...
  { "first": "Jiaying", "last": "Li", "org": "CMU" }
]

Parquet storage (4 leaves, diagram): some (not all) fields are "shredded" and stored as separate columns:
- .metadata and .value leaves: per-row metadata and remaining value bytes, as before
- .typed_value.last leaf: "Lamb", "Dix", …, "Li"
- .typed_value.org leaf: "InfluxData", "InfluxData", …, "CMU"

See spec here: VariantShredding.md
Variant Query: Filters (Pushdown, with perfect shredding)
SELECT variant_column.last
FROM my_table
WHERE variant_column.org = 'CMU';

DataFusion Plan (shredded file):
Scan
  Table: my_table
  Filter: v.org = 'CMU'
  Projection: v.last as last_col
Projection
  Expr: last_col

DataFusion Plan (unshredded file):
Scan
  Table: my_table
  Filter: variant_get(variant_column, 'org') = 'CMU'
  Projection: variant_get(variant_column, 'last') as last_col
Projection
  Expr: last_col

With a shredded file, the filter and projection refer to the shredded columns 🐇: the query never accesses the metadata / value columns, only the structured ones.
Thank You!
[EPIC] [Parquet] Implement Variant · Issue #6736 · apache/arrow-rs
https://github.com/apache/arrow-rs
https://datafusion.apache.org/
🎣
Come Join Us!
FAQ:
Question: How is Variant different than BSON?
Question: How is Variant different than Struct datatype?
Question: How is Variant different than the Map datatype?
Variant Query: Projection
SELECT variant_column.last || ',' || variant_column.first
FROM my_table;

DataFusion Plan:
Scan
  Table: my_table
  Projection: variant_column
Projection
  Expr: concat(
    variant_get(variant_column, 'last'),
    ',',
    variant_get(variant_column, 'first')
  )

The plan reads the entire variant value and passes it along 🐢
Note: new variant_get kernel for extracting variant fields
Variant Query: Projection (Expression Pushdown)
SELECT variant_column.last || ',' || variant_column.first
FROM my_table;

DataFusion Plan:
Scan
  Table: my_table
  Projection:
    variant_get(variant_column, 'last') as last_col,
    variant_get(variant_column, 'first') as first_col
Projection
  Expr: concat(last_col, ',', first_col)

The final expression references the computed columns; the expressions with variant_get are pushed into the scan.
Ticket: Support Push down expression evaluation in `TableProviders` · Issue #14993 · apache/datafusion
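A sketch of the pushed-down form as equivalent SQL (the inner query stands for what the scan itself now evaluates):

SELECT concat(last_col, ',', first_col)
FROM (
  SELECT variant_get(variant_column, 'last') AS last_col,
         variant_get(variant_column, 'first') AS first_col
  FROM my_table
);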