1 of 21

Introduction to Iceberg / Parquet Variant


July 10 2025: Apache Iceberg Meetup, NYC

Andrew Lamb

Staff Engineer, InfluxData

2 of 21

Andrew Lamb

Enterprise Software: 22 years

Databases: Oracle + Vertica (Optimizer) + InfluxData + Apache DataFusion

ML / Engineering Leadership: Nutonian + DataRobot

Compilers: DataPower/IBM

  • Apache DataFusion PMC (Chair)
  • Apache Arrow PMC
  • Apache Software Foundation Member

Staff Engineer

3 of 21

TIL: Variant

Variant: efficient semi-structured data (e.g. JSON) in Iceberg and Parquet

Apache Parquet file with two top level columns: id and semi_column

id (integer)   semi_column (variant)
1              { "first": "Andrew", "last": "Lamb" }
2              "foo"
3              { "cities": ["Boston", "NYC"] }
4              [1.0, "N/A", 2.3, {"k": "v"}]

Structure varies from row to row

Arbitrary nesting

List elements can be different types
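Loading the four example rows with Python's json module (an illustration only, not part of the Variant implementation) shows the per-row variety a variant column must handle:

```python
import json

# The four example rows from the slide: one semi-structured column whose
# structure varies from row to row (object, string, nested object, mixed list).
rows = [
    '{"first": "Andrew", "last": "Lamb"}',
    '"foo"',
    '{"cities": ["Boston", "NYC"]}',
    '[1.0, "N/A", 2.3, {"k": "v"}]',
]

parsed = [json.loads(r) for r in rows]
print([type(v).__name__ for v in parsed])  # ['dict', 'str', 'dict', 'list']
```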

4 of 21

Benefits of Variant

  • Binary Format ⇒ Efficient storage
  • Shredding ⇒ Fast query
  • Standardization ⇒ OSS commoditization

“Overall, the VARIANT data type streamlines working with semi-structured data in Databricks, enhancing flexibility, performance, and ease of use.

Snowflake has long offered the VARIANT data type, allowing you to store semi-structured data without pre-defining a schema. This eliminates the burden of schema design upfront.

In contrast, for Delta Lake previously we relied on the MAP data type, which requires a defined schema. However, semi-structured data often exhibits schema variations across rows, creating challenges for data engineers. Parsing the data correctly before storage was a necessary but tedious step.”

5 of 21

More about Variant

New Data Type: Parquet (almost finalized), Iceberg v3, and Spark 4.0

Semi-structured: Schema can vary row to row

Superset of JSON: JSON can be losslessly* round tripped through Variant

Efficient Binary Encoding:

  • Dictionary encoded field names (avoid repeating field names)
  • Efficiently encode offset (handle small and large numbers of fields / items)
  • “Shredding” structured columns (efficient query on structured columns)

⇒ Optimized for (common) case where most objects have the same structure, similarly to document databases


* The logical content is captured losslessly, but not physical formatting (e.g. whitespace)
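The footnote's point can be illustrated with a plain JSON round trip in Python (an analogy only; Variant's binary encoding is not JSON):

```python
import json

original = '{ "first" : "Andrew",   "last" : "Lamb" }'
round_tripped = json.dumps(json.loads(original))

# Physical formatting (whitespace) is not preserved...
print(round_tripped)  # {"first": "Andrew", "last": "Lamb"}
assert round_tripped != original
# ...but the logical content survives the round trip intact.
assert json.loads(round_tripped) == json.loads(original)
```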

6 of 21

Kudos: Open Source Takes a Village 🛖

Contributors and affiliations to the Rust Variant implementation (ongoing)

7 of 21

How is Variant different than Struct and List?

  1. Struct and List require a known schema (all possible fields known up front)
    • Cannot support arbitrary nesting (the max nesting level is encoded in the schema)
  2. The same field in each Struct instance must have the same type
  3. All elements in each List instance must have the same type

struct_col: Struct { name: String }

  { "name": "Andrew" }
  { "name": "Jiaying" }
  { "name": 42 }   ← not allowed: all struct fields are part of the type and must be known

list_col: List(Float64)

  [1.0, 2.0, 3.0]
  []
  [4.5]
  ["foo"]          ← not allowed: all elements of a list must have the same type

8 of 21

Variant Encoding

  • Lengths / offsets: variable length encoding (1 - 4 bytes)
  • Field names dictionary encoded in metadata
  • Metadata can be reused for multiple variant values

Variants are stored as two separate binary arrays: metadata and value.

Example: an array of two objects

JSON encoding:

[
  { "first": "Andrew", "last": "Lamb" },
  { "first": "Jiaying", "last": "Li" },
]

Variant encoding:

metadata: [header] [dictionary: "first", "last"]
value:    [array header] [array length]
          Object 1: [object header]
                    [field name ptr -> "first"] [string header] "Andrew"
                    [field name ptr -> "last"]  [string header] "Lamb"
          Object 2: [object header]
                    [field name ptr -> "first"] [string header] "Jiaying"
                    [field name ptr -> "last"]  [string header] "Li"
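The two-part layout can be sketched in plain Python (a toy model, not the real Variant binary format): field names live once in a metadata dictionary, and each object stores small integer pointers into it instead of repeating names:

```python
# Toy sketch of the metadata/value split: build a shared dictionary of
# distinct field names, then encode each object as (name_index, value) pairs.
def encode(objects):
    # metadata: distinct field names, in first-seen order
    names = []
    for obj in objects:
        for key in obj:
            if key not in names:
                names.append(key)
    # value: each object stores indexes into the shared dictionary
    values = [[(names.index(k), v) for k, v in obj.items()] for obj in objects]
    return names, values

metadata, values = encode([
    {"first": "Andrew", "last": "Lamb"},
    {"first": "Jiaying", "last": "Li"},
])
print(metadata)  # ['first', 'last'] : each name stored once
print(values)    # [[(0, 'Andrew'), (1, 'Lamb')], [(0, 'Jiaying'), (1, 'Li')]]
```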

9 of 21

Supported Variant Types

Variant Physical Type          Equivalent Parquet Type
null                           UNKNOWN
boolean (True)                 BOOLEAN
boolean (False)                BOOLEAN
int8                           INT(8, signed)
int16                          INT(16, signed)
int32                          INT(32, signed)
int64                          INT(64, signed)
double                         DOUBLE
decimal4                       DECIMAL(precision, scale)
decimal8                       DECIMAL(precision, scale)
decimal16                      DECIMAL(precision, scale)
date                           DATE
timestamp                      TIMESTAMP(isAdjustedToUTC=true, MICROS)
timestamp without time zone    TIMESTAMP(isAdjustedToUTC=false, MICROS)
float                          FLOAT
binary                         BINARY
string                         STRING
time without time zone         TIME(isAdjustedToUTC=false, MICROS)
timestamp with time zone       TIMESTAMP(isAdjustedToUTC=true, NANOS)
timestamp without time zone    TIMESTAMP(isAdjustedToUTC=false, NANOS)
uuid                           UUID

10 of 21

Variant Storage in Parquet

Logical Type: VARIANT

Physical Type:

GROUP {
  metadata: binary,
  value: binary
}

Non-shredded Variant values are stored as a struct with two fields: 1) metadata, 2) value

JSON encoding:

{ "first": "Andrew",  "last": "Lamb", "org": "InfluxData" }
{ "first": "Paul",    "last": "Dix",  "org": "InfluxData" }
...more here...
{ "first": "Jiaying", "last": "Li",   "org": "CMU" }

Parquet Storage (2 leaves):

.metadata: [Metadata] [Metadata] ... [Metadata]
.value:    [Value (Andrew…)] [Value (Paul…)] ... [Value (Jiaying…)]

11 of 21

Variant Storage in Parquet: Dictionary + RLE

💡Common Case: repeated metadata (when objects have the same field names) can be efficiently stored using standard Parquet Dictionary / RLE encoding

JSON encoding:

{ "first": "Andrew",  "last": "Lamb", "org": "InfluxData" }
{ "first": "Paul",    "last": "Dix",  "org": "InfluxData" }
...more here...
{ "first": "Jiaying", "last": "Li",   "org": "CMU" }

Parquet Dictionary Encoding:

.metadata: indexes into dictionary: 0, 0, ..., 0
           dictionary: [Metadata]   (repeated metadata only stored once)
.value:    [Value (Andrew…)] [Value (Paul…)] ... [Value (Jiaying…)]

Two levels of dictionary encoding:

1. The metadata itself is a dictionary encoding of field names (opaque to Parquet dictionary encoding)

2. Parquet dictionary encoding reduces size even more if many rows use exactly the same metadata dictionary bytes (i.e. the corresponding objects have exactly the same set of field names)
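The second level can be sketched with a minimal dictionary encoder in Python (a toy model; real Parquet dictionary/RLE encoding operates on encoded pages of bytes):

```python
# Toy sketch of Parquet-style dictionary encoding applied to the per-row
# metadata bytes: rows whose objects have exactly the same set of field
# names produce identical metadata, so they share one dictionary entry.
def dictionary_encode(column):
    dictionary, indexes = [], []
    for item in column:
        if item not in dictionary:
            dictionary.append(item)
        indexes.append(dictionary.index(item))
    return dictionary, indexes

# Three rows, all with fields {first, last, org} -> identical metadata bytes
metadata_column = [b"first,last,org"] * 3
dictionary, indexes = dictionary_encode(metadata_column)
print(len(dictionary))  # 1 : repeated metadata stored only once
print(indexes)          # [0, 0, 0]
```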

12 of 21

Variant representation in Arrow

Variants are represented in Arrow as a StructArray of two BinaryView fields:

Schema (EXTENSION="variant"):

StructArray {
  metadata: BinaryView,
  value: BinaryView
}

.metadata: views (len, prefix, buffer, offset) that all point at the same metadata bytes in buffers[0]. Note: common metadata is still only stored once
.value:    views pointing at each value's bytes in buffers[0] (e.g. offset 0 for Value (Andrew…), offset 17 for Value (Paul…), offset 56 for Value (Jiaying…))

Similar efficient structure to the Parquet physical layout; buffers can be reused, so there are no conversion costs.

13 of 21

Variant Query: Filters (no Pushdown)

SELECT variant_column.last
FROM my_table
WHERE variant_column.org = 'CMU';

DataFusion Plan:

Projection
  Expr: variant_get(variant_column, 'last')
Filter
  variant_get(variant_column, 'org') = 'CMU'
Scan
  Table: my_table
  Projection: variant_column

1. The scan reads the entire variant value and passes it along 🐢
2. The filter is applied row by row: the field is extracted and tested 🐢
3. The projection extracts the "last" field 🐢

Note: New variant_get kernel for extracting variant fields

Example rows:

{ "first": "Andrew", "last": "Lamb", "org": "InfluxData" }
...more here...
{ "first": "Jiaying", "last": "Li", "org": "CMU" }
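The slow path can be modeled in a few lines of Python (a toy model: json.loads stands in for decoding a variant value, and this variant_get is a hypothetical stand-in for the real kernel):

```python
import json

# Un-pushed-down plan: the scan returns whole variant values, the filter
# extracts and tests a field per row, and the projection extracts another
# field per row. Every step fully decodes the value: the slow part.
def variant_get(value, field):
    return json.loads(value).get(field)

table = [
    '{"first": "Andrew", "last": "Lamb", "org": "InfluxData"}',
    '{"first": "Jiaying", "last": "Li", "org": "CMU"}',
]

# WHERE variant_column.org = 'CMU' ... SELECT variant_column.last
matching = [v for v in table if variant_get(v, "org") == "CMU"]
result = [variant_get(v, "last") for v in matching]
print(result)  # ['Li']
```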

14 of 21

Variant Query: Filters (Pushdown, no shredding)

SELECT variant_column.last
FROM my_table
WHERE variant_column.org = 'CMU';

DataFusion Plan:

Projection
  Expr: last_col
Scan
  Table: my_table
  Filter: variant_get(variant_column, 'org') = 'CMU'
  Projection: variant_get(variant_column, 'last') as last_col

Filter and projection are pushed into the scan, so only the needed data is copied. Still need to process each row 🐢

Notes:

  1. Needs both filter pushdown and (not yet available) projection pushdown

Example rows:

{ "first": "Andrew", "last": "Lamb", "org": "InfluxData" }
...more here...
{ "first": "Jiaying", "last": "Li", "org": "CMU" }

15 of 21

Enter: Variant “Shredding”

Variant: you don’t know (all) the columns up front 🤔, or there are too many (e.g. 100,000)

How to get columnar storage benefits:

  • Fast Predicate evaluation: evaluate columns (fields) accessed, skip data for other fields
  • Fast Projection: Only fetch (IO) and decode (CPU) columns you need
  • Good Compression: homogeneous types separated in columns

Solution ⇒ Variant “Shredding”

16 of 21

Variant Shredding in Parquet

Parquet Schema:

GROUP {
  metadata: binary,
  value: binary,
  typed_value: GROUP {
    last: string,
    org: string
  }
}

Some (not all) fields are “shredded” and stored as separate columns.

JSON encoding:

[
  { "first": "Andrew",  "last": "Lamb", "org": "InfluxData" },
  { "first": "Paul",    "last": "Dix",  "org": "InfluxData" },
  ...more here...
  { "first": "Jiaying", "last": "Li",   "org": "CMU" },
]

Parquet Storage (4 leaves):

.metadata:         [Metadata] [Metadata] ... [Metadata]
.value:            [Value (Andrew…)] [Value (Paul…)] ... [Value (Jiaying…)]
.typed_value.last: "Lamb", "Dix", ..., "Li"
.typed_value.org:  "InfluxData", "InfluxData", ..., "CMU"

See spec here: VariantShredding.md
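Shredding can be sketched in Python (a toy model; the shred helper and the choice of fields are illustrative, see VariantShredding.md for the real rules): the chosen fields become typed columns, and whatever is left stays in the residual value column.

```python
import json

# Toy sketch of shredding: pull the chosen fields ("last", "org") out into
# separate typed columns; remaining fields stay in the residual value.
SHREDDED = ("last", "org")

def shred(rows):
    columns = {name: [] for name in SHREDDED}
    residual = []
    for row in rows:
        obj = json.loads(row)
        for name in SHREDDED:
            columns[name].append(obj.pop(name, None))
        residual.append(json.dumps(obj))
    return columns, residual

columns, residual = shred([
    '{"first": "Andrew", "last": "Lamb", "org": "InfluxData"}',
    '{"first": "Jiaying", "last": "Li", "org": "CMU"}',
])
print(columns["last"])  # ['Lamb', 'Li']
print(columns["org"])   # ['InfluxData', 'CMU']
print(residual)         # ['{"first": "Andrew"}', '{"first": "Jiaying"}']
```

Queries that touch only shredded fields can now read just those typed columns, which is what enables the fast plans on the following slide.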

17 of 21

Variant Query: Filters (Pushdown, with perfect shredding)

SELECT variant_column.last
FROM my_table
WHERE variant_column.org = 'CMU';

DataFusion Plan (shredded file):

Projection
  Expr: last_col
Scan
  Table: my_table
  Filter: v.org = 'CMU'
  Projection: v.last as last_col

DataFusion Plan (un-shredded file):

Projection
  Expr: last_col
Scan
  Table: my_table
  Filter: variant_get(variant_column, 'org') = 'CMU'
  Projection: variant_get(variant_column, 'last') as last_col

With a shredded file, the filter and projection refer directly to the shredded columns 🐇. In this case, the query never accesses the metadata / value columns, only the structured columns.

18 of 21

Thank You!

🎣

Come Join Us!

19 of 21

FAQ:

Question: How is Variant different than BSON?

  • More efficient (e.g. can share the same field names across multiple objects in the same column)
  • Integrated into Parquet (including shredding)
  • Designed to be a standard
  • The same name (with a different implementation) has long been popular in Snowflake

Question: How is Variant different than Struct datatype?

  • You do not need to know the schema (possible fields) up front

Question: How is Variant different than the Map datatype?

  • Parquet has no map type.
  • Arrow has a map type (docs), but all keys must be the same type and all values must be the same type (encoded as List(Struct&lt;key: T1, value: T2&gt;))

20 of 21

Variant Query: Projection

SELECT variant_column.last || ',' || variant_column.first
FROM my_table;

DataFusion Plan:

Projection
  Expr: concat(
    variant_get(variant_column, 'last'),
    ',',
    variant_get(variant_column, 'first')
  )
Scan
  Table: my_table
  Projection: variant_column

The plan reads the entire variant value and passes it along 🐢

New variant_get kernel for extracting variant fields

21 of 21

Variant Query: Projection (Expression Pushdown)

SELECT variant_column.last || ',' || variant_column.first
FROM my_table;

DataFusion Plan:

Projection
  Expr: concat(last_col, ',', first_col)
Scan
  Table: my_table
  Projection:
    variant_get(variant_column, 'last') as last_col,
    variant_get(variant_column, 'first') as first_col

Expressions with variant_get are pushed into the scan; the final expression references the computed columns.

Notes:

  • Helps StructArray access more than variant due to deferred variant decoding. May also help Vortex and other formats that have fast internal casting
  • Critical for shredding pushdown (see the shredding slides)