Introduction to Iceberg / Parquet Variant
July 10 2025: Apache Iceberg Meetup, NYC
Andrew Lamb
Staff Engineer, InfluxData
Andrew Lamb
Enterprise Software: 22 years
Databases: Oracle + Vertica (Optimizer) + InfluxData + Apache DataFusion
ML / Engineering Leadership: Nutonian + DataRobot
Compilers: DataPower/IBM
Staff Engineer
TIL: Variant
Variant: efficient semi-structured data (e.g. JSON) in Iceberg and Parquet
id (integer) | semi_column (variant) |
1 | { "first": "Andrew", "last": "Lamb" } |
2 | "foo" |
3 | { "cities": ["Boston", "NYC"] } |
4 | [1.0, "N/A", 2.3, {"k": "v"}] |
Apache Parquet file with two top-level columns: id and semi_column
Structure varies from row to row
Arbitrary nesting
List elements can be different types
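As a sketch, the table above could be created and populated with Spark 4.0-style SQL (the VARIANT column syntax and the parse_json function follow the Spark 4.0 Variant work; the exact syntax is an assumption):

CREATE TABLE my_table (id INT, semi_column VARIANT);

-- parse_json converts JSON text into a Variant value;
-- note that each row has a completely different structure
INSERT INTO my_table VALUES
  (1, parse_json('{"first": "Andrew", "last": "Lamb"}')),
  (2, parse_json('"foo"')),
  (3, parse_json('{"cities": ["Boston", "NYC"]}')),
  (4, parse_json('[1.0, "N/A", 2.3, {"k": "v"}]'));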
Benefits of Variant
Diagram source: Introducing the Open Variant Data Type in Delta Lake and Apache Spark | Databricks Blog
“Overall, the VARIANT data type streamlines working with semi-structured data in Databricks, enhancing flexibility, performance, and ease of use.
Snowflake has long offered the VARIANT data type, allowing you to store semi-structured data without pre-defining a schema. This eliminates the burden of schema design upfront.
In contrast, for Delta Lake previously we relied on the MAP data type, which requires a defined schema. However, semi-structured data often exhibits schema variations across rows, creating challenges for data engineers. Parsing the data correctly before storage was a necessary but tedious step.”
More about Variant
New Data Type: Parquet (almost finalized), Iceberg v3, and Spark 4.0
Semi-structured: Schema can vary row to row
Superset of JSON: JSON can be losslessly* round tripped through Variant
Efficient Binary Encoding:
⇒ Optimized for the (common) case where most objects have the same structure, similar to document databases
More Details:
* The logical content is captured losslessly, but not physical formatting (e.g. whitespace)
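A sketch of the round trip, using functions from the same Spark 4.0 Variant work (to_json rendering a Variant back as JSON text; the exact output formatting is an assumption):

-- parse_json: JSON text -> Variant; to_json: Variant -> JSON text
SELECT to_json(parse_json('{ "first":  "Andrew",  "last": "Lamb" }'));
-- The same logical content comes back, e.g. {"first":"Andrew","last":"Lamb"},
-- but the original whitespace is not preserved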
Kudos: Open Source Takes a Village 🛖
See [EPIC] [Parquet] Implement Variant type support in Parquet · Issue #6736 for more details
Contributors and affiliations to the Rust Variant implementation (ongoing)
How is Variant different than Struct and List?
struct_col (Struct { name: String }) |
{ "name": "Andrew" } |
{ "name": "Jiaying" } |
{ "name": 42 } ❌ (42 is not a String) |
All struct fields are part of the type and must be known up front
list_col (List(Float64)) |
[1.0, 2.0, 3.0] |
[] |
[4.5] |
["foo"] ❌ (a string is not a Float64) |
All elements of a list must have the same type
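By contrast, the Variant column semi_column from the earlier example accepts rows of completely different structure. A minimal sketch of field extraction, using the variant_get function shown later in this deck (the NULL-on-missing behavior is an assumption):

-- Extract the "first" field from each row's Variant value.
-- Row 1 yields "Andrew"; rows whose value has no "first" field
-- (the bare string, the cities object, the list) yield NULL
SELECT id, variant_get(semi_column, 'first') FROM my_table;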
Variant Encoding
Variants are stored as two separate binary arrays: metadata and value.

JSON encoding:
[
  {
    "first": "Andrew",
    "last": "Lamb"
  },
  {
    "first": "Jiaying",
    "last": "Li"
  }
]

Variant encoding (diagram):
- metadata: a header followed by a dictionary of the distinct field names ("first", "last")
- value: an array header and array length, then for each object an object header and, per field, a field name pointer into the metadata dictionary, a string header, and the string value ("Andrew" / "Lamb" for Object 1, "Jiaying" / "Li" for Object 2)

Because the field names live in the shared metadata dictionary, each object stores only pointers to them rather than repeating the names.
Supported Variant Types

Variant Physical Type | Equivalent Parquet Type |
null | UNKNOWN |
boolean (True) | BOOLEAN |
boolean (False) | BOOLEAN |
int8 | INT(8, signed) |
int16 | INT(16, signed) |
int32 | INT(32, signed) |
int64 | INT(64, signed) |
double | DOUBLE |
decimal4 | DECIMAL(precision, scale) |
decimal8 | DECIMAL(precision, scale) |
decimal16 | DECIMAL(precision, scale) |
date | DATE |
timestamp | TIMESTAMP(isAdjustedToUTC=true, MICROS) |
timestamp without time zone | TIMESTAMP(isAdjustedToUTC=false, MICROS) |
float | FLOAT |
binary | BINARY |
string | STRING |
time without time zone | TIME(isAdjustedToUTC=false, MICROS) |
timestamp with time zone | TIMESTAMP(isAdjustedToUTC=true, NANOS) |
timestamp without time zone | TIMESTAMP(isAdjustedToUTC=false, NANOS) |
uuid | UUID |
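Each Variant value carries one of these physical types. A sketch of inspecting it, assuming the Spark 4.0-style schema_of_variant function (the function name and output format are assumptions):

-- Report the type(s) carried by a Variant value
SELECT schema_of_variant(parse_json('{"first": "Andrew", "age": 42}'));
-- e.g. OBJECT<age: BIGINT, first: STRING>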
Variant Storage in Parquet
Logical Type: VARIANT
Physical Type:
GROUP
{
metadata: binary,
value: binary
}
Non-shredded Variant values are stored as a struct with two fields: metadata and value
JSON encoding:
{
  "first": "Andrew",
  "last": "Lamb",
  "org": "InfluxData"
}
{
  "first": "Paul",
  "last": "Dix",
  "org": "InfluxData"
}
...more here...
{
  "first": "Jiaying",
  "last": "Li",
  "org": "CMU"
}

Parquet storage (2 leaves, diagram): each row's metadata bytes are stored in the .metadata leaf column and its value bytes (Andrew…, Paul…, Jiaying…) in the .value leaf column of the Parquet schema.
Variant Storage in Parquet: Dictionary + RLE
💡Common Case: repeated metadata (when objects have the same field names) can be efficiently stored using standard Parquet Dictionary / RLE encoding
For the same JSON rows as above (Andrew, Paul, …, Jiaying), the diagram shows Parquet dictionary encoding of the .metadata column: the repeated metadata bytes are stored once in the dictionary, and each row's .metadata entry is just an index into it (0, 0, …, 0). The per-row .value bytes (Andrew…, Paul…, Jiaying…) are stored as before.

Two levels of dictionary encoding:
1. The metadata itself is a dictionary encoding of field names (opaque to Parquet's dictionary encoding)
2. Parquet dictionary encoding reduces size even further when many rows use exactly the same metadata bytes (i.e. the corresponding objects have exactly the same set of field names)
Variant representation in Arrow
StructArray + Canonical Extension Type
[Format] Add a Canonical Variant Extension Type
[Format] Consider adding an official variant type to Arrow
StructArray schema (EXTENSION="variant"):
StructArray
{
  metadata: BinaryView,
  value: BinaryView
}

Diagram: the .metadata and .value children are BinaryView arrays whose views (len / offset / prefix) point into shared data buffers (buffers[0]). Rows with identical metadata reference the same bytes, so common metadata is still only stored once.

This is a similarly efficient structure to the Parquet physical layout and can reuse its buffers: no conversion costs.
Variant Query: Filters (no Pushdown)
SELECT variant_column.last
FROM my_table
WHERE variant_column.org = 'CMU';

DataFusion Plan:
Scan
  Table: my_table
  Projection: variant_column
Filter
  variant_get(variant_column, 'org') = 'CMU'
Projection
  Expr: variant_get(variant_column, 'last')

1. The scan reads the entire variant value and passes it along 🐢
2. The filter is applied row by row: the "org" field is extracted and tested 🐢
3. The projection extracts the "last" field, row by row 🐢
Note: new variant_get kernel for extracting variant fields
(Example rows as above: { "first": "Andrew", …, "org": "InfluxData" } … { "first": "Jiaying", …, "org": "CMU" })
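Conceptually, the dotted field access in the SQL above desugars into explicit variant_get calls, as the plan shows; the equivalent query:

SELECT variant_get(variant_column, 'last')
FROM my_table
WHERE variant_get(variant_column, 'org') = 'CMU';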
Variant Query: Filters (Pushdown, no shredding)
SELECT variant_column.last
FROM my_table
WHERE variant_column.org = 'CMU';

DataFusion Plan:
Scan
  Table: my_table
  Filter: variant_get(variant_column, 'org') = 'CMU'
  Projection: variant_get(variant_column, 'last') as last_col
Projection
  Expr: last_col

The filter and projection are pushed into the scan, so only the needed data is copied.
Still need to process each row 🐢
(Example rows as above)
Enter: Variant "Shredding"
Problem: with Variant you don't know (all of) the columns up front 🤔, or there are too many of them (e.g. 100,000)
How do you still get the benefits of columnar storage?
Solution ⇒ Variant "Shredding"
Variant Shredding in Parquet

Parquet schema:
GROUP
{
  metadata: binary,
  value: binary,
  typed_value: GROUP {
    last: string
    org: string
  }
}

JSON encoding:
[
  { "first": "Andrew", "last": "Lamb", "org": "InfluxData" },
  { "first": "Paul", "last": "Dix", "org": "InfluxData" },
  ...more here...
  { "first": "Jiaying", "last": "Li", "org": "CMU" }
]

Parquet storage (4 leaves, diagram): some (not all) fields are "shredded" and stored as separate columns:
- .metadata and .value leaves: per-row metadata and remaining value bytes, as before
- .typed_value.last leaf: "Lamb", "Dix", …, "Li"
- .typed_value.org leaf: "InfluxData", "InfluxData", …, "CMU"

See spec here: VariantShredding.md
Variant Query: Filters (Pushdown, with perfect shredding)
SELECT variant_column.last
FROM my_table
WHERE variant_column.org = 'CMU';

DataFusion Plan (shredded file):
Scan
  Table: my_table
  Filter: v.org = 'CMU'
  Projection: v.last as last_col
Projection
  Expr: last_col

DataFusion Plan (unshredded file):
Scan
  Table: my_table
  Filter: variant_get(variant_column, 'org') = 'CMU'
  Projection: variant_get(variant_column, 'last') as last_col
Projection
  Expr: last_col

With a shredded file, the filter and projection refer to the shredded columns 🐇: the query never accesses the metadata / value columns, only the structured ones.
Thank You!
[EPIC] [Parquet] Implement Variant · Issue #6736 · apache/arrow-rs
https://github.com/apache/arrow-rs
https://datafusion.apache.org/
🎣
Come Join Us!
FAQ:
Question: How is Variant different than BSON?
Question: How is Variant different than Struct datatype?
Question: How is Variant different than the Map datatype?
Variant Query: Projection
SELECT variant_column.last || ',' || variant_column.first
FROM my_table;

DataFusion Plan:
Scan
  Table: my_table
  Projection: variant_column
Projection
  Expr: concat(
    variant_get(variant_column, 'last'),
    ',',
    variant_get(variant_column, 'first')
  )

The plan reads the entire variant value and passes it along 🐢
Note: new variant_get kernel for extracting variant fields
Variant Query: Projection (Expression Pushdown)
SELECT variant_column.last || ',' || variant_column.first
FROM my_table;

DataFusion Plan:
Scan
  Table: my_table
  Projection:
    variant_get(variant_column, 'last') as last_col,
    variant_get(variant_column, 'first') as first_col
Projection
  Expr: concat(last_col, ',', first_col)

The final expression references the computed columns; the expressions with variant_get are pushed into the scan.
Ticket: Support Push down expression evaluation in `TableProviders` · Issue #14993 · apache/datafusion
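A sketch of the pushed-down form as equivalent SQL (the inner query stands for what the scan itself now evaluates):

SELECT concat(last_col, ',', first_col)
FROM (
  SELECT variant_get(variant_column, 'last') AS last_col,
         variant_get(variant_column, 'first') AS first_col
  FROM my_table
);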