MapVector Design
Prepared by:
Bohdan Kazydub
DRILL-7096
Motivation
Drill currently has a MAP type that corresponds to Hive’s STRUCT. To read Hive MAP columns from Hive[1] (as part of the Hive complex types support implementation), Drill needs a new (canonical) MAP type that stores key-value pairs. The existing MAP type will be renamed to STRUCT (in the scope of [1]).
[1] see https://issues.apache.org/jira/browse/DRILL-3290 for more info
MapVector Structure[1]
The new map vector will contain three value vectors: keys, values, and offsets.
[1] the issue is tracked in https://issues.apache.org/jira/browse/DRILL-7096
The element count of a map is obtained as the difference between the next offset value and the current one:
size[i] = offsets.get(i + 1) - offsets.get(i)
For example, the scheme to the left shows the layout for the following simple maps (each map corresponds to a row):
{1, ‘a’, 2, ‘b’, 5, ‘e’}
{2, ‘b’, 5, ‘e’, 7, ‘g’, 3, ‘c’}
{} (empty map)
{4, ‘d’}
...
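The layout for the four example maps can be sketched in plain Java. This is a simplified illustration, not Drill's actual vector classes: the class name MapLayoutSketch, its methods, and the use of java.util lists in place of value vectors are all hypothetical. It also assumes that offsets start at 0 and hold one more entry than there are rows, which is what the size formula above implies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the proposed layout (not Drill's API). */
final class MapLayoutSketch {
    // Keys and values of all rows are stored back to back.
    final List<Integer> keys = new ArrayList<>();
    final List<String> values = new ArrayList<>();
    // offsets.get(i) is the index of the first entry of row i.
    final List<Integer> offsets = new ArrayList<>();

    MapLayoutSketch() {
        offsets.add(0); // assumption: offsets start at 0 and have rowCount + 1 entries
    }

    /** Appends one row (one map) given parallel arrays of keys and values. */
    void addRow(int[] rowKeys, String[] rowValues) {
        if (rowKeys.length != rowValues.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        for (int i = 0; i < rowKeys.length; i++) {
            keys.add(rowKeys[i]);
            values.add(rowValues[i]);
        }
        offsets.add(keys.size()); // end of this row is the start of the next one
    }

    /** size[i] = offsets.get(i + 1) - offsets.get(i) */
    int size(int row) {
        return offsets.get(row + 1) - offsets.get(row);
    }

    /** Builds the four example rows from the slide. */
    static MapLayoutSketch example() {
        MapLayoutSketch v = new MapLayoutSketch();
        v.addRow(new int[] {1, 2, 5}, new String[] {"a", "b", "e"});
        v.addRow(new int[] {2, 5, 7, 3}, new String[] {"b", "e", "g", "c"});
        v.addRow(new int[] {}, new String[] {}); // empty map
        v.addRow(new int[] {4}, new String[] {"d"});
        return v;
    }
}
```

For these rows the offsets come out as 0, 3, 7, 7, 8. The empty map is represented by two equal consecutive offsets, so its size is 0.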
MapWriter
To modify a MapVector, one needs to use a MapWriter. MapWriter encapsulates the logic that separates rows from one another and the individual records (map elements) within each row.
Fields description
Methods description:
MapReader
To access values by key, the reader introduces the read(Object key, ValueHolder holder) and read(Object key, ComplexHolder holder) methods, which set the value into the given holder.
These methods will be used in generated code, similarly to how values are obtained from arrays by index. To get a value by key, use the square-bracket syntax:
SELECT mapcol_with_int_key[25] FROM hive.table_name;
SELECT mapcol_with_string_key['abc'] FROM hive.table_name;
(UDFs such as CAST, LOWER, etc. can be used inside the square brackets.)
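A lookup by key over the keys, values, and offsets layout could work roughly as follows. This is a minimal sketch under stated assumptions, not the reader's actual implementation: the class MapLookupSketch and its lookup method are hypothetical, the linear scan over a row's entries is an assumption about how a key might be found, and the sketch returns the value (or null) instead of writing into a ValueHolder.

```java
/** Hypothetical illustration of a by-key lookup (not Drill's MapReader). */
final class MapLookupSketch {
    // Flattened example data: four rows, the third of which is an empty map.
    static final int[] KEYS = {1, 2, 5, 2, 5, 7, 3, 4};
    static final String[] VALUES = {"a", "b", "e", "b", "e", "g", "c", "d"};
    static final int[] OFFSETS = {0, 3, 7, 7, 8};

    /**
     * Returns the value stored under the given key in the given row,
     * or null if the row's map does not contain the key.
     */
    static String lookup(int[] keys, String[] values, int[] offsets, int row, int key) {
        // Entries of the row occupy indexes [offsets[row], offsets[row + 1]).
        for (int i = offsets[row]; i < offsets[row + 1]; i++) {
            if (keys[i] == key) {
                return values[i];
            }
        }
        return null; // assumption: a missing key yields a null result
    }

    public static void main(String[] args) {
        // Roughly what mapcol[7] would need to resolve for the second row.
        System.out.println(lookup(KEYS, VALUES, OFFSETS, 1, 7));
    }
}
```

In this sketch a missing key and an empty map both produce null, which mirrors how an out-of-range array index is commonly treated in SQL engines; whether the actual reader behaves this way is not stated in the source.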
Issues
Future improvements
Q&A
Thanks for your attention!