Automating Ambiguity:
Managing dynamic source data using dbt macros
Eric Nelson, Analytics & Engineering Lead
Agenda
Background
Job Function
Eric Nelson,
Analytics Engineer @ Mattermost
Problem
Problem
(@ Mattermost)
(or does it..?)
Our Tech Stack
Rudderstack CDP
Rudderstack is a customer data platform (similar to Segment or Bloomreach)
Solution:
dbt Macro
One dbt Macro to Rule Them All
Union Macro Logic
1.
Check if incremental build
2.
3.
ALTER TABLE *master_table*
ADD COLUMNS
*missing_column_1* *data_type*,
*missing_column_2* *data_type*,
...
*missing_column_n* *data_type*
;
4.
Execute Unioning Script (See Next Slide)...
2a.
Union Macro Diagram
What’s the Big Deal?
Model File
get_source_target_relation_lists() Arguments:
union_relations() Arguments:
Retrieve Source & Target Relations
Retrieve source & target relation lists.
Check For Missing Columns
Generate & Execute Union Script
Generate & Execute Union Script
Current Limitations & Future Additions
Only supports Snowflake, Postgres & Redshift DB
Method to incorporate additional transformation logic for union script
Method to unify columns and column naming conventions
Q & A
Thank You!
Appendix