1 of 11

Monthly OpenLineage TSC meeting

Dec/08/2021

2 of 11

Recording of calls

Reminder:

The meeting is recorded and archived on the wiki

https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting

3 of 11

Roll Call

TSC voting members:

Julien Le Dem

Mandy Chessell

Daniel Henneberger

Drew Banin

James Campbell

Ryan Blue

Willy Lulciuc

Zhamak Dehghani

Michael Collado

Maciej Obuchowski

4 of 11

Communication

5 of 11

Agenda

  • SPDX headers [Mandy Chessell]
  • Azure Purview + OpenLineage [Will Johnson, Mark Taylor]
  • Logging backend (OpenTelemetry, ...) [Julien Le Dem]
  • Open discussion

6 of 11

Software Package Data Exchange (SPDX)

  • SPDX is an open standard for creating a software bill of materials (SBOM)
  • SPDX defines a set of short identifiers for the different open source licenses
    • Both human readable and machine processable
    • Easy to maintain and validate
  • The full license text is added in a LICENSE file at the top of the Git repository
  • Each file includes the SPDX-License-Identifier tag, which replaces the full license header
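
As a rough illustration of the "machine processable" point above, the sketch below looks for an SPDX-License-Identifier tag near the top of each source file. The function names, the five-line search window, and the file suffixes are assumptions made for this example, not part of any OpenLineage tooling. The pattern only captures a single short identifier such as Apache-2.0, not compound SPDX expressions.

```python
import re
from pathlib import Path

# Matches a tag such as "# SPDX-License-Identifier: Apache-2.0" in any comment style.
# Only a single short identifier is captured (an assumption of this sketch).
SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+\-]+)")


def find_spdx_identifier(text, max_lines=5):
    """Return the short license identifier found in the first lines, or None."""
    for line in text.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            return match.group(1)
    return None


def files_missing_tag(root, suffixes=(".py", ".java")):
    """List files under root (hypothetical suffix filter) that lack the tag."""
    missing = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            if find_spdx_identifier(path.read_text(errors="ignore")) is None:
                missing.append(path)
    return missing
```

A check like `files_missing_tag(".")` could run in CI so that new files without the tag are flagged early. Whether the project validates headers this way is not stated in the slides.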

7 of 11

Azure Purview + OpenLineage

  • Spark lineage in Azure Purview
  • Spark integrations for connectors to Azure data sources such as Synapse and Cosmos DB

7

8 of 11

OpenLineage on Azure PaaS

More to Come!

  • Issue 333 | PR 375 (init script) ✅
  • Issue 407 | PR 425 (url param) ❓
  • Issue 181: extensibility ❓
  • Databricks environment data ❓
  • Cosmos DB (DataSourceV2) ❓
  • Synapse (SqlDWRelation) ❓

9 of 11

Logging backends

  • OpenTelemetry

10 of 11

Roadmap:

  • OpenLineage Roadmap

11 of 11

Open Discussion
