1 of 4

Disclosure Control Support: Internal Data Provenance

Dr Stuart Wheater

Arjuna Technologies

2 of 4

Issue to Address

No indication of provenance of data being operated upon

  • Help ensure assumptions behind Governance checks are valid
    • The input to the method was produced by a previous assign method

      A <- analysisPhase1DS(“D”)
      . . . unwanted modification to “A”
      return analysisPhase2DS(“A”)

    • The input was produced by a method from another package

      A <- dsA::analysisDS(“D”)
      . . . unwanted modification to “A”
      return dsB::analysisDS(“A”)

3 of 4

Data Provenance

Use R object attributes to store provenance about data

  • “dsProvenance”
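
As a minimal sketch of the idea, base R's attr() can attach a tag to an object and read it back. The table contents and variable names below are illustrative assumptions, not part of the proposal.

```r
# A small table standing in for newly loaded data (illustrative only).
D <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))

# Attach a provenance tag using a base R object attribute.
attr(D, "dsProvenance") <- "ingested"

# Read the tag back; exact = TRUE disables partial name matching.
tag <- attr(D, "dsProvenance", exact = TRUE)

# attr() returns NULL when an object carries no such attribute.
untagged <- attr(data.frame(x = 1), "dsProvenance", exact = TRUE)
```

Because the tag travels with the object, a later method can inspect it without any separate bookkeeping.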

Checking Provenance of Data

  • checkProvenanceTags(data, c(“ingested”, “fromAnalysisX”))
    • Will “stop(...)” if the data’s “dsProvenance” attribute is neither “ingested” nor “fromAnalysisX”
    • Called at start of a method
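
One possible shape for such a check is sketched below. The slides give only the call, so the function body, the error message, and the rule that any one matching tag is sufficient are all assumptions.

```r
# Hypothetical sketch of checkProvenanceTags; the body and the error
# message are assumptions, only the call shape comes from the slides.
checkProvenanceTags <- function(data, allowedTags) {
  tags <- attr(data, "dsProvenance", exact = TRUE)
  # Assumption: the check passes if at least one tag is in the allowed set.
  if (is.null(tags) || !any(tags %in% allowedTags)) {
    stop("data does not carry an accepted provenance tag: ",
         paste(allowedTags, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

# Tagged data passes the check.
D <- data.frame(x = 1:3)
attr(D, "dsProvenance") <- "ingested"
accepted <- checkProvenanceTags(D, c("ingested", "fromAnalysisX"))

# Untagged data triggers stop(); tryCatch() captures the error here.
untagged <- data.frame(x = 1:3)
rejected <- tryCatch(
  checkProvenanceTags(untagged, c("ingested", "fromAnalysisX")),
  error = function(e) "stopped"
)
```

Treating a missing attribute as a failure means that data of unknown origin is rejected by default.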

Specifying Provenance of Data

  • setProvenanceTags(data, c(“fromAnalysisX”))
    • Will set provenance information on data
    • Called at end of an assign method
  • Set “dsProvenance” attribute to “ingested” on newly loaded tables
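
A setter along these lines might look as follows. This is a sketch under assumptions: R copies objects on modification, so the sketch returns the tagged object, which the slides do not specify, and “analysisXDS” is a hypothetical assign method invented for illustration.

```r
# Hypothetical sketch of setProvenanceTags. R modifies a copy inside the
# function, so the tagged object is returned (an assumption, since the
# slides do not state a return value).
setProvenanceTags <- function(data, tags) {
  attr(data, "dsProvenance") <- tags
  data
}

# Hypothetical assign method that tags its result as its final step.
analysisXDS <- function(input) {
  result <- input * 2
  setProvenanceTags(result, c("fromAnalysisX"))
}

A <- analysisXDS(c(1, 2, 3))
provenance <- attr(A, "dsProvenance", exact = TRUE)
```

Tagging as the last step of the assign method keeps the tag tied to the value that is actually stored.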

4 of 4

Next Steps

Create a package “dsDisclosure”

  • Move “listDisclosureSettingsDS” out of dsBase
  • Move the disclosure options’ defaults out of dsBase
  • Move “checkPermissivePrivacyControlLevel” out of dsBase
  • Deprecate in 6.4.0

Prototype in 6.4.0

  • Implement “checkProvenanceTags” and “setProvenanceTags” (simple)
  • How to set the “ingested” attribute for newly loaded tables?

Future

  • Integrate into infrastructure
    • Enforce isolation between packages
    • Verify ingested data against governance expectations

Engage with Community