|Copyright © 2020 Threat Stack, Inc.|
|Diagram w/ transitions||Project Definition||Data Exploration||Functional Implementation||Pre-Production Testing||Production Deploy & Feedback|
|Description||Pre-hypothesis phase where you likely have product requirements ("find abnormal process execution") and domain experience ("we've tried this other ways before")||Developing a hypothesis by analyzing production data. Requires data science and subject matter expertise.||Depending on the maturity of the iteration, this may range from a small test to a full data engineering implementation of the final product capability.|
Kicks off standard SDLC.
|Standard SDLC.||Broken into two phases:|
A) Dark shipped capability for short term feedback. Leverages full production data set, learning before being exposed in phase B.
B) Operational capability that is fully productionalized and exposed.
|Goals||1. What are you doing and why does it require ML?|
2. Why does your approach require production data? (scale and entropy of production data vs synthetic testing data, and/or analysis is sensitive to privacy methods like hashing)
|1. Develop a hypothesis that will become product functionality ("I think X cluster analysis is a good approach")|
2. Justify the use of specific data sets and why they should be mixed
|1. Product development velocity||1. Ensure data safety of implementation with integration tests, small scale load tests, etc.|
2. Early model feedback with mock data (if available) to quickly identify model or implementation flaws
|A.1) Does it scale? Data engineering.|
A.2) Expected feedback in a short, potentially unrealistic time window to achieve confidence for the go / no-go decision for phase B.
B.1) Lagging indicator feedback
B.2) Identify future iterations and follow-on hypotheses.
B.3) Determine medium-to-long term efficacy of model.
|SOC 2 CC8 States||On Exit|
- Tested (operational)
- Documented (operational)
- Configured (operational)
- Approved (operational)
|On Phase A -> B Transition|
- Tested (hypothesis, short term)
- Configured (dark ship = false)
- Approved (for user facing)
Phase B Monitoring (on-going)
- Testing with long tail metrics
- Documentation of effects for future iterations and potential rollbacks (ex., user feedback erodes model efficacy)
R/O access to production data, exposing only necessary data. Tightly controlled and project member access only. Hangs off production but isn't production - this will likely be debated if you are a B2B (customers, auditors) or highly regulated business.
|Eng Workstation ... Development||Development ... QA|
Production builds of all services, but no production data.
A) Operationally runs in production and could impact production confidentiality, integrity, and availability but MUST NOT feed back into the product, be end user facing, etc.
B) Fully deployed and exposed to users. Incremental tuning without releases is to be treated as a production configuration event, not a software or model release.
|Max Data Sensitivity|
High - production / customer
Medium - quasi-identifiers (anonymized High data)
Low - implementation details (IP)
|Low to High|
Project is likely based on product feedback and BI, but may not require raw material access.
|Medium to High|
Does not store production data but may store intermediary views containing High data, like incremental indices or groupings, on semi-persistent storage. Inherently has intellectual property.
Standard development with mock data, stubbed interfaces, etc.
|Low to Medium|
Depending on data scale and algorithm, might be possible to use anonymized High data but most likely only integration tests.
|Likelihood depends on the model and its use, implementation of environment and controls, and type of data.|
Not in scope: generic production and SDLC controls/risks, application specifics, attacks on algorithms and data that don't intersect with our change management process
|Assumes tightly scoped access to data science team and potentially some engineers. PMs and managers would only have output view of their work - ex., dashboard and reports based on work, screenshots, discussing results in chat/email/meetings.||Transitioning from hypothesis development to implementation of test or product capability.||Assumes no High data in pre-production environments.||Feedback may be delivered thru production operational metrics (ex., conversion rates thru workflows) or back in the data room.|
|Spoofing||- Insider running production or test services in the data room, exposed internally or externally, for unapproved testing or product development. Ex., rogue PM or data scientist ("I can do this better than X person or team" / "I'll show them").|
|Tampering||- Tampering with intermediary results in the data room.|
- Tampering with data analysis tools that persist state in the data room, influencing downstream implementation or injecting code. Ex., jupyter notebooks.
- Tampering with production data due to faulty controls (ex., granting data room write access to production S3).
|- Lack of qualified machine learning experts to review the model and its implementation, especially as experts refer ex-colleagues who will likely be hired due to a small talent pool (collusion or influence). Ex., wanting to hide themselves from fraud analysis or influence trading patterns for personal gain thru external investment accounts.||- Did self-mutating models change because of pre-production learnings or malicious influence?||- While monitoring feedback and tuning, insiders may tune for their own benefit vs. business objectives. Config change reviews are potentially too slow or don't give the reviewer sufficient context to understand the tuning (assuming the reviewer is qualified to review the change).|
|Information disclosure||- Leak of High data into planning tools, meetings ("customer A doesn't do that"), and reports or analysis (product analytics).||- Outsider breach of environment, likely due to incomplete control application. Ex., a control is applied to production but not to the data room.|
- Insider data theft. Ex., copying data out of the data room.
- Data mishandling. Ex., accident or poor training.
- Oversharing of information with the data room. Ex., data science team only requires metadata but is given all telemetry and customer data.
|- If models are multi-tenant, information may leak between tenants thru quasi-identifiers ("other companies like yours" and there's only two) or accidental leakage ("exe X normally accepts traffic from Y address which is owned by ACME Co").|
A.1) Feature flag failure.
|Denial of service||- Overloading shared data storage infrastructure instead of data room specific read-only copy causing performance or availability issues.|
- Deleting or overwriting production data due to misconfigured data room boundary (ex., granting data room write access to production AWS S3).
|B.1) Over medium-to-long term the model does not behave as expected based on short term testing in Phase A. This may not be obvious to end users, especially if operational metrics are not in place with alarms over the life of the deployment.|
|Elevation of privilege||- Cross data room boundary into production due to faulty controls (ex., IAM misconfig).|
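
Several of the data room threats above (tampering, denial of service, elevation of privilege) trace back to the same faulty control: the data room being granted more than read-only access to production storage. As a minimal sketch of how that boundary could be checked, the Python below scans an IAM-style policy document and flags any `Allow` statement that grants an action outside a read-only allowlist. The allowlist, the function names, the bucket ARN, and the example policy are all hypothetical and only for illustration; the check looks at `Action` and `NotAction` only and does not evaluate `Resource`, `Condition`, or other policies attached to the same principal.

```python
from fnmatch import fnmatchcase

# Assumption: the data room only ever needs S3 read/list actions.
# Patterns are lowercase because IAM action names are case-insensitive.
READ_ONLY_PATTERNS = ("s3:get*", "s3:list*")


def _as_list(value):
    """IAM allows a single item or a list for Statement and Action."""
    return [value] if isinstance(value, (str, dict)) else list(value)


def is_read_only(action):
    """True if the action matches one of the read-only patterns."""
    action = action.lower()
    return any(fnmatchcase(action, pattern) for pattern in READ_ONLY_PATTERNS)


def write_capable_statements(policy):
    """Return Allow statements that grant anything beyond the read-only allowlist."""
    flagged = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        # NotAction allows everything except the listed actions; flag conservatively.
        if "NotAction" in stmt:
            flagged.append(stmt)
            continue
        actions = _as_list(stmt.get("Action", []))
        if any(not is_read_only(action) for action in actions):
            flagged.append(stmt)
    return flagged


# Hypothetical data room policy with one accidental write grant.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadTelemetry",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": "arn:aws:s3:::example-prod-telemetry/*",
        },
        {
            "Sid": "AccidentalWrite",
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::example-prod-telemetry/*",
        },
    ],
}

if __name__ == "__main__":
    for stmt in write_capable_statements(example_policy):
        print("Not read-only:", stmt.get("Sid", "<no Sid>"))
```

A check like this could run as part of the change review for the data room environment, so that a broadened grant is caught as a configuration change rather than discovered after production data has been modified.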