Sharing is caring: Open data for open source compliance
Philippe Ombredanne, AboutCode
Qing Tomlinson, SAP
Philippe Ombredanne
pombredanne@aboutcode.org
https://github.com/pombredanne
https://www.linkedin.com/in/philippeombredanne
© Open Source Initiative / ClearlyDefined / CC-BY-4.0 / clearlydefined.io
Qing Tomlinson
SBOMs Everywhere for Compliance and Security
The EU Cyber Resilience Act (CRA) and other regulations mandate comprehensive SBOMs, continuous vulnerability management, and supply chain transparency, creating significant operational and technical burdens.
Organizations face great challenges generating accurate SBOMs at scale, at every stage of the supply chain and for every build or release.
Wasted Resources 🤦🏽🤦🏻‍♀️🤦🏼‍♂️
Organizations fix the same missing or wrongly identified FOSS package metadata and licenses over and over again. Rescanning and reanalyzing the same packages wastes both compute and human resources.
Duplicated compliance efforts should instead be coordinated, shared, and reused to improve efficiency for everyone.
FOSS for FOSS
Many proprietary or commercial databases (claim to) offer accurate metadata for software packages.
Data about open source packages must also be open.
Anything else would be plain crazy dumb.
Community Solutions for Software Metadata
Other approaches: OSADL OSSelot, ORT Corrections and Curations, Libraries.io
Bringing clarity to Open Source Software metadata and beyond.
Use the Data | Curate Data | Contribute Data | Contribute Code | Add a Harvest | Adopt Practices
ClearlyDefined Clearly Explained 🤔
1) Crowdsourced licensing metadata for every software component ever, in a global database published for all to use
2) Cached copy of licensing metadata for each component, with a simple API for integration and automation
3) Organizations contribute back any missing or wrongly identified licensing metadata to improve accuracy
4) Hosted by a trusted non-profit: the Open Source Initiative
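Each component in the database is keyed by a five-segment coordinate, type/provider/namespace/name/revision, where "-" stands in for an empty namespace (as in the npm/npmjs/-/lodash/4.17.21 path of the API example later in this deck). The following minimal Python sketch parses and formats such coordinates; the Coordinates class is hypothetical and not part of any ClearlyDefined client library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinates:
    """Hypothetical helper for a ClearlyDefined coordinate:
    type/provider/namespace/name/revision."""

    type: str       # package type, e.g. "npm" or "maven"
    provider: str   # package source, e.g. "npmjs" or "mavencentral"
    namespace: str  # "-" when the package has no namespace
    name: str
    revision: str   # package version

    @classmethod
    def parse(cls, text: str) -> "Coordinates":
        """Split a coordinate string into its five segments."""
        parts = text.strip("/").split("/")
        if len(parts) != 5 or not all(parts):
            raise ValueError(f"expected 5 non-empty segments: {text!r}")
        return cls(*parts)

    def __str__(self) -> str:
        """Join the segments back into the form used in API paths."""
        return "/".join(
            (self.type, self.provider, self.namespace, self.name, self.revision)
        )


if __name__ == "__main__":
    coords = Coordinates.parse("npm/npmjs/-/lodash/4.17.21")
    print(coords.name, coords.revision)
    print(str(coords))
```

Keeping the coordinate as a frozen value makes it usable as a dictionary key, for example to cache definitions locally so the same component is never looked up twice.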
Overview of ClearlyDefined Curation Process
[Diagram: ClearlyDefined curation flow]
clearlydefined/curated-data: repository of curations
ClearlyDefined service: facilitates curating licenses; API for accessing curations
ClearlyDefined website: simplified view of definitions
Flow: API PATCH /curations → create PR on curated-data → human review (Review PR) → Merge PR → Notify (PR merged) → write curation to the curated-data store. PRs can also be created directly by humans.
slide courtesy of E. Lynette Rayle (GitHub)
Vidal, N., Rayle, E. L., & Tomlinson, Q. (2024). ClearlyDefined: A Crowdsourced Database of Licensing Metadata [PowerPoint slides]. SOSS Fusion 2024, Atlanta, Georgia, United States. https://opensource.org/wp-content/uploads/2024/11/ClearlyDefined-SOSS-Fusion-2024.pdf
ClearlyDefined Benefits
Organizations
Ecosystem
Problems: Scale ⛰️🌎🌌
1) The growing volume of data (70 TB) can make it difficult to use ClearlyDefined on-premises
2) Scanning at scale is also a challenge, because ScanCode is super fast! 🤣
3) Diagnosing and debugging a distributed application is hard 😅
Problems: Size...
55M+ packages
🪨🏋️‍♂️🪨🏋️‍♀️🪨🏋️
10,000+ curations!
Problems: Infrastructure
1) The database license switcheroo necessitates migrating from MongoDB to a true FOSS alternative such as DocumentDB
2) Migrate to federated data for performance and digital independence: unlock the data! 🔓
Problems: Data Contributions
1) Need to fix the curation UI 🤢 to facilitate more contributions
2) Need more community contributors 👫🏋️👭🏋️‍♂️🧑‍🤝‍🧑🏋️‍♀️👬
Future (2026) Plans
1) PURL support: a new API to query by Package URL (PURL)
2) Unlock the data: distributed and federated reuse
3) Share all the scans: never scan the same package twice
4) New curation UI
5) All the SBOMs for all the packages
6) Establish close collaboration with AboutCode
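PURL support comes down to mapping a Package URL onto a coordinate. The sketch below is a hypothetical illustration, not the planned ClearlyDefined PURL API: it covers only a handful of PURL types through an assumed type/provider table, ignores qualifiers and subpaths, and requires a version because every coordinate carries a revision.

```python
from urllib.parse import unquote

# Hypothetical subset of PURL type -> (ClearlyDefined type, provider).
# Assumed for illustration; the planned PURL API may map types differently.
PURL_TO_CD = {
    "npm": ("npm", "npmjs"),
    "pypi": ("pypi", "pypi"),
    "maven": ("maven", "mavencentral"),
    "gem": ("gem", "rubygems"),
    "nuget": ("nuget", "nuget"),
    "cargo": ("crate", "cratesio"),
}


def purl_to_coordinates(purl: str) -> str:
    """Convert a versioned PURL such as pkg:npm/lodash@4.17.21 to a coordinate string."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    # Drop the optional #subpath and ?qualifiers: coordinates have no slot for them.
    body = purl[len("pkg:"):].lstrip("/")
    body = body.split("#", 1)[0].split("?", 1)[0]
    purl_type, _, path = body.partition("/")
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError(f"missing package name: {purl!r}")
    # The version follows the last "@" in the final path segment.
    name, sep, version = segments.pop().rpartition("@")
    if not sep or not name or not version:
        raise ValueError(f"a name and version are required: {purl!r}")
    if len(segments) > 1:
        raise ValueError(f"multi-segment namespaces are not handled here: {purl!r}")
    try:
        cd_type, provider = PURL_TO_CD[purl_type.lower()]
    except KeyError:
        raise ValueError(f"no mapping for PURL type {purl_type!r}") from None
    # Percent-decode segments, e.g. %40babel -> @babel for scoped npm packages.
    namespace = unquote(segments[0]) if segments else "-"
    return "/".join((cd_type, provider, namespace, unquote(name), unquote(version)))
```

Raising on unmapped types, rather than guessing a provider, keeps a bad lookup from silently returning metadata for the wrong package.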
Future (2027 and Beyond) Plans: Data Domination
All the data are belong to you. 😜
1) Expand ClearlyDefined to include all the pillars of data for compliance. We already have provenance and license data; next, add security and project health/lifecycle.
2) ClearlySecured (cybersecurity curations) for the vulnerabilities use case.
Join the Community!
1) Weekly developer meetings: https://docs.clearlydefined.io/docs/community/meetings
2) Hang out in Discord:
3) Contribute code and data:
https://docs.clearlydefined.io/docs/get-involved
4) Help sustain and grow ClearlyDefined!
Contact pombredanne@aboutcode.org and qing.tomlinson@sap.com for more information.
Any Questions?
pombredanne@aboutcode.org
https://github.com/pombredanne
https://www.linkedin.com/in/philippeombredanne
Use the Data
API: definitions, curations, harvest, attachments, notices
curl -X GET "https://api.clearlydefined.io/definitions/npm/npmjs/-/lodash/4.17.21" -H "accept: */*"
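The same lookup can be scripted. This minimal Python sketch, using only the standard library, builds the GET request from the curl example plus a batch request for many components at once. The POST /definitions batch endpoint taking a JSON array of coordinate strings is an assumption to verify against the API documentation, and the helper names are hypothetical.

```python
import json
import urllib.request

API = "https://api.clearlydefined.io"


def definition_request(coordinates: str) -> urllib.request.Request:
    """Build the same GET request as the curl example, for one coordinate string."""
    return urllib.request.Request(
        f"{API}/definitions/{coordinates}",
        headers={"accept": "*/*"},
        method="GET",
    )


def batch_definitions_request(coordinates: list[str]) -> urllib.request.Request:
    """Build a POST request for many coordinates at once (assumed batch endpoint)."""
    return urllib.request.Request(
        f"{API}/definitions",
        data=json.dumps(coordinates).encode("utf-8"),
        headers={"accept": "application/json", "content-type": "application/json"},
        method="POST",
    )


def fetch_json(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request over the network and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_json(definition_request("npm/npmjs/-/lodash/4.17.21")) returns the definition as a dict whose licensed section holds the declared license. Batching keeps round trips low when resolving a whole dependency tree.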
Curate Data
"contributionInfo": {
"summary": "[Test] Update declared license",
"details": "The declared license should be Apache as per the LICENSE file.",
"resolution": "Updated declared license to Apache-2.0.",
"type": "incorrect",
"removeDefinitions": false
},
https://docs.clearlydefined.io/docs/get-involved/data-curation
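A curation submitted through PATCH /curations pairs the contributionInfo above with one or more patches. The sketch below assembles such a body in Python. The patches layout (component coordinates plus a revisions map carrying licensed.declared) reflects our understanding of the API and should be checked against the data-curation documentation linked above; curation_patch is a hypothetical helper.

```python
import json


def curation_patch(coordinates: str, declared: str, summary: str, details: str,
                   resolution: str, kind: str = "incorrect") -> dict:
    """Hypothetical builder for a PATCH /curations body fixing one declared license.

    The "patches" structure is an assumed shape, not a confirmed schema.
    """
    parts = coordinates.split("/")
    if len(parts) != 5:
        raise ValueError(f"expected type/provider/namespace/name/revision: {coordinates!r}")
    cd_type, provider, namespace, name, revision = parts
    component = {"type": cd_type, "provider": provider, "name": name}
    if namespace != "-":  # omit the placeholder namespace entirely
        component["namespace"] = namespace
    return {
        "contributionInfo": {
            "summary": summary,
            "details": details,
            "resolution": resolution,
            "type": kind,
            "removeDefinitions": False,
        },
        "patches": [{
            "coordinates": component,
            # One entry per curated revision (version) of the component.
            "revisions": {revision: {"licensed": {"declared": declared}}},
        }],
    }
```

The returned dict serializes with json.dumps and can be sent as the body of the PATCH /curations call. A curation only takes effect once the resulting pull request on clearlydefined/curated-data is reviewed and merged.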
Contribute Data
Contribute Code
Add a Harvest