1 of 32

BOM

Towards a verifiable artifact tree & enabling launch-time security scanning

Contributors:

  • Ed Warnicke
  • Frederick Kautz
  • Aeva Black

2 of 32

What is an SBOM

[Diagram: seven artifact nodes, each labeled Metadata and Artifact ID]

Artifact ID

Identifies an artifact, for example:

  • executable
  • .o
  • .c
  • .h
  • .java
  • .class
  • .py
  • .go
  • .a
  • .so
  • container

Metadata

Information about the artifact, like:

  • vendor
  • release version
  • contact information
  • license
  • copyright

3 of 32

Several competing standards

4 of 32

Learning from Git: Git Objects

${type} ${size}\0

${content}

${type} - Git Object Type as a string

  • blob - any []byte
  • tree - represents a file system tree
  • commit - represents a commit

${size} - size in bytes of ${content}, represented as a base-10 string.

${content} - []byte of the content
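
The layout above can be sketched in a few lines of Python. This is a minimal illustration, not Git's own code: the function name git_object is made up for this example, and it relies on the fact that Git separates the type and the size with a single space before the NUL byte.

```python
def git_object(obj_type: str, content: bytes) -> bytes:
    """Return '<type> <size>\\0' followed by the raw content bytes."""
    # The size is the content length in bytes, written as a base-10 string.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return header + content


example = git_object("blob", b"hello\n")
print(example)  # b'blob 6\x00hello\n'
```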

5 of 32

Learning from Git: Git Ref

${type} ${size}\0${content} -> sha1 -> ${gitref}

Example ${gitref}:

  • cbebf91582d3d8aaa805c823bdd02b7756ea72c0
  • 40 characters in hex
  • 20 bytes (160 bits)
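
As a rough sketch of how a git ref is derived, the snippet below hashes the object header plus content with SHA-1 from Python's standard hashlib. The helper name git_ref is an assumption for illustration. Empty content is used as the sample input because its blob ID is widely known.

```python
import hashlib


def git_ref(content: bytes, obj_type: str = "blob") -> str:
    """SHA-1 over '<type> <size>\\0' + content, rendered as 40 hex characters."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


ref = git_ref(b"")  # the empty blob: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(ref, len(ref), len(bytes.fromhex(ref)))
```

The same value should be reported by `git hash-object` for an empty file.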

6 of 32

Learning from Git: Every file in a git repo is a ‘blob’

blob ${size}\0

${content}

${content} - []byte of the file contents

  • Does not include filename or path
  • Does not include mode information
  • Does not include *any* metadata
  • Just the contents
  • Any file anywhere with the same contents will have the same ‘blob’ object
  • Any file anywhere with the same contents will have the same git ref
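
To illustrate the last two points, the following sketch writes the same bytes to two files with different names, paths, and modes, then compares their git refs. The file names and the helper git_ref_of_file are hypothetical, chosen only for this example.

```python
import hashlib
import os
import tempfile


def git_ref_of_file(path: str) -> str:
    """Hash only the file's bytes, the way Git hashes a blob."""
    with open(path, "rb") as f:
        content = f.read()
    header = f"blob {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "hello.c")
    second = os.path.join(tmp, "vendor", "copy_of_hello.txt")
    os.makedirs(os.path.dirname(second))
    for path in (first, second):
        with open(path, "wb") as f:
            f.write(b"int main(void) { return 0; }\n")
    os.chmod(second, 0o755)  # a different mode does not change the ref
    refs_match = git_ref_of_file(first) == git_ref_of_file(second)

print(refs_match)
```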

7 of 32

Every Artifact is a blob

blob ${size}\0

${content}

Since every artifact is a []byte, every artifact is a blob

  • Every artifact can be identified with its git ref
  • The leaf artifacts in every artifact tree are source files
  • Source files are stored in git repos
  • The leaf artifacts are *already* identified and indexed by git ref.

8 of 32

Separate Metadata from Artifact Tree

[Diagram: seven artifact nodes, each pairing Metadata with an Artifact ID]

9 of 32

Separate Metadata from Artifact Tree

[Diagram: the tree of Artifact IDs, with the Metadata boxes drawn separately from it]

10 of 32

Examples of Artifact Trees

16 of 32

Use the Git Ref as the Artifact ID

[Diagram: seven artifact nodes, each labeled Artifact Git Ref]

17 of 32

Represent the child relationship using a ‘GitBOM’ doc

[Diagram: seven artifact nodes, labeled Artifact-1 Git Ref through Artifact-7 Git Ref]

Artifact-1’s GitBOM

blob ${size}\0

blob ${a-2 git ref} bom ${a-2’s GitBOM git ref}\n

blob ${a-3 git ref} bom ${a-3’s GitBOM git ref}\n

Artifact-2’s GitBOM

blob ${size}\0

blob ${a-4 git ref}\n

blob ${a-5 git ref}\n

Artifact-3’s GitBOM

blob ${size}\0

blob ${a-6 git ref}\n

blob ${a-7 git ref}\n

The lines within each GitBOM are in lexical order.
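
A GitBOM document like the ones on this slide could be assembled roughly as follows. This is a sketch under assumptions: the names git_ref and gitbom_document are invented here, fields are assumed to be separated by single spaces, and the artifact contents are stand-ins rather than real files.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def git_ref(content: bytes) -> str:
    header = f"blob {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def gitbom_document(children: Iterable[Tuple[str, Optional[str]]]) -> bytes:
    """Build a GitBOM body from (artifact ref, child GitBOM ref or None) pairs."""
    lines = []
    for artifact_ref, bom_ref in children:
        line = f"blob {artifact_ref}"  # assumed single-space separators
        if bom_ref is not None:
            line += f" bom {bom_ref}"
        lines.append(line + "\n")
    return "".join(sorted(lines)).encode("ascii")  # lexical order


# Stand-in leaf artifacts; real refs would come from real file contents.
a4, a5, a6, a7 = (git_ref(f"artifact-{n}".encode()) for n in (4, 5, 6, 7))
bom2 = gitbom_document([(a4, None), (a5, None)])
bom3 = gitbom_document([(a6, None), (a7, None)])
a2, a3 = git_ref(b"artifact-2"), git_ref(b"artifact-3")
bom1 = gitbom_document([(a2, git_ref(bom2)), (a3, git_ref(bom3))])
print(bom1.decode("ascii"), end="")
```

Sorting the lines makes the document, and therefore its git ref, independent of the order in which the build tool happened to see the children.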

18 of 32

Metadata can reference artifacts or GitBOMs by Git Ref

[Diagram: Artifact-1 Git Ref through Artifact-7 Git Ref, alongside five Metadata boxes]

19 of 32

Compatibility with other SBOM approaches

Metadata

20 of 32

Embed the Git Ref of the GitBOM into the artifact

  • ELF files (executables, .so, and .o files)
    • Embed GitBOM identifier into an ELF section named ‘.bom’

  • ar files (.a static libraries)
    • Embed GitBOM identifier into an archive entry named ‘.bom’

  • General archive files (tar, gzip, etc.)
    • Embed GitBOM identifier into an archive entry named ‘.bom’

  • Java class file
    • Embed GitBOM identifier into an annotation named @BOM in the .class file.

  • Python .pyc files
    • Embed GitBOM identifier into an __bom__ in the .pyc file.

  • Container Images
    • Embed GitBOM identifier into the image manifest as an annotation named “dot.bom”
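
For the archive case, one possible sketch uses Python's standard tarfile module to append a ‘.bom’ entry. The function name add_bom_entry is hypothetical, and storing the identifier as 20 raw bytes (rather than 40 hex characters) is an assumption; the slides only fix the entry name.

```python
import io
import tarfile


def add_bom_entry(tar_bytes: bytes, gitbom_ref_hex: str) -> bytes:
    """Return a copy of a tar archive with a '.bom' entry appended."""
    payload = bytes.fromhex(gitbom_ref_hex)  # assumption: store the raw 20 bytes
    out = io.BytesIO()
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as src:
        with tarfile.open(fileobj=out, mode="w") as dst:
            for member in src.getmembers():
                data = src.extractfile(member) if member.isfile() else None
                dst.addfile(member, data)
            info = tarfile.TarInfo(name=".bom")
            info.size = len(payload)
            dst.addfile(info, io.BytesIO(payload))
    return out.getvalue()


# Build a tiny archive in memory to stamp.
original = io.BytesIO()
with tarfile.open(fileobj=original, mode="w") as tar:
    body = b"hello\n"
    entry = tarfile.TarInfo(name="hello.txt")
    entry.size = len(body)
    tar.addfile(entry, io.BytesIO(body))

stamped = add_bom_entry(original.getvalue(), "85322091b1d50a23d1c2a0f5933788a2a958f2ad")
with tarfile.open(fileobj=io.BytesIO(stamped), mode="r") as tar:
    names = tar.getnames()
    embedded = tar.extractfile(".bom").read().hex()
print(names, embedded)
```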

21 of 32

Compiler/Linker Integration

  • Compiler/Linker authoritatively know what child artifacts are built into a new artifact
  • Compiler/Linker are best positioned to ‘embed’ into the artifact being built
  • Compiler/Linker could write a .bom/object/ directory, in the same directory the artifact is being written to, to store the GitBOMs for any children
    • Git stores its objects in .git/objects, so this is similar
  • Subsequent compile/link steps could look for .bom in the directory they are reading artifacts from
  • Given the git ref of an artifact with an embedded GitBOM git ref, the artifact tree information cannot be tampered with without detection.
    • Tampering with the artifact, say to change its embedded GitBOM git ref, would change the artifact’s git ref
    • Tampering with the GitBOM would change its git ref
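
The tamper-detection argument can be demonstrated with a small sketch: store a GitBOM under a path derived from its git ref, then re-hash it on load. The helper names are made up for this example, and the two-character fan-out directory follows the ./.bom/object/85/3220... path shown in Example1 of this deck. The sample document content is a placeholder.

```python
import hashlib
import os
import tempfile


def git_ref(content: bytes) -> str:
    header = f"blob {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def object_path(out_dir: str, ref: str) -> str:
    return os.path.join(out_dir, ".bom", "object", ref[:2], ref[2:])


def store_gitbom(out_dir: str, document: bytes) -> str:
    ref = git_ref(document)
    path = object_path(out_dir, ref)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(document)
    return ref


def load_verified(out_dir: str, ref: str) -> bytes:
    with open(object_path(out_dir, ref), "rb") as f:
        document = f.read()
    if git_ref(document) != ref:  # content no longer matches its name
        raise ValueError(f"GitBOM {ref} was modified after it was written")
    return document


with tempfile.TemporaryDirectory() as out_dir:
    doc = b"blob " + b"1" * 40 + b"\n"  # placeholder GitBOM body
    ref = store_gitbom(out_dir, doc)
    intact = load_verified(out_dir, ref) == doc
    with open(object_path(out_dir, ref), "ab") as f:
        f.write(b"blob " + b"0" * 40 + b"\n")  # simulate tampering
    try:
        load_verified(out_dir, ref)
        tamper_detected = False
    except ValueError:
        tamper_detected = True

print(intact, tamper_detected)
```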

22 of 32

Distributing GitBOMs

23 of 32

Vulnerability Tracing

  • A CVE could be traced back to the source files that cause it
    • Each of those source files has a git ref
    • You could look for those Git Refs in GitBOMs to determine which artifacts may be impacted
    • You could make declarations, by Git Ref, that particular artifacts (like executables)
      • Do not express the vulnerability, or
      • Have a vulnerability mitigation
    • Such declarations essentially become ‘metadata’
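
The lookup described above could be sketched as a recursive walk over GitBOM documents. Everything structural here is an assumption for illustration: the in-memory store keyed by git ref, the function name is_affected, and the single-space field separators. The refs themselves are the ones used in Example1 and Example2 of this deck.

```python
from typing import Dict, Set


def is_affected(bom_ref: str, vulnerable: Set[str], store: Dict[str, bytes]) -> bool:
    """Return True if any artifact reachable from this GitBOM is in `vulnerable`."""
    document = store.get(bom_ref)
    if document is None:  # GitBOM not available locally; nothing to inspect
        return False
    for line in document.decode("ascii").splitlines():
        fields = line.split(" ")  # assumed layout: blob <ref> [bom <ref>]
        if fields[1] in vulnerable:
            return True
        if len(fields) == 4 and is_affected(fields[3], vulnerable, store):
            return True
    return False


store = {
    "85322091b1d50a23d1c2a0f5933788a2a958f2ad": (
        b"blob c0f35b8ae567f5348df3711496fdc0ef6f634169\n"
        b"blob c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59\n"
    ),
    "0f4f259bb0fc79aaeea37598b0fda9fef0c2efea": (
        b"blob a9c8ab2cc116562393fc675b2b4dede22f845967"
        b" bom 34a7ad58295540383be53114b8c6ca3b98611a75\n"
        b"blob da2f5371ac5135d436b3dd3f2810c3c705cad1ea"
        b" bom 85322091b1d50a23d1c2a0f5933788a2a958f2ad\n"
    ),
}
hello_c = "c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59"
greeting_bom = "0f4f259bb0fc79aaeea37598b0fda9fef0c2efea"
print(is_affected(greeting_bom, {hello_c}, store))
```

If hello.c were the source of a CVE, the greeting executable would be flagged because hello.c is reachable through hello.o's GitBOM.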

24 of 32

Other uses

  • Attestation
  • Forensics
  • Repeatable/Verifiable build:
    • Build information is ‘metadata’ about the artifact tree
    • Build information and/or GitBOMs could be signed

25 of 32

BOM

Towards a verifiable artifact tree & enabling launch-time security scanning

26 of 32

Example1:

Imagine we have:

hello.c -> gitref(hello.c) == c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59

hello.c #includes <stdio.h> -> gitref(stdio.h) == c0f35b8ae567f5348df3711496fdc0ef6f634169

And we are building hello.o. The resulting GitBOM, lexically ordered, is:

blob c0f35b8ae567f5348df3711496fdc0ef6f634169\n

blob c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59\n

Its gitref is:

85322091b1d50a23d1c2a0f5933788a2a958f2ad

We would write to hello.o a single additional ELF section ‘.bom’ containing the 20-byte gitref:

85322091b1d50a23d1c2a0f5933788a2a958f2ad

Overhead - total 89 bytes:

  • 64 bytes for the section header
  • 5 bytes for adding ‘.bom\0’ to shstrtab
  • 20 bytes for 85322091b1d50a23d1c2a0f5933788a2a958f2ad
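
The overhead arithmetic can be checked with a short calculation. The 64-byte figure matches the size of one ELF64 section header entry (a 32-bit ELF file would use 40-byte entries); the variable names are illustrative only.

```python
gitbom_ref_hex = "85322091b1d50a23d1c2a0f5933788a2a958f2ad"

section_payload = bytes.fromhex(gitbom_ref_hex)  # 40 hex chars -> 20 raw bytes
section_name = b".bom\x00"                       # new entry in .shstrtab
ELF64_SECTION_HEADER_SIZE = 64                   # one extra section header entry

overhead = ELF64_SECTION_HEADER_SIZE + len(section_name) + len(section_payload)
print(len(section_payload), len(section_name), overhead)  # 20 5 89
```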

27 of 32

Example1:

ELF file layout, with the GitBOM additions marked:

  • ELF Header
  • Program Header Table
  • .text
  • .rodata
  • .shstrtab (+ 5 bytes)
  • .bom (+ 20 bytes)
  • ...
  • Section Header Table (+ 64 bytes)

28 of 32

Example1:

Write out to the same directory as hello.o:

./.bom/object/85/322091b1d50a23d1c2a0f5933788a2a958f2ad

The contents of hello.o’s GitBOM:

blob c0f35b8ae567f5348df3711496fdc0ef6f634169\n

blob c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59\n

Note: file size will be 46 bytes * number of .c and .h files. So, for example, if we had 1 .c file and 999 .h files, the file size would be 46000 bytes.

Note: This is a file in the file system; it is *not* inserted into the .o file.
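
The path and size figures on this slide can be reproduced with a small sketch. It assumes each record is "blob", one space, the 40-character ref, and a newline, which is what yields 46 bytes per record; gitbom_path is a hypothetical helper name, and posixpath is used so the output looks the same on every platform.

```python
import posixpath

source_refs = [
    "c64efd8bd8bceca8c69f9b5b7647cf0ff61fed59",  # hello.c
    "c0f35b8ae567f5348df3711496fdc0ef6f634169",  # stdio.h
]
# Assumed record layout: "blob <ref>\n", one record per source file, sorted.
document = "".join(f"blob {ref}\n" for ref in sorted(source_refs)).encode("ascii")


def gitbom_path(out_dir: str, gitbom_ref: str) -> str:
    """First two hex characters name the subdirectory, the rest name the file."""
    return posixpath.join(out_dir, ".bom", "object", gitbom_ref[:2], gitbom_ref[2:])


path = gitbom_path(".", "85322091b1d50a23d1c2a0f5933788a2a958f2ad")
bytes_per_record = len(document) // len(source_refs)
print(path)
print(bytes_per_record, bytes_per_record * 1000)  # 46 46000
```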

29 of 32

LLVM

[Diagram: LLVM pipeline: .c -> FrontEnd -> IR -> Pass -> IR -> Pass -> IR -> Pass -> IR -> Pass -> IR -> BackEnd -> .o]

FrontEnd:

1. Determine .c and .h files

2. Compute GitBOM document

3. Write out the GitBOM document to the ./.bom/object directory

4. Add GitBOM gitref and file location as IR Metadata

BackEnd:

1. Read GitBOM gitref from IR Metadata

2. Add .bom section to ELF file containing only the gitref (20 bytes)

3. Read GitBOM file location from IR Metadata

4. Copy GitBOM file to ./.bom/object of output directory

30 of 32

Example2:

Imagine we have:

hello.o -> gitref(hello.o) == da2f5371ac5135d436b3dd3f2810c3c705cad1ea

goodbye.o -> gitref(goodbye.o) == a9c8ab2cc116562393fc675b2b4dede22f845967

And we are building a greeting executable:

llvm-ld -o ${GREETING_DIR}/greeting ${HELLO_DIR}/hello.o ${GOODBYE_DIR}/goodbye.o

The linker looks for a .bom section in hello.o, finds it, and it contains:

85322091b1d50a23d1c2a0f5933788a2a958f2ad

The linker looks for a .bom section in goodbye.o, finds it, and it contains:

34a7ad58295540383be53114b8c6ca3b98611a75

The linker computes the GitBOM for the greeting executable:

blob a9c8ab2cc116562393fc675b2b4dede22f845967 bom 34a7ad58295540383be53114b8c6ca3b98611a75\n

blob da2f5371ac5135d436b3dd3f2810c3c705cad1ea bom 85322091b1d50a23d1c2a0f5933788a2a958f2ad\n

Its gitref is:

0f4f259bb0fc79aaeea37598b0fda9fef0c2efea
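
The linker step could look roughly like this in Python: pair each input object's git ref with the GitBOM ref found in its .bom section, emit one record per input, and sort the records. The dictionary and variable names are illustrative, and the single-space separators are an assumption.

```python
# Input object git ref -> GitBOM git ref read from that object's .bom section.
inputs = {
    "da2f5371ac5135d436b3dd3f2810c3c705cad1ea":  # hello.o
        "85322091b1d50a23d1c2a0f5933788a2a958f2ad",
    "a9c8ab2cc116562393fc675b2b4dede22f845967":  # goodbye.o
        "34a7ad58295540383be53114b8c6ca3b98611a75",
}

# Assumed record layout: "blob <object ref> bom <GitBOM ref>\n", lexically sorted.
records = sorted(f"blob {obj} bom {bom}\n" for obj, bom in inputs.items())
document = "".join(records).encode("ascii")
print(document.decode("ascii"), end="")
```

Because the records are sorted, goodbye.o (a9c8...) comes before hello.o (da2f...) regardless of the order on the linker command line.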

31 of 32

Example2:

We would write to the greeting executable a single additional ELF section ‘.bom’ containing the 20-byte gitref:

0f4f259bb0fc79aaeea37598b0fda9fef0c2efea

Overhead - total 89 bytes:

  • 64 bytes for the section header
  • 5 bytes for adding ‘.bom\0’ to shstrtab
  • 20 bytes for 0f4f259bb0fc79aaeea37598b0fda9fef0c2efea

Write out to the same directory as the greeting executable (${GREETING_DIR}):

./.bom/object/0f/4f259bb0fc79aaeea37598b0fda9fef0c2efea

The contents of the greeting executable’s GitBOM:

blob a9c8ab2cc116562393fc675b2b4dede22f845967 bom 34a7ad58295540383be53114b8c6ca3b98611a75\n

blob da2f5371ac5135d436b3dd3f2810c3c705cad1ea bom 85322091b1d50a23d1c2a0f5933788a2a958f2ad\n

Note: file size will be approximately 90 bytes * number of .o files. So, for example, if we had 1000 .o files, the file size would be approximately 90000 bytes.

Note: This is a file in the file system; it is *not* inserted into the executable file.

32 of 32

Example3: Container Image Integration

{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "size": 7023,
    "digest": "sha256:b5b2b2c507a0944348e0303114d8d93aaaa081732b86451d9bce1f432a537bc7"
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 32654,
      "digest": "sha256:9834876dcfb05cb167a5c24953eba58c4ac89b1adf57f28f2f9d09af107ee8f0"
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "size": 73109,
      "digest": "sha256:ec4b8955958665577945c89419d1af06b5f7636b4ac3da7f12184802ad867736"
    }
  ],
  "annotations": {
    "gitbom": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b"
  }
}
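
A launch-time scanner could read the identifier back out of the manifest along these lines. This is a sketch only: the function name gitbom_annotation is hypothetical, the manifest is trimmed to the fields the example needs, and the "gitbom" key follows the manifest shown on this slide.

```python
import json
from typing import Optional, Tuple

manifest_json = """
{
  "schemaVersion": 2,
  "annotations": {
    "gitbom": "sha256:3c3a4604a545cdc127456d94e421cd355bca5b528f4a9c1905b15da2eb4a4c6b"
  }
}
"""


def gitbom_annotation(manifest: dict, key: str = "gitbom") -> Optional[Tuple[str, str]]:
    """Return (algorithm, hex digest) from the annotation, or None if absent."""
    value = manifest.get("annotations", {}).get(key)
    if value is None:
        return None
    algorithm, _, digest = value.partition(":")
    return algorithm, digest


found = gitbom_annotation(json.loads(manifest_json))
print(found)
```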