1 of 21

wasm-ipld: IPLD

Lessons from building an IPLD implementation

2022.07.13

2 of 21

Agenda

wasm-ipld: IPLD

Building blocks

Codecs and the IPLD Data Model

WAC Data Model struggles

What’s an ADL anyway?

What’s missing/next?

3 of 21

What and why wasm-ipld?

Building Blocks

  • wasm-ipld, a portable set of IPLD tools
  • I don’t want to rewrite codecs, ADLs, etc. in Go, Rust, JS, Python
    1. UnixFS has to be in all these places, but what if we wanted to support something else… say BitTorrent, Git, etc.? The work is … a lot.
  • Bonus if code is written in a language that can build importable FFI binaries (e.g. Rust, C, etc.) for environments where WASM isn’t acceptable but shared libraries are
  • Currently supports
    • Codecs
    • ADLs

5 of 21

What’s a Codec?

Codecs and the IPLD Data Model

  • “IPLD codecs are functions that transform IPLD Data Model into serialized bytes so you can send and share data, and transform serialized bytes back into IPLD Data Model so you can work with it.”
    • Encode(Data Model Representation) -> (bytes, error)
    • Decode(bytes) -> (Data Model Representation, error)
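
A minimal sketch of the two codec signatures above, using plain JSON as a stand-in codec. The names `encode` and `decode` and the mapping of Data Model values onto Python built-ins are assumptions made for illustration; this is not DAG-JSON, and bytes and links are not handled.

```python
import json
from typing import Any


def encode(node: Any) -> bytes:
    """Data Model value -> serialized bytes; raises ValueError on failure."""
    try:
        # Sorted keys and compact separators keep the output deterministic.
        text = json.dumps(node, sort_keys=True, separators=(",", ":"))
    except TypeError as exc:
        raise ValueError(f"value is not encodable: {exc}") from exc
    return text.encode("utf-8")


def decode(data: bytes) -> Any:
    """Serialized bytes -> Data Model value; raises ValueError on failure."""
    try:
        return json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"bytes are not decodable: {exc}") from exc
```

The Go-style `(value, error)` return from the slide is expressed here as a return value plus a raised `ValueError`.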

6 of 21

What’s the IPLD Data Model?

Codecs and the IPLD Data Model

  • IPLD Data Model ~ a specific collection of data types (bool, integer, bytes, list, map, …)
  • Why does the data model exist?
    • So developers can target it as an abstraction over multiple chunks of data without needing to care too much about particulars.
      • e.g. I want to build tooling to inspect both git objects and bittorrent files, or similarly I want to list data in both JSON and CBOR maps.
  • Ok… but what is it?!?
    • Good question, let’s try and answer it
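
As a rough illustration of "a specific collection of data types", the sketch below maps Python values onto Data Model kinds and lists map keys without caring which codec produced the node. `Link`, `kind_of`, and `map_keys` are hypothetical names, and `Link` is only a placeholder for a CID.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Link:
    """Hypothetical stand-in for a CID."""
    cid: str


def kind_of(value: Any) -> str:
    """Return the Data Model kind of a Python value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, bytes):
        return "bytes"
    if isinstance(value, Link):
        return "link"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "map"
    raise TypeError(f"not a Data Model value: {type(value).__name__}")


def map_keys(node: Any) -> List[str]:
    """List the keys of a map node, whatever codec it was decoded from."""
    if kind_of(node) != "map":
        raise TypeError("node is not a map")
    return sorted(node)
```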

8 of 21

WAC - Representing the Data Model

WAC Data Model Struggles

  • For practical reasons (ease of development and performance), implementing codecs in WASM would be easy if only the Data Model could be represented in serialized form
    • Encode(Serialized Data Model Representation) -> (bytes, error)
    • Decode(bytes) -> (Serialized Data Model Representation, error)
  • Which brings us to WAC, the WebAssembly Codec (I know, neither creative nor very descriptive)
    • WAC aims to be a concrete representation of the IPLD Data Model so that the encoding and decoding functions above work as expected

9 of 21

WAC - What is it?

WAC Data Model Struggles

  • The format is generally speaking `<data-type><data>`
    • True
    • False
    • Null
    • Integer >=0 -> <uvarint>
    • NInteger <0 -> <uvarint>
    • Bytes -> <length><data>
    • String -> <length><data>
    • Link -> <CID>
    • Map -> <num-elements><key><value><key><value>... : keys must be of type string
    • List -> <num-elements><elem><elem>....
    • Float -> Not currently defined (maybe IEEE….)
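
The `<data-type><data>` layout above can be sketched as a small encoder. Several details here are assumptions, since the slide does not give them: the tag byte values are invented, lengths and element counts are written as uvarints, negative integers are stored as `-1 - n`, map keys are written as full tagged strings, and `Link` simply carries raw CID bytes. Floats are rejected because the format does not define them yet.

```python
from typing import Any

# Hypothetical tag values; the slide does not specify the real ones.
(T_NULL, T_TRUE, T_FALSE, T_INT, T_NINT,
 T_BYTES, T_STRING, T_LINK, T_MAP, T_LIST) = range(10)


class Link:
    """Hypothetical wrapper around the binary form of a CID."""
    def __init__(self, cid_bytes: bytes) -> None:
        self.cid_bytes = cid_bytes


def uvarint(n: int) -> bytes:
    """Unsigned varint: 7 bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def wac_encode(v: Any) -> bytes:
    if v is None:
        return bytes([T_NULL])
    if v is True:
        return bytes([T_TRUE])
    if v is False:
        return bytes([T_FALSE])
    if isinstance(v, int):
        if v >= 0:
            return bytes([T_INT]) + uvarint(v)
        return bytes([T_NINT]) + uvarint(-1 - v)  # assumed offset encoding
    if isinstance(v, bytes):
        return bytes([T_BYTES]) + uvarint(len(v)) + v
    if isinstance(v, str):
        raw = v.encode("utf-8")
        return bytes([T_STRING]) + uvarint(len(raw)) + raw
    if isinstance(v, Link):
        return bytes([T_LINK]) + v.cid_bytes
    if isinstance(v, dict):
        out = bytearray([T_MAP]) + uvarint(len(v))
        for key, value in v.items():
            if not isinstance(key, str):
                raise TypeError("map keys must be strings")
            out += wac_encode(key) + wac_encode(value)
        return bytes(out)
    if isinstance(v, list):
        out = bytearray([T_LIST]) + uvarint(len(v))
        for item in v:
            out += wac_encode(item)
        return bytes(out)
    raise TypeError(f"no WAC encoding for {type(v).__name__}")
```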

10 of 21

WAC and Struggles with the Data Model

WAC Data Model Struggles

  • If you read through the spec PR, the issues of contention are not new 🙃
    • Do Strings have restrictions beyond just being bytes, and if so what?
      • As defined by the Data Model, no, but it’s been a long-standing point of contention
    • Do Map keys have any restrictions?
      • As defined by the Data Model, yes (they must be Strings), but it’s been a long-standing point of contention
    • What’s a Float?
      • Basically unspecified with big warning labels on the website

11 of 21

WAC

WAC Data Model Struggles

  • It works and is pretty easy to implement in any language
  • The most complicated part of the spec is uvarint, however:
    • Uvarints up to 2^63 are required by some of the multiformats specs (e.g. CIDs) anyhow, so extending beyond that shouldn’t be too bad
  • Floats will eventually add some complexity, but perhaps not too bad given what floats are
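
Since uvarint is called out as the most complicated part, here is a minimal sketch of encoding and decoding. The `max_bytes` parameter is an assumption: its default of 9 bytes caps values below 2^63, matching the bound mentioned above, and raising it to 10 would cover the full 64-bit range. Rejecting non-minimal encodings is also an assumption, not something the slides state.

```python
from typing import Tuple


def uvarint_encode(n: int) -> bytes:
    """Encode a non-negative integer, least significant 7-bit group first."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # continuation bit set
        n >>= 7
    out.append(n)
    return bytes(out)


def uvarint_decode(buf: bytes, max_bytes: int = 9) -> Tuple[int, int]:
    """Decode one uvarint from the start of buf.

    Returns (value, bytes_consumed) so callers can keep parsing after it.
    """
    value = 0
    for i, byte in enumerate(buf[:max_bytes]):
        value |= (byte & 0x7F) << (7 * i)
        if byte < 0x80:  # last byte of this uvarint
            if byte == 0 and i > 0:
                raise ValueError("non-minimal uvarint encoding")
            return value, i + 1
    raise ValueError("uvarint is truncated or longer than max_bytes")
```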

13 of 21

ADL Basics

What’s an ADL anyway?

  • Codecs look like: Decode(bytes) -> (Data Model Representation, error)
  • ADLs look like: Interpret(Data Model Representation) -> (Data Model Representation, error)
    • Reminder: because the Data Model contains links, both of these can be multiblock data structures if need be
  • Bytes are part of the Data Model too, so ADLs are just a generalization of codecs
  • Important practical distinction from codecs to ADLs -> ADLs can reference a lot of data
    • If you have a block size limit anyhow, codecs can only reference so much data
    • ADLs can try to map large multiblock data -> single “block”
    • This means any sane interface for working with ADLs needs to support partial operations
      • No cheating and serializing the whole Data Model object again
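
A minimal sketch of what "partial operation" could look like for a bytes ADL that presents many blocks as one Bytes node. The root node layout (`length`, `chunk_size`, `chunks`), the use of strings as links, and the `load` callback are all hypothetical; the point is only that a range read fetches just the blocks it needs.

```python
from typing import Any, Callable, Dict


class ChunkedBytes:
    """Looks like a single Bytes node, backed by fixed-size chunks in blocks."""

    def __init__(self, root: Dict[str, Any], load: Callable[[str], bytes]) -> None:
        self.length = root["length"]
        self._chunk_size = root["chunk_size"]
        self._links = root["chunks"]
        self._load = load  # fetches the block behind a link

    def read(self, offset: int, length: int) -> bytes:
        """Return up to `length` bytes from `offset`, loading only needed chunks."""
        if offset < 0 or length < 0 or offset > self.length:
            raise ValueError("range out of bounds")
        end = min(offset + length, self.length)
        out = bytearray()
        pos = offset
        while pos < end:
            index, within = divmod(pos, self._chunk_size)
            chunk = self._load(self._links[index])
            piece = chunk[within:within + (end - pos)]
            if not piece:
                raise ValueError("chunk is shorter than the root node claims")
            out += piece
            pos += len(piece)
        return bytes(out)


def interpret(root: Dict[str, Any], load: Callable[[str], bytes]) -> ChunkedBytes:
    """Interpret(Data Model node) -> Data Model node, without loading any chunk."""
    return ChunkedBytes(root, load)
```

Reading bytes 3..5 of a 10-byte value split into 4-byte chunks touches only the first two chunks; the third is never loaded.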

14 of 21

What’s an ADL?

What’s an ADL anyway?

  • Is there a specification for ADL interfaces we can describe like for the Data Model?
    • Not currently specified
    • Specifying interfaces helps interoperability with tooling built on top of ADLs, where that tooling is itself specified
      • e.g. some subgraph descriptor like paths, selectors, etc.
      • Seems like this can be IPLD-implementation-specific until a standardization need comes up
  • wasm-ipld
    • V0: Mostly mimic go-ipld-prime interfaces

15 of 21

Wasm-ipld and ADLs

What’s an ADL anyway?

  • V0: Mostly mimic go-ipld-prime interfaces (enough to demo the thing with minimal design thinking)
    • Issues encountered in initial usage:
      • No parallel map access, only iterators (from go-ipld-prime)
        • Important for enumerating entries in sharded directories
    • Other issues present (inherited from go-ipld-prime)
      • Only bytes have access to subranges
      • Lists have parallel access via indices but no way to optimize loading subsets at once (e.g. not loading the same blocks multiple times)
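
One possible shape for the two missing capabilities, sketched under assumptions: `ShardedMap.items_parallel` fetches all shards of a directory concurrently instead of walking a single iterator, and `ChunkedList.get_many` groups requested indices by chunk so each block is loaded once. Class names, method names, and the `load` callback are hypothetical and are not the wasm-ipld or go-ipld-prime APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

Load = Callable[[str], Any]  # hypothetical loader: link -> decoded node


class ShardedMap:
    def __init__(self, shard_links: Sequence[str], load: Load) -> None:
        self._links = list(shard_links)
        self._load = load

    def items(self) -> Iterator[Tuple[str, Any]]:
        """Iterator-only access: shards are fetched one after another."""
        for link in self._links:
            yield from self._load(link).items()

    def items_parallel(self, workers: int = 4) -> Dict[str, Any]:
        """Fetch all shards concurrently, then merge their entries."""
        merged: Dict[str, Any] = {}
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for shard in pool.map(self._load, self._links):
                merged.update(shard)
        return merged


class ChunkedList:
    def __init__(self, chunk_len: int, chunk_links: Sequence[str], load: Load) -> None:
        self._chunk_len = chunk_len
        self._links = list(chunk_links)
        self._load = load

    def get_many(self, indices: Iterable[int]) -> Dict[int, Any]:
        """Look up several indices while loading each chunk at most once."""
        by_chunk: Dict[int, List[int]] = {}
        for i in indices:
            by_chunk.setdefault(i // self._chunk_len, []).append(i)
        out: Dict[int, Any] = {}
        for chunk_index, wanted in by_chunk.items():
            chunk = self._load(self._links[chunk_index])
            for i in wanted:
                out[i] = chunk[i % self._chunk_len]
        return out
```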

16 of 21

BitTorrent in IPLD

Case Study - BitTorrent

  • BitTorrent metadata is encoded in the Bencode format
  • Files have names (including paths)
    • Hashes of chunks of the files are included in a single metadata block
    • This is the infodict which is hashed resulting in the infohash
  • Goal: Can I load a BitTorrent file the same way as I do UnixFS files, with minimal hackiness?
    • Should be extensible to BitTorrent-v2, or other similar formats with minimal extra work
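
To make the Bencode and infohash relationship concrete, here is a minimal sketch of a Bencode decoder plus a v1 infohash helper. The function names are hypothetical. The infohash is taken as the SHA-1 of the exact bytes of the `info` value inside the metadata, so the helper hashes the original byte span rather than re-encoding the decoded dictionary.

```python
import hashlib
from typing import Any, Tuple


def bdecode(data: bytes, pos: int = 0) -> Tuple[Any, int]:
    """Decode one Bencode value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        pos += 1
        out = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            out[key] = value
        return out, pos + 1
    if lead.isdigit():  # byte string: <length>:<data>
        colon = data.index(b":", pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        return data[start:end], end
    raise ValueError(f"invalid Bencode at offset {pos}")


def infohash_v1(torrent: bytes) -> str:
    """SHA-1 (hex) of the raw bytes of the top-level 'info' value."""
    if torrent[:1] != b"d":
        raise ValueError("metadata must be a Bencode dictionary")
    pos = 1
    while torrent[pos:pos + 1] != b"e":
        key, pos = bdecode(torrent, pos)
        start = pos
        _, pos = bdecode(torrent, pos)
        if key == b"info":
            return hashlib.sha1(torrent[start:pos]).hexdigest()
    raise ValueError("no info dictionary found")
```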

17 of 21

Which IPLD pieces to use?

Case Study - BitTorrent

  • Could use raw nodes and interpret with BitTorrent file ADL or directory ADL
    • Like UnixFS we could have a single BitTorrent(v1) ADL rather than splitting into files and directories
  • Using raw nodes, maybe interpret data with a Bencode codec then use the BitTorrent file ADL
  • Ok, but it means ADL signaling (as opposed to the codec signaling built into a CID) is required to traverse links. Should we have a BitTorrent codec and then the ADL?
  • For evaluation purposes, used a Bencode codec + a joint file/directory BitTorrent-v1 ADL

18 of 21

Demo

20 of 21

Where do we go from here?

What’s missing/next?

  • IPLD pieces of wasm-ipld
    • Add support for parallel map access
    • As requests for additional ADL functionality come in, expose it
    • WAC spec
  • IPLD
    • Codecs and ADLs are nice, but some guidance seems helpful
      • When would I use a codec over an ADL?
      • Are codecs and ADLs enough or the right abstractions for some common use cases?

21 of 21

Thank you.