1 of 63

Decentralized identity & content addressing at web-scale with a non-financial app-specific blockchain

Headjack - the base layer of cyberspace

By Viktor Kirilov

https://xkcd.com/927/

2 of 63

About me

Blockchain !!!

https://twitter.com/lexfridman/status/1579579284204974081

Economics? Peace?

Both!

Crypto & “web3” are not about monkey JPEGs & ponzis. Please suspend any preconceptions & keep an open mind for an hour.

3 of 63

Presentation outline

  • Problems with the web (non exhaustive)
  • Prerequisite knowledge
  • Headjack - the solution
  • Why care - what would be possible

4 of 63

Problems with the web

  • The host-centric web
  • Centralization
  • Ads & surveillance capitalism
  • Algorithmic black boxes
  • Social media & identity
  • Vertical integration & data silos

This list is non-exhaustive

5 of 63

The host-centric web

  • IP, DNS & unicast
    • URL link rot & content drift => digital memory hole
    • host-certified documents

  • "It is generally recognized that the current approach of using IP address both as a locator and as an identifier was a poor design choice." - David D. Clark, Designing an Internet

  • "More than 98% of the information on the web is lost within 20 years" - a16z Podcast

  • "Society can’t understand itself if it can’t be honest with itself, and it can’t be honest with itself if it can only live in the present moment." - The Internet Is Rotting - Jonathan Zittrain

  • "People tend to overlook the decay of the modern web, when in fact these numbers are extraordinary - they represent a comprehensive breakdown in the chain of custody for facts." - The Internet Is Rotting - Jonathan Zittrain

6 of 63

Centralization

  • DNS
  • Certificate Authorities for TLS/HTTPS
  • Google & search
  • Browsers
  • Cloud providers
  • Data centers
  • Routing

7 of 63

Ads & surveillance capitalism

8 of 63

Algorithmic black boxes

  • Recommendation algorithms are the architecture of virality - the dynamics of amplification & interaction dictate how ideas surface, propagate & evolve.
    • The people writing the algorithmic feeds are the most powerful in the world - @naval.
    • The Social Dilemma (2020)

9 of 63

Social media & identity

  • Centralized & fragmented identity
    • "as of 2018 the consolidation of power and control over the social web by a few large corporations seems unparalleled" - Decentralizing the Social Web
  • User lock-in & lack of choice
    • if you exit you lose your connections & reputation - network effects are too strong
  • No accountability & transparency in moderation & algorithms
  • Echo chambers & filter bubbles

10 of 63

Vertical integration & data silos

  • No interoperability between platforms
    • stifled innovation
  • Reimplement the same functionality in-house
    • Effort is fractured => suboptimal solutions
    • Compete for talent => company bloat
  • Ecosystems built on platform APIs can be killed
  • High barrier to entry & the cold start problem

11 of 63

Prerequisite knowledge

  • Public-key cryptography
  • Hash functions
  • Merkle tree
  • Merkle proof - proof of inclusion
  • Blockchain: hash-linked chain of blocks
  • Blockchain vs state
  • Content addressing & IPFS

12 of 63

Public-key cryptography

  • Pair of 2 keys
    • Public & private
    • Bound by math
  • Can:
    • Encrypt with public
      • Decrypt with private
    • Sign with private
      • Verify with public
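
The four operations above can be sketched with textbook RSA on deliberately tiny numbers. The primes (61 and 53) and the exponent (17) are assumptions picked for readability, not anything Headjack specifies; real systems use vetted libraries, large keys and padding.

```python
# Toy textbook RSA - insecure, for illustration only.
p, q = 61, 53                  # tiny primes (assumed for this example)
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Anyone can encrypt with the public key (e, n)."""
    return pow(m, e, n)


def decrypt(c: int) -> int:
    """Only the holder of the private key (d, n) can decrypt."""
    return pow(c, d, n)


def sign(m: int) -> int:
    """Only the holder of the private key can produce a signature."""
    return pow(m, d, n)


def verify(m: int, s: int) -> bool:
    """Anyone can check a signature with the public key."""
    return pow(s, e, n) == m


message = 65
assert decrypt(encrypt(message)) == message
assert verify(message, sign(message))
assert not verify(message + 1, sign(message))  # signature is bound to the message
```

The two keys are "bound by math" in the sense that d is derived from e and the secret factorization of n.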

13 of 63

Hash functions

  • hash_func(input_data) => hash (fixed-size string)
  • Usually size(input_data) > size(hash)
  • Irreversible - can't recover the input data from the hash
  • Small change in data => a different hash
  • Identical data => same hash (deterministic func)

Example: SHA3-256("hello") = "3338be694f50c5f338814986cdf0686453a888b84f424d792af4b9202398f392"
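
A short sketch of these properties using Python's standard `hashlib`. The helper name `sha3_hex` and the second input `"hellp"` are just illustrative choices.

```python
import hashlib


def sha3_hex(data: bytes) -> str:
    """SHA3-256 digest of the input as a 64-character hex string."""
    return hashlib.sha3_256(data).hexdigest()


h1 = sha3_hex(b"hello")
h2 = sha3_hex(b"hello")
h3 = sha3_hex(b"hellp")          # one character changed

print(h1)
print(h3)

assert h1 == h2                  # identical data => same hash
assert h1 != h3                  # small change => a different hash
assert len(h1) == len(h3) == 64  # fixed size regardless of input

# Count how many hex characters differ between the two digests.
differing = sum(a != b for a, b in zip(h1, h3))
print(f"{differing} of 64 hex characters differ")
```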

14 of 63

Merkle tree

15 of 63

Merkle proof - proof of inclusion

https://medium.com/blockchain-stories/the-tale-of-merkle-tree-in-bitcoin-blockchain-2c5fa5a298f7
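
A minimal sketch of a Merkle tree and a proof of inclusion. It assumes a non-empty list of leaves, SHA3-256 as the hash, and the Bitcoin-style convention of duplicating the last node on odd-sized levels; none of these choices are specified in the slides.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every level of the tree, from hashed leaves up to [root]."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:            # odd count: duplicate the last node
            level = level + [level[-1]]
            levels[-1] = level             # keep the padded level for proofs
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag says if the sibling is on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1                # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path to the root using only the leaf and the sibling hashes."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


events = [b"post A", b"post B", b"like C", b"follow D", b"post E"]
levels = build_levels(events)
root = levels[-1][0]
proof = merkle_proof(levels, 2)
assert verify_inclusion(b"like C", proof, root)
assert not verify_inclusion(b"like X", proof, root)
```

The proof holds one hash per tree level, so its size grows logarithmically with the number of leaves.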

16 of 63

Blockchain: hash-linked chain of blocks

https://mlsdev.com/blog/156-how-to-build-your-own-blockchain-architecture
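
A minimal sketch of hash-linking: each block stores the hash of the previous one, so editing an old block breaks every later link. The block fields and JSON encoding are assumptions for illustration, not a real chain format.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's canonical JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha3_256(encoded).hexdigest()


def append_block(chain: list[dict], transactions: list[str]) -> None:
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({
        "height": len(chain),
        "prev_hash": prev_hash,
        "transactions": transactions,
    })


def is_valid(chain: list[dict]) -> bool:
    """Every block must reference the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain: list[dict] = []
append_block(chain, ["alice registers"])
append_block(chain, ["bob registers"])
append_block(chain, ["alice binds a key"])
assert is_valid(chain)

chain[1]["transactions"] = ["mallory registers"]   # rewrite history
assert not is_valid(chain)                         # the next block's link no longer matches
```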

17 of 63

Blockchain vs state

(no, not the "monopoly on violence" kind of state)

  • Blockchain: list of blocks with transactions (changes or incremental patches - diffs)
    • Block space is limited (due to consensus)
  • State: current state of the ledger (who has what) materialized by applying the diffs in the blocks
    • stateTransitionFunction(oldState, diff) = newState
    • Basically a shared database / state machine
    • Can be represented with a tree (Ethereum) which can be Merkleized & the root state hash can be embedded in the blocks
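
The relationship between blocks (diffs) and state can be sketched as a fold over the chain. The balance-style diffs below are a made-up example of "who has what"; a real chain has its own transaction format.

```python
from functools import reduce

State = dict[str, int]


def state_transition(old_state: State, diff: State) -> State:
    """stateTransitionFunction(oldState, diff) = newState, without mutating oldState."""
    new_state = dict(old_state)
    for account, delta in diff.items():
        new_state[account] = new_state.get(account, 0) + delta
    return new_state


# The "blockchain": an ordered list of diffs.
blocks = [
    {"alice": 10},
    {"alice": -3, "bob": 3},
    {"bob": -1, "carol": 1},
]

# The "state": what you get by applying every diff in order.
state = reduce(state_transition, blocks, {})
print(state)  # {'alice': 7, 'bob': 2, 'carol': 1}
```

Anyone replaying the same blocks in the same order arrives at the same state, which is what makes it a shared state machine.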

18 of 63

Content addressing & IPFS

  • hash_function(file) => content_hash
  • IPFS: InterPlanetary File System
    • Content-addressable p2p network
    • Request file by content_hash
    • No limits for number of files or their size
      • But latency can be an issue
    • No incentives for nodes to store & retrieve data
      • Just like with torrents
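
A toy in-memory sketch of content addressing: data is stored and requested by its hash, so whoever receives it can check it without trusting the sender. This is not the IPFS API; IPFS uses its own identifiers (CIDs) and chunking, and the `ContentStore` class is hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the hash of the value."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        content_hash = hashlib.sha3_256(data).hexdigest()
        self._blobs[content_hash] = data
        return content_hash

    def get(self, content_hash: str) -> bytes:
        data = self._blobs[content_hash]
        # The requester can verify the response against the address itself.
        if hashlib.sha3_256(data).hexdigest() != content_hash:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
address = store.put(b"hello cyberspace")
assert store.get(address) == b"hello cyberspace"
assert store.put(b"hello cyberspace") == address   # same content => same address
```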

19 of 63

Problem statement

We have reached a local maximum.

How do we use “web3” to improve social media & the web?

We start with identity.

20 of 63

Identity is the foundation

  • Fundamentally just an identifier - nothing more
  • UNIX philosophy - do one thing well
    • Modularity, composability, layering
    • No NFTs, creator coins & financialization
  • Everything else (profiles, verification, KYC, privacy, reputation, data storage, etc.) can be layered on top
  • Let’s use a blockchain for property rights for identity

21 of 63

Enter Headjack

Combining existing building blocks in a novel way

with a different set of tradeoffs

22 of 63

Design goal 1: Web-scale

  • Billions of users
  • Unlimited data

23 of 63

Design goal 2: Web2-like UX

  • No key pairs & paying for transactions by default
    • "With consumer products, simple and “wrong” beats complicated and “right.”" - @naval
  • No self hosting by default
    • "People don’t want to run their own servers, and never will." - Moxie Marlinspike
  • Human-readable URIs instead of ones full of hashes

24 of 63

Design goal 3: Decentralization

  • Sovereignty: can own your identity with a key pair (even if using a “custodial service” by default)
  • Credible neutrality - anyone can permissionlessly have an account, operate a service & broadcast through the network (speech != reach)
  • Can self-host & run software locally, although at a disadvantage (can’t aggregate activity at web-scale)

"You can build something centralized on something decentralized but you can’t build something decentralized on top of something centralized. Decentralization is always the base layer." - @RyanSAdams

25 of 63

Design goal 4: simplicity

  • Should be clear how much it can scale
  • Should be clear what the guarantees are
    • What has data availability
    • Economics
    • Time to finality for important operations
  • A global singleton is the easiest to work with
    • The simplest mental model will win vs a fractured landscape of standards, chains & bridges
    • "Simplicity is the ultimate sophistication." - Leonardo da Vinci

26 of 63

On-chain vs off-chain

On-chain (the absolute bare minimum):

  • Account IDs (unique auto increment int64)
  • Public key ownership (optional - keys not necessary)
  • Name ownership (optional - names not necessary)
  • (some) authorization actions (most are off-chain)
  • Merkle roots (anchors / commits) for off-chain data

Off-chain (anchored in batches):

  • All other events - documents, actions, content

27 of 63

3 “types” of accounts

  • Users
  • Identity Managers (IDM)
    • Handle authentication & authorization with apps
      • Think “login with Google” - Single Sign-On (SSO)
    • Handle direct messages (DMs)
    • (optional) Archive user activity from applications
    • “custodial” service by default, but users can self-host
  • Applications - for browsing & publishing media

All 3 have on-chain IDs and one entity can play all 3 roles

28 of 63

AuthN, AuthZ & blockchain usage

  • Users view media through applications
    • AutheNticate off-chain using IDMs or keys
  • Users post content through applications
    • AuthoriZe the app to post on their behalf (no sig)
      • On-chain with an IDM
        • Events are batched & contain only 1 signature
      • Off-chain + access token (like OAuth/OpenID)
    • Or with explicit signatures using their keys (rare)
  • Apps aggregate content & submit in bulk/batches
    • Content is off-chain, Merkle root is on-chain
  • The blockchain is hidden in the background for users
    • IDMs & apps will cover the costs

29 of 63

Off-chain messages & evolution

  • Message / event / action / data / document / content are the same thing
  • Taxonomy of message types
    • Hierarchical with inheritance - base types: text/image/video/collection
    • Anyone can extend types by creating new ones
    • Fallback (default) presentation (rendering) if a type is not supported
      • Will be true initially for all new types - before they become common
  • Prediction message (text): “{type: "42", asset: "$ETH", date: "2025.02.12", above_or_below: "above", price: "10000$", probability: "80%"}”
  • Fallback rendering template for type 42: "{asset} has {probability} chance of being {above_or_below} {price} by {date}"
  • Result: "$ETH has 80% chance of being above 10000$ by 2025.02.12"
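
The fallback rendering above can be sketched with plain string templates. The registry `fallback_templates` and the `render` function are hypothetical names; the message fields and the template are the ones from the slide.

```python
message = {
    "type": "42",
    "asset": "$ETH",
    "date": "2025.02.12",
    "above_or_below": "above",
    "price": "10000$",
    "probability": "80%",
}

# Hypothetical registry: message type -> fallback rendering template.
fallback_templates = {
    "42": "{asset} has {probability} chance of being {above_or_below} {price} by {date}",
}


def render(msg: dict[str, str]) -> str:
    """Render with the type's fallback template, or dump the raw fields if none exists."""
    template = fallback_templates.get(msg["type"])
    if template is None:
        return str(msg)
    return template.format(**msg)


print(render(message))
# $ETH has 80% chance of being above 10000$ by 2025.02.12
```

An application that knows type 42 natively could draw a chart instead; one that does not can still show readable text.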

30 of 63

Everything is a message

  • Social graph connections, account preferences, content
    • Attestations, verifiable credentials, authorization tokens
  • Privacy & closed groups can work too!
    • Messages can be encrypted
    • Message blobs can be anchored but not accessible for all
      • Meaning not everyone will be able to download them
      • This way even intranet (internal) documents can use the same global identity & content addressing layer
  • We can rebuild any primitive & service from Web2

31 of 63

Content creation, addressing & URIs

https://culturexchange1.wordpress.com/2015/06/02/the-telephone-switchboard-the-story-of-a-revolutionary-instrument/

32 of 63

Content blob structure & nonce

Applications:

  • Collect messages
  • Group them by user
    • Index in groups
  • Construct a big blob with all event groups
    • Also an offset table
  • Construct a Merkle tree with all events
  • Post the Merkle root & IPFS hash of the blob
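
A rough sketch of the batching steps above. The blob layout (a JSON offset table followed by one JSON line per user group), the leaf encoding and the use of SHA3-256 in place of an IPFS hash are all assumptions made for the example; the slides do not define a wire format.

```python
import hashlib
import json
from collections import defaultdict


def h(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a binary Merkle tree (last node duplicated on odd levels)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def build_blob(messages: list[tuple[int, str]]) -> tuple[bytes, bytes, str]:
    """messages are (user_id, text) pairs; returns (blob, merkle_root, blob_hash)."""
    # 1. Group messages by user; the index in the group is the list position.
    groups: dict[int, list[str]] = defaultdict(list)
    for user_id, text in messages:
        groups[user_id].append(text)

    # 2. Concatenate the groups and remember where each one starts.
    offset_table: dict[int, int] = {}
    body = b""
    leaves: list[bytes] = []
    for user_id in sorted(groups):
        offset_table[user_id] = len(body)
        body += json.dumps(groups[user_id]).encode() + b"\n"
        for index, text in enumerate(groups[user_id]):
            leaves.append(f"{user_id}/{index}:{text}".encode())

    # 3. Blob = offset table + all event groups.
    blob = json.dumps(offset_table).encode() + b"\n" + body

    # 4. Only these two small values would go on-chain.
    return blob, merkle_root(leaves), h(blob).hex()


blob, root, blob_hash = build_blob([(7, "hi"), (3, "gm"), (7, "bye")])
print(blob.decode())
print(root.hex(), blob_hash)
```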

33 of 63

What is a nonce

  • In our case: on-chain number tied to every account
  • When an account (application) publishes a batch of off-chain data, its nonce is incremented (starting from 0)
    • Also save the nonce/block_height (number) mapping
  • All messages of a future blob use the next application nonce when being constructed
  • Used for translating to a block_height for the commitment
    • (App_ID / nonce) => block_height
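
A sketch of the nonce bookkeeping described above. The `ToyChain` class and its method names are hypothetical, and it assumes a batch is recorded under the account's current nonce before that nonce is incremented.

```python
class ToyChain:
    """Tracks per-account nonces and the (account, nonce) -> block_height mapping."""

    def __init__(self) -> None:
        self.block_height = 0
        self.nonces: dict[int, int] = {}                       # account_id -> next nonce
        self.commit_height: dict[tuple[int, int], int] = {}    # (account_id, nonce) -> height
        self.roots: dict[tuple[int, int], bytes] = {}          # (account_id, nonce) -> Merkle root

    def new_block(self) -> None:
        self.block_height += 1

    def anchor(self, account_id: int, merkle_root: bytes) -> int:
        """Record a batch commitment and bump the account's nonce."""
        nonce = self.nonces.get(account_id, 0)                 # nonces start from 0
        self.commit_height[(account_id, nonce)] = self.block_height
        self.roots[(account_id, nonce)] = merkle_root
        self.nonces[account_id] = nonce + 1
        return nonce

    def height_of(self, account_id: int, nonce: int) -> int:
        """(App_ID / nonce) => block_height"""
        return self.commit_height[(account_id, nonce)]


chain = ToyChain()
chain.new_block()                       # height 1
first = chain.anchor(2131, b"root-a")   # app 2131 publishes its first batch
chain.new_block()                       # height 2
chain.new_block()                       # height 3
second = chain.anchor(2131, b"root-b")

print(first, chain.height_of(2131, first))     # 0 1
print(second, chain.height_of(2131, second))   # 1 3
```

Because the application knows its next nonce in advance, messages can be addressed before the batch is actually committed.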

34 of 63

URIs: addressing specific events

35 of 63

URIs & content references

36 of 63

The global virtual address space

  • Accounts can signal on-chain ways for contact - RPC/REST endpoints, etc.
  • Many ways to retrieve data for a URI:
    • Fetch the entire blob if still available
    • OR ask the application (part of the URI)
    • OR ask the user (or their IDM)
    • OR ask some archive service
    • OR use a URI-focused p2p network
  • Use a Merkle proof to check authenticity
  • Infinite virtual address space (4 integers)
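
The "many ways to retrieve data" idea can be sketched as trying sources in order and accepting the first response that passes an authenticity check. Everything here is hypothetical: the source names, the `fetch` helper, and the use of a plain hash comparison as a stand-in for a real Merkle proof check.

```python
import hashlib
from typing import Callable, Optional

Source = Callable[[str], Optional[bytes]]


def fetch(
    uri: str,
    sources: list[tuple[str, Source]],
    is_authentic: Callable[[str, bytes], bool],
) -> Optional[tuple[str, bytes]]:
    """Return (source_name, data) from the first source whose data verifies."""
    for name, source in sources:
        data = source(uri)
        if data is not None and is_authentic(uri, data):
            return name, data
    return None


# Stand-in for on-chain anchors: URI -> expected hash of the document.
document = b"gm"
anchored = {"2131/6235/77534/3": hashlib.sha3_256(document).hexdigest()}


def is_authentic(uri: str, data: bytes) -> bool:
    return anchored.get(uri) == hashlib.sha3_256(data).hexdigest()


sources: list[tuple[str, Source]] = [
    ("blob", lambda uri: None),                  # blob is no longer hosted
    ("application", lambda uri: b"tampered"),    # answers, but with wrong bytes
    ("archive", lambda uri: document),           # an archive still has it
]

result = fetch("2131/6235/77534/3", sources, is_authentic)
print(result)  # ('archive', b'gm')
```

Since authenticity is checked against the chain rather than the host, it does not matter which party ends up serving the bytes.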

37 of 63

State: Names/keys/nonces/authZ

38 of 63

Stable & human-readable URIs

  • Names also have a nonce - incremented when owner commits off-chain content
    • The state maps the nonce of a name to account_id/nonce pair
  • URI: twitter.com/5542/johnny/3
    • Translate twitter.com/5542 to 2131/6235
    • Translate 2131/6235 to Block 8899
    • johnny’s owner at block 8899 is 77534
    • ⇒ canonical form: 2131/6235/77534/3
  • Even if twitter.com and/or johnny change ownership, the URI can still be properly translated
    • ENS / Namecoin can’t do this
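
The translation steps above can be sketched with three lookup tables. The table names and the ownership history for johnny (acquired at block 1200, transferred at block 9500) are invented for illustration; only the IDs and the final canonical form come from the slide.

```python
from bisect import bisect_right

# (name, name_nonce) -> (account_id, account_nonce)
name_nonce_to_account = {("twitter.com", 5542): (2131, 6235)}

# (account_id, account_nonce) -> block height of the commitment
commit_height = {(2131, 6235): 8899}

# name -> [(block_height, owner_account_id), ...] sorted by height (assumed history)
name_owner_history = {"johnny": [(1200, 77534), (9500, 90001)]}


def owner_at(name: str, block_height: int) -> int:
    """Owner of a name as of a given block, looked up in its ownership history."""
    history = name_owner_history[name]
    heights = [height for height, _ in history]
    position = bisect_right(heights, block_height) - 1
    if position < 0:
        raise LookupError(f"{name} had no owner at block {block_height}")
    return history[position][1]


def canonical_uri(uri: str) -> str:
    app_name, name_nonce, user_name, index = uri.split("/")
    app_id, app_nonce = name_nonce_to_account[(app_name, int(name_nonce))]
    block_height = commit_height[(app_id, app_nonce)]
    user_id = owner_at(user_name, block_height)
    return f"{app_id}/{app_nonce}/{user_id}/{index}"


print(canonical_uri("twitter.com/5542/johnny/3"))  # 2131/6235/77534/3
```

The name is resolved as of block 8899, so the later transfer of johnny at block 9500 does not change the result.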

39 of 63

Communication & authorization

Users can:

  • Have a key & sign actions explicitly - self-sovereign
  • Have a key & authorize IDMs/apps for better UX while retaining ultimate control
  • NOT have a key & trust an IDM (“owned” by it)
    • Can “graduate” by binding a key at any point in time
  • Not think about blockchain transaction costs & wallets

40 of 63

Chain used mostly by IDMs/Apps

41 of 63

Trustless

Better trust

Possible because:

  • Creation != finance
    • ⇒ Different tradeoffs
  • Off-chain data with no storage & retrievability guarantees
    • On a best effort basis
  • “Custodial” IDM services are OK because there’s nothing to steal & users with keys can override & revoke access

42 of 63

Self-authenticating documents

  • The authenticity of a document with a URI can be checked by:
    • The Merkle proof for being included in a block
    • If it was indeed created by the user
      • Either the data is explicitly signed
        • We need a Merkle proof from the blockchain state that shows the key ownership at that time
      • Or the app was authorized to post on behalf of the user
        • We need a Merkle proof from the blockchain state that shows the authorizations of the user at that time
        • Or there’s an embedded auth token in the message
  • Documents can be cached authentically with a few proofs even if they stop being hosted (user/app/archive/IPFS/p2p network)

43 of 63

Ordering integers throughout time

44 of 63

Throughput numbers (very rough)

  • Ethereum: 6 Kb/s of blockchain growth ⇒ 13 TPS
  • Headjack: 100 Kb/s
    • Eth ZK validium (can inherit a lot of Eth security)
    • 1000 applications anchoring once every 10 sec
    • 6000 authorizations per second (through IDMs)
      • ~0.5 auth per person per day for 1 billion people (6000 × 86,400 s ≈ 518 million per day)
      • Unlimited off-chain authZ
    • This is the naive baseline with 0 optimizations
      • DA for validiums can go even beyond 1 MB/s
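
A quick back-of-the-envelope check of the Headjack figures above, using only the numbers on the slide. The per-person result lands at roughly half an authorization per day, in the same ballpark as the slide's very rough figure.

```python
auth_per_second = 6000
seconds_per_day = 24 * 60 * 60          # 86,400
people = 1_000_000_000

auth_per_day = auth_per_second * seconds_per_day
auth_per_person_per_day = auth_per_day / people

applications = 1000
anchor_interval_seconds = 10
anchors_per_second = applications / anchor_interval_seconds

print(f"{auth_per_day:,} authorizations per day")              # 518,400,000
print(f"{auth_per_person_per_day:.2f} per person per day")     # 0.52
print(f"{anchors_per_second:.0f} anchors per second")          # 100
```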

45 of 63

The timestamp machine

  • We’ve constructed a cryptographic sequencer of events
    • All authZ events & key/name changes are ordered
  • On & off chain data is batched for efficiency
    • Can handle unlimited amounts of off-chain data
    • Has enough on-chain capacity for billions of identities
  • URIs are permanent - even those with names
    • Names are most useful when in URIs
    • We can address within documents because URIs are stable
  • It can displace DNS & certificate authorities
    • A confluence of multiple inter-related things
    • Has the strongest synergy & network effects

46 of 63

Linking data to identity at scale

47 of 63

🐦Twitter vs Reddit post?

  • Comments in a tree structure
    • Reddit expands the entire tree by default
    • Twitter shows only one level of comments
    • A Reddit post is always a part of a subreddit
  • Twitter: account-focused, Reddit: community-focused
  • Why not merge their state? They are mostly the same!
    • Reputation of users can be one (shared)
    • We can have different UIs for the same data
    • Can show Reddit-like activity in a Twitter-like feed
      • Same goes for YouTube comments!
  • "When identities become portable, backends become liquid" - @balajis

48 of 63

Same data - different views

https://world3d.com/2020/06/the-history-of-lenticular/

"Data is the center of the universe; applications are ephemeral." - The Data-Centric Manifesto

49 of 63

Applications & infrastructure

AWS-like Infrastructure:

  • Ingest & store all data
  • Filter it by some criteria
  • Create algorithms & indexes

Applications:

  • Choose infra provider
    • Can migrate to another
  • Choose indexes & filtration
  • AND/OR let users choose indexes, algorithms & filters
  • Pay per API call for indexes
  • Serve Ads (handled by infra)
    • OR charge users

50 of 63

Interoperability - no more silos!

  • "Composability is to software as compounding interest is to finance" - @cdixon
  • Anything can become an event stream
    • References to an account / word / URI
    • Advanced filtering (thresholds, exclusion)
    • Transforming & joining streams
  • Anyone can create a new algorithm & index
    • Can pay an infra company and charge apps!
  • Any type of advanced query would be possible

51 of 63

Competition & specialization

  • More competition (& transparency!)
    • Lower barrier to entry for startups
    • In search engines & infrastructure
    • In recommendation algorithms
  • More choice - ability to exit a service
    • Network effect is shared
  • Composable media - build only the UI & pick off-the-shelf algorithms & indexes
  • Redundancy & resilience
    • Easier horizontal scaling
    • Better archiving (not just snapshots)
    • Topological flexibility - storage & location can change but data addresses stay the same

52 of 63

Unbundling the media stack

"There’s only two ways I know of to make money– bundling, and unbundling." - Jim Barksdale

"The whole is greater than the sum of its parts." - Aristotle

A single "one-size-fits-all" company can never be what an open ecosystem could be

53 of 63

History 2.0: the ledger of record

  • Host-centric (URL) vs data-centric (permanent URI)
    • No more link rot (broken links) or content drift
    • Self-authenticating documents
  • The global GIT
    • Can deduplicate documents & add traceability
    • All changes to a document will be tracked
    • Imagine being able to see the history of edits on any document (like for Wikipedia pages)
  • We can build reputation systems & a web of trust
  • Argument from cryptography instead of argument from authority

54 of 63

Other possibilities

  • Being able to see all the public activity of anyone, regardless of application or content type
  • Multi-dimensional subscriptions - include/exclude based on user, application, content type, tag, keyword
  • Universal bookmarks & playlists - across all apps
    • ⇒ knowledge management tools on top of them
  • Tunable notification settings in our IDMs
    • "Notifications are just alarm clocks that someone else is setting for you." - @naval
  • Intra-document addressing (specific range)
  • Forking communities, moderation filters & UIs

55 of 63

Things we didn’t cover

  • Taxonomy & extensibility of message (event) types
  • Advanced authorization (off-chain sessions/tokens)
  • Storage & retrievability of off-chain data
  • Optimizations & pre-commit access
  • ZK & blockchain implementation
  • State growth & proof sizes
  • Editing & deleting content
  • Tokenomics & revenue
  • Names in-depth
  • Content moderation
  • Changes to business models
  • Regulation, GDPR & the right to be forgotten
  • The decoupling of identity-related services from apps
  • …a lot more actually, also the negatives

56 of 63

“The medium is the message”

Marshall McLuhan (media theorist) proposed that a communication medium itself, not the messages it carries, should be the primary focus of study.

57 of 63

AGI: Artificial Global Intelligence

  • "Every follow is a synapse in a global metabrain" - Danielle Fong

  • "Because it consists of billions of bidirectional interactions per day, Twitter can be thought of as a collective, cybernetic super-intelligence" - @elonmusk

  • "The complexity of a sensemaking system must match the complexity of the environment." - source

58 of 63

The host-centric model must go

  • We're in an asymmetric relationship with today's platforms - they've siloed our data and we're their captives.

  • We've been given a single one-size-fits-all algorithmic feed for our information diet that's solely optimizing for engagement & time spent (not time well spent), with their ad revenue as the most important metric.

  • We should be offended by the lack of choice & transparency

  • We should be able to surface any kind of signal.

59 of 63

The Metaverse

  • "The metaverse isn’t a 3D world owned by some corporation. It’s a permissionless market-network which respects and interconnects all user-owned and cryptographically-secured digital identities, reputations, wallets, communities, spaces, and objects." - @naval

  • "We think of the metaverse as the entirety of all composable and interoperable resources, identities, applications, platforms, services, and protocols that exist in cyberspace." - source

  • "The “metaverse” as I like to envision it, is a globally shared and permanent digital reality not owned by any single entity that any company, platform, or person can plug into, regardless of where they are or what device they’re using." - source

60 of 63

The Internet

Visualization from the Opte Project of the various routes through a portion of the Internet in 2005

Graph colors depending on IP range:

  • Asia Pacific
  • Europe/Middle East/Central Asia/Africa
  • North America
  • Latin American and Caribbean
  • RFC1918 IP Addresses
  • Unknown

"The Internet is the largest engineering project the earth has ever seen - and we're just getting started" - Barrett Lyon, founder of OPTE Project

61 of 63

Adding identity in the OSI model

The identity layer needs to be solved only once

Everything else can be layered on top

https://www.imperva.com/learn/application-security/osi-model/

62 of 63

The identity layer needs to be solved only once.

63 of 63

Q&A