The Decentralized Web
What it is, what’s useful in the archival space, and current status and challenges
Kelsey Breseman, Rob Brackett
EDGI & Data Together, October 2019
70%
of Internet traffic is directly controlled by Google and Facebook
2
Great power
Great irresponsibility
3
A question of trust: who should hold your data?
EDGI: Who should hold environmental data that belong to the public?
& how can they be collected, held, and distributed justly?
4
The internet is fragile
...as we’ve learned
5
Data requires continuous stewardship.
Who is able to hold & distribute data?
Who is not?
What data is well-kept, and for whom?
6
How do we steward data?
“Applied to information, stewardship focuses on assuring accuracy, validity, security, management, and preservation of information holdings.”
Dawes. “Stewardship and usefulness: Policy principles for information-based transparency.” Government Information Quarterly, 2010.
Content-based addressing (CIDs, hashes, crypto-signatures, etc.)
LOCKSS: “Lots of Copies Keeps Stuff Safe” (a principle + eponymous software & org)
7
Lots of copies:
peer-to-peer (P2P)
sharing
8
Lots of Copies Keeps Stuff Safe (LOCKSS)
9
Kind of like
...Ever worried about downloading a virus from one of these?
If the internet is peer-based, we’re going to need a way to trust our content.
10
Provable Validity
“How do I know this is really the NASA data?”
(if it’s not at nasa.gov, or has been archived)
Digital fingerprinting: “Is this the right content?”
Digital signatures: “Is this content trustworthy?”
12
Content-based addressing:
I’m getting what I asked for
13
The address of the content is built from the content
Everything is algorithmically scrambled into this hash, so if you change even a comma of the content, the address is different
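A minimal sketch of the "change a comma, change the address" idea, using a plain SHA-256 digest as a stand-in for a content-based address. The function name `content_address` is hypothetical. Real IPFS CIDs add further encoding and structure on top of a hash, so these digests will not match the CIDs shown on the IPFS slide.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in content address."""
    return hashlib.sha256(data).hexdigest()


original = b"Hello world!"
modified = b"Hello, world!"  # the same text with one comma added

addr_original = content_address(original)
addr_modified = content_address(modified)

print(addr_original)
print(addr_modified)
print("same address?", addr_original == addr_modified)
```

The same bytes always produce the same address, and a one-character edit produces an unrelated one.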
14
Example: IPFS’s “CID” content-based address
Contents of “hello.txt”:
Hello world!
Adding file to IPFS and getting back its CID:
➜ ipfs add hello.txt
> QmXgBq2xJKMqVo8jZdziyudNmnbiwjbpAycy5RbfDBoJRM
Contents of “hello.txt” (the same file, modified):
Hello, world!
Adding file to IPFS and getting back its CID:
➜ ipfs add hello.txt
> QmeeLUVdiSTTKQqhWqsffYDtNvvvcTfJdotkNyi1KDEJtQ
The content’s address is made from the content itself
If you ask by content rather than by location, you will always get back the exact thing you asked for
...or nothing, if no one has it online
https:// frijol.gitbooks.io /climate-change/content/
how I’m asking | who I’m asking | where to look
how I’m asking | what I’m asking for (from anyone): the hash
/ipfs/ Qc98f2c0ee40323148c99285a83c1a80d2179a454dcbc7d3393dc52cc146f47
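The two address shapes above can be pulled apart in code. This is only an illustrative sketch: the helper names and dictionary labels are hypothetical and simply mirror the annotations on the slide, with the spaces that separated the parts removed from the addresses.

```python
from urllib.parse import urlsplit


def describe_location_address(url: str) -> dict:
    """Split a location-based URL into the three parts labelled on the slide."""
    parts = urlsplit(url)
    return {
        "how I'm asking": parts.scheme,   # protocol, e.g. https
        "who I'm asking": parts.netloc,   # the one server that must answer
        "where to look": parts.path,      # path on that server
    }


def describe_content_address(path: str) -> dict:
    """Split a '/ipfs/<hash>' style address into namespace and hash."""
    _, namespace, content_hash = path.split("/", 2)
    return {
        "how I'm asking": namespace,           # e.g. ipfs
        "what I'm asking for": content_hash,   # the hash; any peer may answer
    }


location = describe_location_address(
    "https://frijol.gitbooks.io/climate-change/content/"
)
content = describe_content_address(
    "/ipfs/Qc98f2c0ee40323148c99285a83c1a80d2179a454dcbc7d3393dc52cc146f47"
)

print(location)
print(content)
```

The location-based address names a host; the content-based address has no "who" at all.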
15
Content addressing means you get what you asked for, even if you don’t know or trust the source.
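A hedged sketch of why the source does not need to be trusted: the requester re-hashes whatever a peer returns and keeps it only if it matches the address. The peers here are hypothetical in-memory functions, and SHA-256 hex digests stand in for real content addresses; this is not how any particular DWeb client is implemented.

```python
import hashlib
from typing import Callable, Iterable, Optional

Peer = Callable[[str], Optional[bytes]]


def address_of(data: bytes) -> str:
    """Stand-in content address: hex SHA-256 of the bytes."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(address: str, peers: Iterable[Peer]) -> Optional[bytes]:
    """Ask each peer in turn; accept only bytes that hash to the address."""
    for peer in peers:
        data = peer(address)
        if data is not None and address_of(data) == address:
            return data
    return None  # nobody online has the content


wanted = b"Hello world!"
wanted_address = address_of(wanted)


def offline_peer(address: str) -> Optional[bytes]:
    return None  # has nothing to offer


def lying_peer(address: str) -> Optional[bytes]:
    return b"Hello, world! (tampered)"  # answers every request with bad data


def honest_peer(address: str) -> Optional[bytes]:
    return {wanted_address: wanted}.get(address)


result = fetch_verified(wanted_address, [offline_peer, lying_peer, honest_peer])
missing = fetch_verified(address_of(b"nobody has this"), [offline_peer, honest_peer])

print(result)
print(missing)
```

The tampered answer is discarded without knowing anything about who sent it, and a request that no honest peer can satisfy yields nothing.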
Challenges:
16
Key-based addressing:
This content is from who it says it’s from
17
A “keypair”: two keys that only work with each other
If it’s encrypted with one of the keys, it can only be decrypted with the other (and vice versa).
If you make one of those keys public but keep the other one private to just you, you can encrypt something using the private key & send it along with the public key, so anyone can decrypt it but only you could have encrypted it. (“Digital signature”)
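The keypair idea above can be sketched with textbook RSA in plain Python. Everything here is a toy and an assumption for illustration: the primes, the exponent, and the function names are made up for this sketch, there is no padding, and the key is far too small and too structured to be secure. Real systems use vetted cryptographic libraries.

```python
import hashlib

# Toy keypair built from two known Mersenne primes. Illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # modulus, shared by both keys
E = 65537                          # public key exponent (published)
D = pow(E, -1, (P - 1) * (Q - 1))  # private key exponent (kept secret), Python 3.8+


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it so it fits under the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """'Encrypt' the digest with the private key."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """'Decrypt' with the public key and compare against a fresh digest."""
    return pow(signature, E, N) == digest_as_int(message)


message = b"This really is the NASA data"
signature = sign(message)

print(verify(message, signature))
print(verify(b"This is not the NASA data", signature))
```

Anyone holding `N` and `E` can run `verify`, but producing a signature that passes requires `D`.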
18
The content’s address is its public key
If the content’s signature can be verified with the public key, only someone with the private key could have put it there. (This is done automatically.)
https:// frijol.gitbooks.io /climate-change/content/
how I’m asking | who I’m asking | where to look
how I’m asking | what I’m asking for (from anywhere)
dat:// b3c98f2c0ee40323148c99285a83c1a80d2179a454dcbc7d3393dc52cc146f47
19
Key-based addressing means you know who made the data, even if you don’t know or trust who gave it to you.
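A rough sketch of reading by key-based address: the reader knows only the public key, and checks every (data, signature) pair a peer hands over against it. This reuses toy textbook RSA purely for illustration. The key values and the names `publish` and `read_from_peer` are hypothetical, the scheme is insecure as written, and it is not a description of how Dat itself works internally.

```python
import hashlib
from typing import NamedTuple, Optional, Tuple


class PublicKey(NamedTuple):
    n: int
    e: int


# Toy keypair (illustration only): the public half doubles as the address.
_P, _Q = 2**61 - 1, 2**89 - 1
ADDRESS = PublicKey(n=_P * _Q, e=65537)
_PRIVATE_D = pow(ADDRESS.e, -1, (_P - 1) * (_Q - 1))  # held only by the publisher


def _digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def publish(data: bytes) -> Tuple[bytes, int]:
    """Publisher side: sign the data with the private key."""
    return data, pow(_digest(data, ADDRESS.n), _PRIVATE_D, ADDRESS.n)


def read_from_peer(address: PublicKey, data: bytes, signature: int) -> Optional[bytes]:
    """Reader side: accept data from any peer only if the signature checks out."""
    valid = pow(signature, address.e, address.n) == _digest(data, address.n)
    return data if valid else None


v1_data, v1_sig = publish(b"sea level data, v1")
v2_data, v2_sig = publish(b"sea level data, v2")

print(read_from_peer(ADDRESS, v1_data, v1_sig))
print(read_from_peer(ADDRESS, v2_data, v2_sig))
print(read_from_peer(ADDRESS, b"fake data", v1_sig))
```

Both versions verify against the same address, while data paired with a signature it was not signed with is rejected no matter which peer supplied it.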
Challenges:
20
Tying that together
All of these are works in progress.
21
Pitfalls you will still have:
22
The internet is fragile.
The decentralized web (the new Internet?) ...has different problems
23
What is the DWeb, so far?
Who’s building this new internet?
I think you mean
“new internets”
25
Protocol Labs (IPFS, Filecoin), QRI, Dat, Secure Scuttlebutt, CKAN…
26
Lots of development in this space!
Challenges:
Justice & equity concerns:
27
How is EDGI involved?
And, how might we be?
Storing datasets on IPFS
Via a Data Together node hosted by QRI
29
Data Together Reading Group
Hosting values-focused discussions for DWeb creators
06/2018 Decentralized Web
07/2018 Ownership
08/2018 Commons
09/2018 Centralization
09/2018 Privacy
10/2018 Justice
04/2019 Knowledge Commons
05/2019 Civics
06/2019 Alternatives to Capitalism
08/2019 Stewardship
11/2019 Decentralization
30
Fostering connections
Bringing together DWeb developers & data managers from different projects
31
Testing DWeb technologies
Testing archival use cases and giving feedback to the technology creators
32
...and more?
Learn more & get involved
34