1 of 16

Web Shared Libraries

Rob Flack (flackr@google.com)

2 of 16

Challenges of loading libraries

3 of 16

Can’t we do this with caching?

[Diagram: a.com/ and b.com/ both load cdn.com/lib.js. a.com/'s GET puts cdn.com/lib.js in the cache; b.com/'s request for the same URL is then a cache hit (✔️).]

4 of 16

Makes your site depend on an external server

[Diagram: a.com/ and b.com/ each issue a GET for cdn.com/lib.js, and the external server responds with ERROR!]

5 of 16

Double-keyed caching disables this sharing

[Diagram: a.com/ and b.com/ both load cdn.com/lib.js, but the cache is partitioned into an A.com cache and a B.com cache. Each site issues its own GET and stores its own copy of cdn.com/lib.js.]
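The difference between the two cache models on the last three slides can be sketched with a toy in-memory cache. This is a simplified illustration only: the `ToyCache` class and its key format are hypothetical and do not reflect how any browser actually implements its HTTP cache.

```javascript
// Toy model of an HTTP cache (hypothetical; real browser caches are far more involved).
class ToyCache {
  constructor(doubleKeyed) {
    this.doubleKeyed = doubleKeyed;
    this.entries = new Map();
    this.networkFetches = 0;
  }

  // Single-keyed: the key is the resource URL only.
  // Double-keyed: the key is (top-level site, resource URL).
  keyFor(topLevelSite, url) {
    return this.doubleKeyed ? `${topLevelSite} ${url}` : url;
  }

  // Returns "hit" or "miss"; every miss counts as one network fetch.
  load(topLevelSite, url) {
    const key = this.keyFor(topLevelSite, url);
    if (this.entries.has(key)) return "hit";
    this.networkFetches += 1;
    this.entries.set(key, `body of ${url}`);
    return "miss";
  }
}

// Shared (single-keyed) cache: b.com reuses what a.com fetched.
const shared = new ToyCache(false);
const sharedResults = [
  shared.load("a.com", "cdn.com/lib.js"),
  shared.load("b.com", "cdn.com/lib.js"),
];

// Double-keyed cache: each site fetches and stores its own copy.
const partitioned = new ToyCache(true);
const partitionedResults = [
  partitioned.load("a.com", "cdn.com/lib.js"),
  partitioned.load("b.com", "cdn.com/lib.js"),
];

console.log(sharedResults, partitionedResults);
```

In this model the second load is a hit only in the single-keyed cache; the double-keyed cache performs two network fetches for the same URL.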

6 of 16

Web Shared Libraries proposal

Use the integrity checksum of a resource as the cache key.

Use the shared-channel URL to look up:

  • Channel public key
  • Canonical current release URL
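A minimal sketch of the first idea, a cache keyed by integrity checksum rather than by URL. The choice of SHA-256, the `IntegrityKeyedCache` class, and the `fetchBody` callback are assumptions made for illustration; the proposal as presented here does not specify them.

```javascript
const { createHash } = require("node:crypto");

// Hex SHA-256 digest of a resource body (the hash function is an assumption).
function checksum(body) {
  return createHash("sha256").update(body).digest("hex");
}

// Hypothetical cache whose key is the content checksum, not the URL.
class IntegrityKeyedCache {
  constructor() {
    this.byChecksum = new Map();
  }

  // fetchBody is a hypothetical callback standing in for a network fetch.
  load(expectedChecksum, fetchBody) {
    const cached = this.byChecksum.get(expectedChecksum);
    if (cached !== undefined) return { body: cached, fromCache: true };

    const body = fetchBody();
    // Only store content that matches the checksum the page asked for.
    if (checksum(body) !== expectedChecksum) {
      throw new Error("integrity mismatch");
    }
    this.byChecksum.set(expectedChecksum, body);
    return { body, fromCache: false };
  }
}

const libSource = "function doCoolStuff() {}";
const digest = checksum(libSource);
const cache = new IntegrityKeyedCache();

// First page fetches the library from its own server.
const first = cache.load(digest, () => libSource);
// Second page names a different URL but the same checksum: no fetch needed.
const second = cache.load(digest, () => {
  throw new Error("network should not be used");
});

console.log(first.fromCache, second.fromCache);
```

Because the key is derived from the content, two pages that reference different URLs with identical bytes resolve to the same cache entry.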

7 of 16

Web Shared Libraries proposal

Example 1: integrity

a.com/:

<script src="lib.js" integrity="a23c2c…"></script>

a.com/lib.js:

function doCoolStuff() {

...

b.com/:

<script src="https://cdn.com/lib.js" integrity="a23c2c…"></script>

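[Diagram: a.com/ GETs a.com/lib.js, which is cached under checksum a23c2c…. b.com/ then requests cdn.com/lib.js with the same checksum a23c2c… and gets a cache hit (✔️) instead of a network fetch.]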

cdn.com/lib.js:

function doCoolStuff() {

...

8 of 16

Web Shared Libraries proposal

Example 2: shared-channel

a.com/:

<script src="lib.js" shared-channel="lib.com"></script>

a.com/lib.js:

function doCoolStuff() {

...

b.com/:

<script src="https://cdn.com/lib.js" shared-channel="lib.com"></script>

lib.com/public_key:

<public-key>

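[Diagram: a.com/ finds no cached entry for channel lib.com (❌), so it GETs lib.js and verifies it (a23c2c…) against the channel public key (✔️) before caching it. b.com/ finds the lib.com entry in the cache (✔️) and issues only a HEAD request for lib.js (✔️).]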

cdn.com/lib.js:

function doCoolStuff() {

...
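One way the shared-channel flow could work is sketched below. The slides say only that the channel URL yields a public key and a canonical release URL; the use of Ed25519 signatures, the way the signature is delivered alongside the library, and every name in this block (`loadFromChannel`, `channelCache`, `fetchRelease`) are assumptions made for illustration. The HEAD request shown on the slide is not modeled.

```javascript
const { generateKeyPairSync, sign, verify } = require("node:crypto");

// Stand-in for the channel owner (lib.com) publishing a signed release.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const release = Buffer.from("function doCoolStuff() {}");
const releaseSignature = sign(null, release, privateKey);

// Hypothetical cache keyed by channel name instead of URL or checksum.
const channelCache = new Map();

// fetchRelease is a hypothetical callback returning { body, signature }.
function loadFromChannel(channel, channelPublicKey, fetchRelease) {
  if (channelCache.has(channel)) {
    return { body: channelCache.get(channel), fromCache: true };
  }
  const { body, signature } = fetchRelease();
  // Accept the copy only if the channel's key signed exactly these bytes.
  if (!verify(null, body, channelPublicKey, signature)) {
    throw new Error("signature does not match channel public key");
  }
  channelCache.set(channel, body);
  return { body, fromCache: false };
}

// a.com serves its own copy of lib.js; it is verified and cached.
const fromA = loadFromChannel("lib.com", publicKey, () => ({
  body: release,
  signature: releaseSignature,
}));

// b.com points at cdn.com/lib.js but names the same channel: cache hit.
const fromB = loadFromChannel("lib.com", publicKey, () => {
  throw new Error("network should not be used");
});

// A modified copy under another (hypothetical) channel name is rejected.
let tamperedRejected = false;
try {
  loadFromChannel("other.example", publicKey, () => ({
    body: Buffer.from("function doEvilStuff() {}"),
    signature: releaseSignature,
  }));
} catch (err) {
  tamperedRejected = true;
}

console.log(fromA.fromCache, fromB.fromCache, tamperedRejected);
```

Keying on the channel rather than on a fixed checksum is what would let a release be updated without every embedding page changing its markup, at the cost of trusting the channel's key.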

9 of 16

Potential impact

If the top 100 libraries (based on crawl of top 10k sites) were shared in cache:

  • 1.9 MB of compressed cached libraries,
  • saves about 4.6% of bandwidth or
  • 1.2 GB saved across top 10k sites*

Top 500?

  • 10.7 MB of compressed cached libraries, 7.6% of bandwidth, 1.9 GB saved across top 10k sites*

* Assuming double-keyed caching and that you visited all 10k sites
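To put the totals in per-site terms, the quoted figures can be divided out. This is back-of-the-envelope arithmetic only, assuming decimal units (1 GB = 10^9 bytes) and assuming the percentages are shares of the total bytes transferred across the 10k sites; neither assumption is stated on the slide.

```javascript
// Figures quoted on the slide; units and interpretation are assumptions.
const SITES = 10_000;
const scenarios = [
  { name: "top 100", savedGB: 1.2, bandwidthShare: 0.046 },
  { name: "top 500", savedGB: 1.9, bandwidthShare: 0.076 },
];

const derived = scenarios.map(({ name, savedGB, bandwidthShare }) => ({
  name,
  // Average saving per site visited, in KB.
  savedPerSiteKB: (savedGB * 1e9) / SITES / 1e3,
  // Total transfer implied by "savedGB is bandwidthShare of the total".
  impliedTotalGB: savedGB / bandwidthShare,
}));

console.log(derived);
```

Under these assumptions the savings average roughly 120 KB per site for the top 100 libraries and roughly 190 KB per site for the top 500, against an implied total of about 25 to 26 GB in both cases.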

10 of 16

Long-term potential

With the incentive to share common packages,

  • Common versions of common libraries would usually be in cache
  • Promotes a common set of core libraries
  • Quickly updatable
  • Sites/bundlers incentivized to unbundle common components of large bundles

11 of 16

Privacy concern - learning about other sites

Hi lib.js user!

a.com/:

<script src="lib.js" integrity="a23c2c…" onload="fn()"></script>

function fn() {

// User visited a site that uses lib.js

}

a.com/lib.js:

<404>

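[Diagram: a.com/ requests a.com/lib.js. The server would answer 404, but the cache already holds an entry with checksum a23c2c… (✔️), so the script loads and onload fires.]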
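The probe on this slide can be simulated with a toy checksum-keyed cache. Everything here is a hypothetical model: `sharedCache`, `loadScript`, and the placeholder checksum string stand in for browser behavior and are not real APIs.

```javascript
// Toy shared cache keyed by integrity checksum (placeholder strings, not real digests).
const sharedCache = new Map();

// Returns the script body, or throws if it is neither cached nor fetchable.
function loadScript(checksum, fetchFromServer) {
  if (sharedCache.has(checksum)) return sharedCache.get(checksum);
  const body = fetchFromServer(); // may throw, e.g. on a 404
  sharedCache.set(checksum, body);
  return body;
}

const LIB_CHECKSUM = "checksum-of-lib.js"; // placeholder value

// The probing site never serves lib.js; its server always answers 404.
function probe() {
  try {
    loadScript(LIB_CHECKSUM, () => {
      throw new Error("404");
    });
    return true; // load succeeded, so the entry came from the shared cache
  } catch (err) {
    return false;
  }
}

const before = probe();

// The user visits some other site that really serves lib.js.
loadScript(LIB_CHECKSUM, () => "function doCoolStuff() {}");

const after = probe();

console.log({ before, after });
```

The probe's result flips from failure to success purely because another site populated the cache, which is the information leak the slide describes.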

12 of 16

Privacy concern - cooperative tracking

a.com/:

// Encode userId by caching one library per set bit.
for (let i = 0; i < 32; i++) {
  if (userId & (1 << i))
    load('/bit' + i + '.js');
}

function load(lib) {
  // looks up expected checksum and
  // loads the library with it.
  ...
}

b.com/:

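// Fetching /bit* will fail if not cached.
var userId = 0;
var bits = 0;
for (let i = 0; i < 32; i++) {
  load('/bit' + i + '.js').then(() => {
    setbit(i, 1);   // cache hit: a.com set this bit
  }).catch(() => {
    setbit(i, 0);   // load failed: bit was not set
  });
}

function setbit(i, value) {
  userId |= (value << i);
  // Once all 32 probes have settled, report the id as an unsigned value.
  if (++bits === 32)
    setUser(userId >>> 0);
}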
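The two halves of this scheme can be exercised together in a synchronous toy model. The `sharedCache` set and the `bit<i>.js` placeholder keys are hypothetical stand-ins for checksum-keyed cache entries; the asynchronous `load` from the slide is replaced by a direct cache lookup.

```javascript
// Placeholder keys standing in for the checksums of /bit0.js ... /bit31.js.
const sharedCache = new Set();

// a.com: cache bit<i>.js for every set bit of the user id.
function encodeUserId(userId) {
  for (let i = 0; i < 32; i++) {
    if (userId & (1 << i)) sharedCache.add(`bit${i}.js`);
  }
}

// b.com: its server fails every /bit*.js request, so a load succeeds
// only when the entry is already in the shared cache.
function decodeUserId() {
  let userId = 0;
  for (let i = 0; i < 32; i++) {
    if (sharedCache.has(`bit${i}.js`)) userId |= 1 << i;
  }
  return userId >>> 0; // interpret the 32 bits as unsigned
}

encodeUserId(0xdeadbeef);
const recovered = decodeUserId();

console.log(recovered.toString(16));
```

Thirty-two cache entries are enough to pass a 32-bit identifier from one cooperating site to another without cookies or network traffic between them.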

13 of 16

Always fetch on first load from new domain

  • On the first access to lib.js from a given domain:
    • Don’t allow a cache hit on the shared library; refetch it from the specified source.
  • Pros
    • Subsequent hits are indistinguishable from regular cache hits.
  • Cons
    • If shared entries are not evicted at the same time as the domain’s regular cache entries, this can be detected:
      • <script src="shared-lib.js" …>
      • <script src="private.js" …>
      • If the second hits the network but the first does not, shared-lib was likely cached from another site.
    • Preserving privacy perfectly makes this virtually identical to the regular cache
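A minimal sketch of this mitigation, assuming the cache remembers which domains have already fetched each entry themselves. The `FirstLoadFetchCache` class and the placeholder checksum are hypothetical, and integrity verification and eviction are omitted.

```javascript
// Hypothetical cache that shares storage but forces a fetch on first use per domain.
class FirstLoadFetchCache {
  constructor() {
    this.bodies = new Map(); // checksum -> body
    this.seenBy = new Map(); // checksum -> Set of domains that fetched it themselves
  }

  load(domain, checksum, fetchFromServer) {
    const domains = this.seenBy.get(checksum) ?? new Set();
    // A hit is allowed only if this domain has fetched the entry before.
    if (domains.has(domain) && this.bodies.has(checksum)) {
      return this.bodies.get(checksum);
    }
    const body = fetchFromServer(); // throws if the source cannot serve it
    this.bodies.set(checksum, body);
    domains.add(domain);
    this.seenBy.set(checksum, domains);
    return body;
  }
}

const cache = new FirstLoadFetchCache();
const CHECKSUM = "checksum-of-lib.js"; // placeholder value
const serveLib = () => "function doCoolStuff() {}";
const notFound = () => {
  throw new Error("404");
};

// a.com fetches the library normally.
cache.load("a.com", CHECKSUM, serveLib);

// b.com probes with a server that cannot serve it: no cross-site hit.
let probeSucceeded = true;
try {
  cache.load("b.com", CHECKSUM, notFound);
} catch (err) {
  probeSucceeded = false;
}

// a.com's own repeat load is served from cache even if its server is down.
const repeat = cache.load("a.com", CHECKSUM, notFound);

console.log(probeSucceeded, repeat === serveLib());
```

In this sketch every domain still pays one network fetch per library, so only storage is shared, which is consistent with the slide's last point that a fully private version behaves much like the regular cache.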

14 of 16

Always prefetch known list

  • Always prefetch a popular list of resources (e.g. popular in top 10k sites)

  • Pros
    • Presence of library doesn’t say anything about user’s navigation history
    • List wouldn’t change frequently
    • Could lazily prefetch
  • Cons
    • Fetching resources you may never use
      • Though they rarely change and you don’t have to download them right away
    • Wasted space means caching fewer resources you actually use

15 of 16

Only share very commonly used libraries

  • Only share popular cached libraries (e.g. popular in top 10k sites)
  • Optionally create false negatives (pretend not cached) or false positives (prefetch a few random common libs)
  • Pros
    • Gets most of the benefit (i.e. less popular libs are less likely to actually be reused)
    • Cache hit does not reveal an exact site
  • Cons
    • Still reveals class of users (e.g. gamers)
    • May be gameable
    • Cache miss for extremely common resources may become a signal
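One possible shape for this policy: entries on a popularity list are keyed by checksum alone, and everything else keeps a per-site key. The `PopularOnlySharedCache` class, the list contents, and the placeholder checksums are all hypothetical, and the optional false positives and false negatives are not modeled.

```javascript
// Hypothetical cache that shares only allowlisted (popular) libraries across sites.
class PopularOnlySharedCache {
  constructor(popularChecksums) {
    this.popular = popularChecksums;
    this.entries = new Map();
  }

  // Popular libraries share one entry; others stay partitioned per site.
  keyFor(site, checksum) {
    return this.popular.has(checksum) ? checksum : `${site} ${checksum}`;
  }

  load(site, checksum, fetchFromServer) {
    const key = this.keyFor(site, checksum);
    if (this.entries.has(key)) {
      return { body: this.entries.get(key), fromCache: true };
    }
    const body = fetchFromServer();
    this.entries.set(key, body);
    return { body, fromCache: false };
  }
}

// Placeholder checksums; a real list would come from crawl data.
const cache = new PopularOnlySharedCache(new Set(["checksum-of-popular-lib"]));
const serve = (name) => () => `// source of ${name}`;

cache.load("a.com", "checksum-of-popular-lib", serve("popular-lib"));
const popularOnB = cache.load("b.com", "checksum-of-popular-lib", serve("popular-lib"));

cache.load("a.com", "checksum-of-niche-lib", serve("niche-lib"));
const nicheOnB = cache.load("b.com", "checksum-of-niche-lib", serve("niche-lib"));

console.log(popularOnB.fromCache, nicheOnB.fromCache);
```

A cross-site hit is then possible only for libraries common enough that the hit does not point to one specific site, while rarely used libraries behave as they do under double-keyed caching.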

16 of 16

Summary

  • Web Shared Libraries shares a cached library across sites and URLs that serve the same content
  • Shared-channel may be unnecessary
  • An incomplete list of possible privacy mitigations:
    • Always fetch library on first visit to new domain
    • Prefetch list of well known popular libraries
    • Only share well known popular libraries across domains
  • Will this provide incentives to unbundle and land on common versions?