Client-side Reconstruction of Composite Mementos Using ServiceWorker
Sawood Alam, Mat Kelly, Michele C. Weigle, and Michael L. Nelson
Web Science and Digital Libraries Research Group
Old Dominion University, Norfolk, VA, 23529
@ibnesayeed
@WebSciDL
Supported in part by NSF III 1526700
JCDL 2017, June 19-23, 2017, Toronto, Ontario, Canada
2008 Memento Seen in 2017
Sawood Alam <@ibnesayeed>
2008 Memento Seen in 2012
XenLand @ Alpha Centauri
Zombies in Archive
<img src="http://xenland.alpha/images/map.png">
// Is rewritten on replay to become:
<img src="http://archive.example.org/1998/http://xenland.alpha/images/map.png">
// URLs constructed by JavaScript are harder to rewrite on replay, e.g.:
var base = 'http://xenland.alpha';
var imgdir = '/images/';
var img = document.createElement('img');
img.src = base + imgdir + 'ruler.png';
document.getElementById('ruler').appendChild(img);
//=>> http://xenland.alpha/images/ruler.png
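A minimal sketch of why this happens, assuming a deliberately naive replay-time rewriter that only prefixes absolute URLs found literally in `src` and `href` attributes. The function name `rewriteAttributes` and its regular expression are hypothetical illustrations, not any archive's actual rewriter; the archive prefix is the one used in the example above.

```javascript
// Hypothetical, deliberately naive rewriter: it only sees URLs that appear
// literally inside src="..." or href="..." attributes.
const ARCHIVE_PREFIX = 'http://archive.example.org/1998/';

function rewriteAttributes(html, prefix = ARCHIVE_PREFIX) {
  return html.replace(
    /\b(src|href)="(https?:\/\/[^"]*)"/g,
    (match, attr, url) => `${attr}="${prefix}${url}"`
  );
}

// The page from the example above: one static image, one image built by script.
const page = [
  '<img src="http://xenland.alpha/images/map.png">',
  '<script>',
  "var base = 'http://xenland.alpha';",
  "var imgdir = '/images/';",
  "var img = document.createElement('img');",
  "img.src = base + imgdir + 'ruler.png';",
  '</script>',
].join('\n');

const rewritten = rewriteAttributes(page);
console.log(rewritten);
```

In this sketch the static `<img>` tag is rewritten, but the script text is left untouched, so the URL it assembles at run time still points at the live web.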
Replay URL Resolution & Rewriting
Reference type | Example | Resolution after relocation |
Relative path | images/logo.png | Potentially correct |
Absolute path | /public/images/logo.png | Potentially incorrect |
Absolute URL | http://example.com/public/images/logo.png | Potentially live leakage |
// Original page:
http://example.com/public/index.html
...
<img src="/public/images/logo.png">
...
// Archived copy of the page, with the absolute path rewritten on replay:
http://archive.example.org/<datetime>/http://example.com/public/index.html
...
<img src="/<datetime>/http://example.com/public/images/logo.png">
...
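The three rows of the table can be checked with the standard `URL` constructor, which resolves a reference against a base the same way a browser does. This is a small sketch; the 14-digit datetime `20170619000000` is a made-up placeholder for `<datetime>`, and the variable names are illustrative.

```javascript
// Hypothetical archived location of http://example.com/public/index.html;
// the 14-digit datetime is a made-up stand-in for <datetime>.
const mementoUrl =
  'http://archive.example.org/20170619000000/http://example.com/public/index.html';

// One unrewritten reference of each type from the table.
const references = {
  relativePath: 'images/logo.png',
  absolutePath: '/public/images/logo.png',
  absoluteUrl: 'http://example.com/public/images/logo.png',
};

// Resolve each reference against the memento URL, as a browser would.
const resolved = Object.fromEntries(
  Object.entries(references).map(([type, ref]) => [
    type,
    new URL(ref, mementoUrl).href,
  ])
);

console.log(resolved);
```

Without rewriting, only the relative path stays inside the archived namespace: the absolute path resolves against the archive's own host and loses both the datetime and the original host, and the absolute URL goes straight to the live site.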
Avoiding Zombies
ServiceWorker
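A rough sketch of client-side rerouting with a ServiceWorker, under stated assumptions: mementos live at `<archive origin>/<14-digit datetime>/<original URL>`, and the request's referrer is the memento page that issued it. The helper `reroute` and the pattern `MEMENTO_PATTERN` are hypothetical names for illustration only; this is not the API of reconstructive.js.

```javascript
// Assumed (hypothetical) URL layout: <archive origin>/<14-digit datetime>/<original URL>
const MEMENTO_PATTERN = /^(https?:\/\/[^/]+)\/(\d{14})\/(https?:\/\/.+)$/;

// Decide where a request should go, given the memento page that issued it.
function reroute(requestUrl, referrerUrl) {
  if (MEMENTO_PATTERN.test(requestUrl)) return requestUrl; // already archived
  const page = MEMENTO_PATTERN.exec(referrerUrl);
  if (!page) return requestUrl; // not issued from a memento: leave it alone
  const [, archiveOrigin, datetime, originalPageUrl] = page;

  let originalUrl = requestUrl; // an absolute URL that would leak to the live web
  if (requestUrl.startsWith(archiveOrigin + '/')) {
    // An absolute path that resolved against the archive's host:
    // resolve it against the original page instead.
    const { pathname, search } = new URL(requestUrl);
    originalUrl = new URL(pathname + search, originalPageUrl).href;
  }
  return `${archiveOrigin}/${datetime}/${originalUrl}`;
}

// Only register the handler when actually running inside a ServiceWorker.
if (
  typeof ServiceWorkerGlobalScope !== 'undefined' &&
  self instanceof ServiceWorkerGlobalScope
) {
  self.addEventListener('fetch', (event) => {
    // Note: the referrer may be empty depending on the page's referrer policy.
    const target = reroute(event.request.url, event.request.referrer);
    if (target !== event.request.url) {
      event.respondWith(fetch(target));
    }
  });
}
```

Because the decision is made per request at fetch time, URLs assembled by JavaScript are caught as well as those written in the markup, without editing the archived page itself.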
reconstructive.js
Zombies, No More!
Rewriting Mementos is Expensive
In our experiment over 500 home pages, we observed that, compared to the original capture (without any rewriting), rewriting resulted in 15% more data in twice the time.
Archival Capture Replay Test Suite (ACRTS)
reconstructive.js
Reconstruction Winners: PyWB & reconstructive.js
Future Work
Conclusions