1 of 24

Recording high-fidelity online news sites with Webrecorder

Dodging the Memory Hole, November 15, 2017

Ilya Kreymer and Anna Perricci

2 of 24

About Webrecorder

Create high-fidelity, interactive captures of any web pages you browse

http://webrecorder.io

Webrecorder Player App

3 of 24

Webrecorder Project key facts

  • Robust tools
  • Free to use
  • Fully open source
  • Using open standards
  • Growing user community
  • Quickly evolving

A project by

with generous support from

4 of 24

Webrecorder Team

Dragan Espenschied

Rhizome's Digital Conservator

Ilya Kreymer

Lead developer & Creator

Mark Beasley

Senior Front-End Developer

Pat Shiu

Design Lead

Anna Perricci

Partnership Manager & Sustainability Consultant

5 of 24

Why is it hard to preserve online news?

  • News sites push the limits of current technology
  • Interactive content!
  • Lots of video/audio and ad content
  • Traditional crawlers can’t run JavaScript
  • Rapidly changing

6 of 24

High fidelity web archiving

  • Record any web page loaded in the browser

  • Archive interactive content (only available after user input)

  • Same system for recording and playback (web browser)

7 of 24

Webrecorder creates high fidelity web archives including

elements that crawler based systems

often fail to capture

such as interactive content

8 of 24

Collecting at human scale

  • Webrecorder: web archiving for all!
  • Collecting is done by a person via web browser one page at a time
  • Can import and augment collections created by crawlers

The payoff for careful capture is an accurate representation of the original

9 of 24

What about social media?

  • Webrecorder can capture content from social media sites, and works especially well with Instagram and Twitter
  • Some websites deliver content individualized for each user
    • Webrecorder can record the content you see when you are logged in to a social media profile

10 of 24

Account login is optional

  • One does not need to log in to use Webrecorder to capture web content
    • Users can download the captures right away (as a WARC file) & save them locally

  • For continued access to archived content online & to be able to add to a collection, one must log in to a free account
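
For readers curious about what a downloaded WARC file contains, here is a minimal sketch in Python that lists the URLs of the captured responses. It uses only the standard library and assumes the basic WARC record layout: a version line, named header fields, a blank line, then `Content-Length` bytes of payload. The helper names (`make_record`, `iter_warc_records`, `open_warc`, `captured_urls`) and the sample records are made up for illustration, and the sample leaves out fields a real record carries, such as `WARC-Record-ID` and `WARC-Date`.

```python
import gzip
import io


def make_record(warc_type, uri, payload):
    """Build a stripped-down WARC record (illustration only, not a complete record)."""
    head = (
        "WARC/1.0\r\n"
        f"WARC-Type: {warc_type}\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("utf-8")
    # Each record ends with two CRLF pairs after the payload.
    return head + payload + b"\r\n\r\n"


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError("not a WARC record: %r" % line[:20])
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        payload = stream.read(int(headers["content-length"]))
        yield headers, payload


def open_warc(path):
    """Open a .warc or .warc.gz file; the gzip module reads concatenated members."""
    path = str(path)
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def captured_urls(stream):
    """Return the target URL of every 'response' record in the stream."""
    return [
        headers["warc-target-uri"]
        for headers, _ in iter_warc_records(stream)
        if headers.get("warc-type") == "response"
    ]


# In-memory sample: one request record and the matching response record.
sample = make_record(
    "request", "http://example.com/", b"GET / HTTP/1.1\r\n\r\n"
) + make_record(
    "response", "http://example.com/", b"HTTP/1.1 200 OK\r\n\r\nhello"
)

urls = captured_urls(io.BytesIO(sample))
print(urls)
```

With a real download, the same function could be called as `captured_urls(open_warc("my-capture.warc.gz"))`, where the file name is a placeholder. This sketch ignores edge cases such as folded header lines, so a dedicated WARC library is the safer choice for real collections.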

11 of 24

Access & sharing options

  • User created collections can be kept private or made public through Webrecorder.io

  • Public collections can be viewed by anyone
    • Finer access controls are being considered

12 of 24

Preconfigured browsers

  • Use a preconfigured browser to capture and replay web content that may not be supported in future web browsers
    • e.g. Flash
  • Access with a preconfigured browser ensures greater faithfulness to the original look and feel of web pages
  • Preconfigured browsers use HTTP proxy mode for even better fidelity
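
One way to see why proxy mode can help fidelity is to compare it with replay that rewrites URLs inside the page. The Python sketch below is a deliberately naive rewriter, not Webrecorder's actual implementation (real replay engines rewrite far more, including scripts). The replay prefix, the sample page, and the function name `rewrite_html` are assumptions made for illustration.

```python
import re

# Hypothetical replay prefix: archive host, collection name, capture timestamp.
REPLAY_PREFIX = "https://archive.example/my-collection/20171115000000/"

# Matches absolute URLs in href="..." and src="..." attributes only.
ATTR_URL = re.compile(r'(href|src)="(https?://[^"]+)"')


def rewrite_html(html, prefix=REPLAY_PREFIX):
    """Naive rewriting replay: point every absolute href/src at the archive."""
    return ATTR_URL.sub(
        lambda m: f'{m.group(1)}="{prefix}{m.group(2)}"', html
    )


page = (
    '<a href="http://news.example/story">Story</a>'
    '<script>var img = "http://" + "news.example" + "/photo.jpg";</script>'
)

rewritten = rewrite_html(page)
print(rewritten)
```

The link in the `href` attribute now points at the archive, but the URL that the script assembles from string pieces is untouched, so a naive rewriter would let that request go to the live web. In proxy mode the page can be served unmodified: the browser's proxy setting sends every request, including script-built ones, to the replay system, which answers from the archive.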

13 of 24

Webrecorder Player

  • Desktop application for OS X, Windows and Linux
  • User-friendly application to browse any web archive (saved in standard WARC format)
  • Can browse web archives offline, no internet connection required!

14 of 24

Did you happen to hear about what happened with Gothamist and DNAinfo?

15 of 24

2 ways Webrecorder can help

1. Content on the live web can be captured, saved & accessed (Webrecorder.io). These web archives can also be downloaded for offline access

2. Archived web content can be imported into Webrecorder (via extraction from other web archives). Webrecorder collections can be saved in Webrecorder.io or downloaded for offline access

16 of 24

Preparing for further outreach

  • Preparing for upcoming trainings for journalists on
    • capturing their own finished work
    • using web archiving to capture source materials (to ease or alleviate some link rot issues)
  • Full set of user documentation to be published in 2018 following user interface redesign

17 of 24

Demo!

18 of 24

Webrecorder Player - NEW Release Preview

  • Uses HTTP/S Proxy Mode for better fidelity
  • Support for Flash plugin built-in
  • Cache WARC indexes for faster reloads
  • New Look and Side bar
  • Coming Soon!
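
The slides do not say how the Player caches its WARC indexes, so the following Python sketch only illustrates the general idea: scan a WARC once, remember the byte offset of each captured response, and store that map on disk so later loads can skip the scan. The function names, the `.index.json` cache file, and the size-based freshness check are all assumptions, and the sketch handles uncompressed WARC files only.

```python
import json
import os
import tempfile


def build_index(warc_path):
    """Scan an uncompressed WARC once; map each response URL to its record offset."""
    index = {}
    with open(warc_path, "rb") as f:
        while True:
            offset = f.tell()  # where the next record (or separator) starts
            line = f.readline()
            if not line:
                break  # end of file
            if not line.strip():
                continue  # blank separator lines between records
            headers = {}
            while True:
                raw = f.readline()
                if raw in (b"\r\n", b"\n", b""):
                    break  # blank line ends the header block
                name, _, value = raw.decode("utf-8").partition(":")
                headers[name.strip().lower()] = value.strip()
            # Skip the payload without reading it.
            f.seek(int(headers["content-length"]), os.SEEK_CUR)
            if headers.get("warc-type") == "response":
                index[headers["warc-target-uri"]] = offset
    return index


def load_index(warc_path):
    """Return (index, from_cache), reusing the cache if the WARC size is unchanged."""
    cache_path = warc_path + ".index.json"  # hypothetical cache location
    size = os.path.getsize(warc_path)
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as f:
            cached = json.load(f)
        if cached.get("warc_size") == size:
            return cached["index"], True
    index = build_index(warc_path)
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump({"warc_size": size, "index": index}, f)
    return index, False


def sample_record(uri, payload):
    """Stripped-down response record for the demo (not a complete WARC record)."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: response\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + payload + b"\r\n\r\n"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.warc")
    record_a = sample_record("http://example.com/a", b"first")
    record_b = sample_record("http://example.com/b", b"second")
    with open(path, "wb") as f:
        f.write(record_a + record_b)
    first, first_cached = load_index(path)    # scans the file, writes the cache
    second, second_cached = load_index(path)  # served from the cached index

print(first, first_cached, second_cached)
```

With an index like this, a replay tool can seek straight to a record instead of rereading the whole archive, which is the kind of saving a cached index aims for.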

19 of 24

Open Source Tools

https://github.com/webrecorder/webrecorderplayer-electron/releases

  • Core Replay Engine: https://github.com/ikreymer/pywb

20 of 24

Webrecorder Sample Collections

21 of 24

Next steps for Webrecorder

  • Wrapping up current major Mellon funding
  • Further technical and organizational development in progress
  • Ethics and Archiving the Web
  • Stay tuned for further developments with Webrecorder

22 of 24

Q & A

23 of 24

Using Webrecorder Today

Use Our Hosted Service

  • Sign-up at https://webrecorder.io/ for a free account

Run your own Webrecorder instance:

Download Webrecorder Player for your Desktop:

24 of 24

A project by

with generous support from

Thank you

additional outreach support