1 of 13

2 of 13

Quantifying the magnitude of news events (vis)

Presented by Alexander C. Nwala (@acnwala)

3 of 13

LAMP-SYS: Lab For Applied Machine Learning and NLP System

Presenters: Dr. Jian Wu & Muntabir Choudhury

Text Extraction

Docker Pipeline

OpenCV Face Detection (dnn - deep neural network)

4 of 13

Grampa, what's a deleted tweet?

Presented by: Mohammed Nauman Siddique (@m_nsiddique)

Deleted Tweet

Apology Tweet

Archived Tweet

5 of 13

TweetedAt: Finding Tweet Timestamps for Pre and Post Snowflake Tweet IDs

Presented by: Mohammed Nauman Siddique & Sawood Alam

The timestamp estimated by TweetedAt is off by only 12 minutes
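
For tweets with Snowflake IDs, the creation time is encoded in the ID itself, so no estimation is needed. The sketch below shows only that case: it shifts the ID right by 22 bits and adds the Snowflake epoch offset. The function name `snowflake_to_datetime` is an assumption made for illustration, and the sketch does not cover the pre-Snowflake estimation that TweetedAt also performs.

```python
from datetime import datetime, timedelta, timezone

# Snowflake epoch offset in milliseconds (2010-11-04T01:42:54.657Z).
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snowflake_to_datetime(tweet_id: int) -> datetime:
    """Return the UTC creation time encoded in a Snowflake tweet ID.

    The upper bits of the ID (everything above the low 22 bits) hold the
    number of milliseconds elapsed since the Snowflake epoch.
    """
    ms_since_unix_epoch = (tweet_id >> 22) + TWITTER_EPOCH_MS
    # Integer milliseconds are added as a timedelta to avoid float rounding.
    return UNIX_EPOCH + timedelta(milliseconds=ms_since_unix_epoch)
```

IDs issued before Snowflake carry no embedded timestamp, which is why those tweets need an estimate instead.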

6 of 13

TimeMap Visualization: An Archival Thumbnail Visualization Server

Presented by Dhruv Patel

tmvis.cs.odu.edu

URI: odu.edu

7 of 13

Docker: A Solution to the Magic Laptop Problem

[Diagram labels: Host OS, Docker Engine, Bins/Libs, App A, App B, Container]

[Diagram: Dockerfile → (Build) → Image → (Run) → Container]

[Diagram labels: Web App, API Server, Redis Cache, Live Web]

FROM python

LABEL maintainer="Sawood Alam <@ibnesayeed>"

RUN pip install beautifulsoup4 requests

COPY linkextractor.py /app/

WORKDIR /app

RUN chmod a+x linkextractor.py

ENTRYPOINT ["./linkextractor.py"]
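
The slide does not show the contents of `linkextractor.py`. As a rough idea of what such a script might do, here is a minimal link-extraction sketch. It is an assumption, not the presenter's code: it uses Python's standard-library `html.parser` instead of the `beautifulsoup4` and `requests` packages installed in the Dockerfile, so that it runs without dependencies, and the names `LinkParser` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links into absolute ones.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links
```

A containerized version would fetch the page first and pass its HTML to `extract_links`; the fetching step is left out here to keep the sketch offline.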

8 of 13

MementoMap: A Web Archive Profiling Framework for Efficient Memento Routing

$ memgator -a archives.json -f cdxj cs.odu.edu \
> | grep -v "^!" | cut -d '/' -f 3 | sort | uniq -c | sort -nr
    973 web.archive.org
     17 arquivo.pt
      8 wayback.archive-it.org
      1 archive.md
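
The same per-archive tally can be sketched in Python. This mirrors the shell pipeline above step by step: skip lines starting with `!`, take the third `/`-separated field (the host of the first URL on the line), and count. The function name `count_archive_hosts` is an assumption, and the sample lines in the usage note are illustrative placeholders rather than real MemGator output.

```python
from collections import Counter


def count_archive_hosts(cdxj_lines):
    """Count mementos per archive host in CDXJ TimeMap lines."""
    hosts = Counter()
    for line in cdxj_lines:
        if line.startswith("!"):       # grep -v "^!"  (drop metadata lines)
            continue
        fields = line.split("/")
        if len(fields) > 2:            # lines without a URL are skipped
            hosts[fields[2]] += 1      # cut -d '/' -f 3
    return hosts.most_common()         # sort | uniq -c | sort -nr
```

For example, with made-up input lines:

```python
sample = [
    '!context ["http://tools.ietf.org/html/rfc7089"]',
    '20190101000000 {"uri": "http://web.archive.org/web/20190101000000/http://cs.odu.edu/"}',
    '20190201000000 {"uri": "http://web.archive.org/web/20190201000000/http://cs.odu.edu/"}',
    '20190301000000 {"uri": "http://arquivo.pt/wayback/20190301000000/http://cs.odu.edu/"}',
]
print(count_archive_hosts(sample))
```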

9 of 13

InterPlanetary Wayback & Reconstructive

  • Alam et al., “InterPlanetary Wayback: The Permanent Web Archive”, JCDL 2016
  • Kelly et al., “InterPlanetary Wayback: Peer-To-Peer Permanence of Web Archives”, TPDL 2016
  • https://github.com/oduwsdl/ipwb
  • Alam et al., “Client-side Reconstruction of Composite Mementos Using ServiceWorker”, JCDL 2017
  • Alam et al., “Unobtrusive and Extensible Archival Replay Banners Using Custom Elements”, JCDL 2018
  • https://oduwsdl.github.io/Reconstructive

10 of 13

Who Is Accessing Web Archives?

What is a User Agent?

How Can We Identify A Robot?

I’m a Human!!!

What are Web Archives?

What are Robots?

Are Robots Good or Bad?

What are Access Logs?

By: Kritika Garg, Himarsha Jayanetti
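
One simple way to approach the robot question is to read the User-Agent field from each access log entry and look for tell-tale substrings. The sketch below is an assumption for illustration, not the presenters' method: it assumes the common "combined" log format, and the keyword list `BOT_HINTS` is hypothetical and far from complete. Robots that send a browser-like User-Agent would not be caught this way.

```python
import re

# Matches the quoted request, status, size, referrer, and user agent
# of a "combined" format access log line.
LOG_PATTERN = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Hypothetical substrings that often appear in self-identified robots.
BOT_HINTS = ("bot", "crawler", "spider", "curl", "wget", "python-requests")


def user_agent_of(log_line):
    """Return the User-Agent string of a log line, or None if unparsable."""
    match = LOG_PATTERN.search(log_line)
    return match.group("user_agent") if match else None


def looks_like_robot(user_agent):
    """Heuristic: True if the User-Agent contains a known robot hint."""
    ua = (user_agent or "").lower()
    return any(hint in ua for hint in BOT_HINTS)
```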

11 of 13

88 Images

41 JavaScript files

6 CSS files

The base HTML file

c57d2b5a97b080fc099fae31bc356550729c8ced137820d9cd19c18676ecca8a

Reload 1

You cannot replay the same archived page twice

3452aaf2ff36fb73e9d7e08d0bd4442027a0aa068fb7813ddb4587a716f9a0de

Reload 2

It is hard to compute fixity on archived web pages

An aggregated cryptographic hash value

About one in nine archived pages (11.55%) always produce the same hash

One in six archived pages (16.06%) produce a different hash on each replay

Archived pages produce different hashes because:

- Transient errors (e.g., HTTP 500)

- Dynamically loaded resources (i.e., via JavaScript)

- Changes in HTTP entity headers and bodies

- 16,627 archived pages (mementos) from 17 public web archives

- Each memento was downloaded 39 times during our 14-month study
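
To make the idea of an aggregated hash concrete, here is a minimal sketch that hashes each resource of a replayed page (base HTML, images, JavaScript, CSS) and then hashes the sorted list of per-resource hashes. The aggregation scheme and the function names are assumptions chosen for illustration; the study's exact procedure is not given on the slide.

```python
import hashlib


def resource_hash(body: bytes) -> str:
    """SHA-256 hex digest of a single resource body."""
    return hashlib.sha256(body).hexdigest()


def aggregated_hash(resources: dict) -> str:
    """Combine per-resource hashes into one hash for the whole page.

    `resources` maps each resource URI to its body bytes. URIs are
    processed in sorted order so the result does not depend on the
    order in which resources were loaded.
    """
    combined = hashlib.sha256()
    for uri in sorted(resources):
        combined.update(uri.encode("utf-8"))
        combined.update(b"\0")
        combined.update(resource_hash(resources[uri]).encode("ascii"))
        combined.update(b"\n")
    return combined.hexdigest()
```

Under this scheme, a single resource that differs between two replays (for example, a transient HTTP 500 body) changes the aggregated value, which is consistent with the reload example on this slide.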

12 of 13

Six WS-DL courses are offered for Fall 2019:

13 of 13

Five WS-DL courses for Spring 2020:

  • CS 395, Research Methods in Data and Web Science, Dr. Michael Nelson
  • CS 432/532, Intro to Web Science, Dr. Michele C. Weigle
  • CS 480/580, Intro to Artificial Intelligence, Dr. Vikas Ashok
  • CS 495/595, Intro to Data Mining, Dr. Sampath Jayarathna
  • CS 800, Research Methods, Dr. Michele C. Weigle

For updates and more info, follow us on Twitter: @WebSciDL