Quantifying the magnitude of news events (vis)
Presented by Alexander C. Nwala (@acnwala)
LAMP-SYS: Lab For Applied Machine Learning and NLP System
Presenters: Dr. Jian Wu & Muntabir Choudhury
Text Extraction
Docker Pipeline
OpenCV Face Detection (dnn - deep neural network)
Grampa, what's a deleted tweet?
Presented by: Mohammed Nauman Siddique (@m_nsiddique)
Deleted Tweet
Apology Tweet
Archived Tweet
TweetedAt: Finding Tweet Timestamps for Pre and Post Snowflake Tweet IDs
Presented by: Mohammed Nauman Siddique & Sawood Alam
The timestamp estimated by TweetedAt is off by only 12 minutes
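For tweet IDs issued after Twitter adopted Snowflake, the creation time is encoded in the ID itself, so it can be read back without calling any API. The sketch below shows only that post-Snowflake case; the function names are illustrative and are not taken from TweetedAt's code. Pre-Snowflake IDs carry no embedded timestamp and have to be estimated, which this sketch does not attempt.

```python
from datetime import datetime, timedelta, timezone

# Snowflake epoch in milliseconds since the Unix epoch
# (2010-11-04T01:42:54.657Z).
TWITTER_EPOCH_MS = 1288834974657

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snowflake_timestamp_ms(tweet_id: int) -> int:
    """Return the creation time of a Snowflake tweet ID in Unix milliseconds.

    The lower 22 bits hold machine and sequence numbers; the remaining
    upper bits hold milliseconds elapsed since the Snowflake epoch.
    """
    return (tweet_id >> 22) + TWITTER_EPOCH_MS


def snowflake_to_datetime(tweet_id: int) -> datetime:
    """Return the creation time of a Snowflake tweet ID as a UTC datetime."""
    return UNIX_EPOCH + timedelta(milliseconds=snowflake_timestamp_ms(tweet_id))
```

Calling `snowflake_to_datetime(tweet_id)` with an ID copied from a tweet URL gives a timezone-aware UTC datetime.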
TimeMap Visualization: An Archival Thumbnail Visualization Server
Presented by Dhruv Patel
tmvis.cs.odu.edu
URI: odu.edu
Docker: A Solution to the Magic Laptop Problem
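[Diagram: Docker architecture. Labels: Host OS, Docker Engine, and containers that each bundle their own Bins/Libs with an app (App A, App B).]
[Diagram: Docker workflow. Dockerfile -> Build -> Image -> Run -> Container.]
[Diagram: Example application. Labels: Web App, API Server, Redis Cache, Live Web.]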
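# Start from the official Python base image
FROM python
LABEL maintainer="Sawood Alam <@ibnesayeed>"
# Install the libraries the script depends on
RUN pip install beautifulsoup4 requests
# Copy the script into the image and make it executable
COPY linkextractor.py /app/
WORKDIR /app
RUN chmod a+x linkextractor.py
# Run the script whenever a container starts
ENTRYPOINT ["./linkextractor.py"]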
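The Dockerfile packages a script named linkextractor.py, but the script itself is not shown. As a rough idea of what a link extractor does, here is a hypothetical sketch. It uses Python's standard-library HTML parser rather than the beautifulsoup4 and requests packages the Dockerfile installs, so it runs without extra dependencies. The class and function names are assumptions for illustration, not the presenter's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links into absolute ones.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links
```

A real script would first fetch the page over HTTP and then pass the response body to a function like `extract_links`.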
MementoMap
A Web Archive Profiling Framework
for Efficient Memento Routing
$ memgator -a archives.json -f cdxj cs.odu.edu \
> | grep -v "^!" | cut -d '/' -f 3 | sort | uniq -c | sort -nr
973 web.archive.org
17 arquivo.pt
8 wayback.archive-it.org
1 archive.md
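The shell pipeline above drops the metadata lines that start with "!", takes the host name out of each memento URI, and counts how many mementos each archive holds. The same tally can be sketched in Python. This assumes each data line of the CDXJ TimeMap is a datetime key followed by a JSON object with a "uri" field; the function name is hypothetical.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def count_archive_hosts(cdxj_lines):
    """Count mementos per archive host in a CDXJ TimeMap.

    Returns (host, count) pairs sorted by count, largest first.
    """
    counts = Counter()
    for line in cdxj_lines:
        line = line.strip()
        # Skip blank lines and "!" metadata lines (like grep -v "^!").
        if not line or line.startswith("!"):
            continue
        # Split "<datetime> <json>" at the first space.
        _, _, payload = line.partition(" ")
        uri = json.loads(payload)["uri"]
        counts[urlsplit(uri).netloc] += 1
    return counts.most_common()
```

Feeding it the lines of the memgator output (for example from `sys.stdin`) yields a per-archive tally like the one shown on the slide.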
InterPlanetary Wayback
& Reconstructive
By: Sawood Alam <@ibnesayeed> and Mat Kelly <@machawk1>
Who Is Accessing Web Archives?
What is a User Agent?
How Can We Identify A Robot?
I’m a Human!!!
What are Web Archives?
What are Robots?
Are Robots Good or Bad?
What are Access Logs?
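One simple way to look for robots in an access log is to read the User-Agent field of each request and check it for tell-tale keywords. The sketch below assumes the log uses the common "combined" format and uses a short, hypothetical keyword list. It is only a first-pass heuristic: a robot can send a browser-like User-Agent, so real studies combine this with other signals.

```python
import re

# Combined Log Format:
# ip ident user [time] "request" status size "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical keyword list; an illustration, not an exhaustive one.
ROBOT_HINTS = ("bot", "crawler", "spider", "curl", "wget")


def user_agent_of(log_line):
    """Return the User-Agent string of a log line, or None if it does not parse."""
    match = LOG_PATTERN.match(log_line.strip())
    return match.group("agent") if match else None


def looks_like_robot(user_agent):
    """Guess whether a User-Agent belongs to a robot.

    A missing or empty User-Agent is treated as a robot, since
    browsers normally send one.
    """
    if not user_agent or user_agent == "-":
        return True
    lowered = user_agent.lower()
    return any(hint in lowered for hint in ROBOT_HINTS)
```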
https://www.cs.odu.edu/~mln/pubs/jcdl-2013/fp105-AlNoamany.pdf
https://www.etsy.com/listing/75400005/zombie-shirt-zombie-robot-seeking-cpus-t
By: Kritika Garg, Himarsha Jayanetti
• 88 Images
• 41 JavaScript files
• 6 CSS files
• The base HTML file
c57d2b5a97b080fc099fae31bc356550729c8ced137820d9cd19c18676ecca8a
Reload 1
You cannot replay the same archived page twice
3452aaf2ff36fb73e9d7e08d0bd4442027a0aa068fb7813ddb4587a716f9a0de
Reload 2
It is hard to compute fixity on archived web pages
An aggregated cryptographic hash value
• About one in nine archived pages (11.55%) always produce the same hash
• One in six archived pages (16.06%) produce a different hash on each replay
Archived pages produce different hashes because:
- Transient errors (e.g., HTTP 500)
- Dynamically loaded resources (e.g., via JavaScript)
- Changes in HTTP entity headers and bodies
- 16,627 archived pages (mementos) from 17 public web archives
- Each memento was downloaded 39 times within our 14-month study
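An archived page is made of many resources (the base HTML file plus images, JavaScript, and CSS), so one hash for the whole page has to combine the hashes of its parts. The sketch below shows one possible way to do that with SHA-256. The aggregation scheme and the function name are assumptions for illustration; the study's exact method is not described in these slides.

```python
import hashlib


def aggregated_hash(resources):
    """Compute one SHA-256 value for a page from all of its resources.

    `resources` maps each resource URI to its response body as bytes.
    Each body is hashed on its own, then the "URI digest" lines are
    hashed together in sorted URI order, so the result does not depend
    on the order in which resources were downloaded.
    """
    outer = hashlib.sha256()
    for uri in sorted(resources):
        digest = hashlib.sha256(resources[uri]).hexdigest()
        outer.update(f"{uri} {digest}\n".encode("utf-8"))
    return outer.hexdigest()
```

If any single resource differs between two replays (a transient error page, a dynamically loaded script, a changed body), the aggregated value changes too, which is why repeated replays of the same memento can yield different hashes.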
Six WS-DL courses are offered for Fall 2019:
Five WS-DL courses are offered for Spring 2020:
For updates and more info, follow us on Twitter: @WebSciDL