A Summary of Contributions to the ETD Project
Presented By:
Lamia Salsabil
PhD Student
Advisor: Dr. Jian Wu
Department of Computer Science
Old Dominion University, Norfolk, Virginia
November 11, 2022
@liya_lamia @WebSciDL
Research Study 1: ETD Segmentation
Page-Level Segmentation
Page Labeling
Figure 1: ETD Pages - 14 Classes
ETD Text & Data Extraction
Extracted Text
Bounding Box
Figure 2: ETD Text & Data Extraction Pipeline
Research Study 2: Computational Reproducibility Study Using URLs Linking to Open Access Datasets/Software
Past Work
OADS: Open Access Datasets and Software
OADS-URLs: URLs linking to OADS
Recent Work
Dataset and Challenges
Figure 3: Sentences with URLs and their classes
Research Study 3: Text Quality Comparison
Text Extraction from Born-Digital ETDs
Citation String Based Comparison