1 of 14

Accessing and Storing Federal Datasets and Documents

Arden Handler, DrPH

Professor Emerita

Community Health Sciences, UIC-School of Public Health

ardenhandler@gmail.com

2 of 14

The Preserving Public Health Data Collective Datathon, January 31, 2025

  • Advocate, Curate, and Preserve-ate.

3 of 14

Wayback Machine

  • Search for URLs for reports and data you use. Did you find a URL that was taken down? Time to turn to the Wayback Machine, a digital archive of the World Wide Web that was created by the Internet Archive. The Wayback Machine allows users to view snapshots of websites as they existed at various points in time, effectively serving as a historical record of the web. Navigate to: https://web.archive.org/.
  • Step 1. Identify the URL that is broken. For example, a Google search for "NCI diversity" yields a result dated 11/4/24 about a Diversity Career Development Program. However, the URL (https://www.cancer.gov/grants-training/training/idwb/dcd-program) leads to a "Page Not Found" (404) error.
  • Step 2. Copy the broken URL into the Wayback Machine search bar. If nothing comes up, try shortening the URL by dropping trailing parts of the path, as long as the shortened URL still leads to the part of the site you are interested in.
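Steps 1 and 2 can also be scripted. The sketch below is one possible approach, not part of the original workflow: it builds a request for the Internet Archive's availability endpoint (https://archive.org/wayback/available) and lists progressively shorter versions of a broken URL to try when the full path returns nothing. The function names are hypothetical, and the network call is left commented out so the block runs offline.

```python
import json
from urllib.parse import urlencode, urlsplit, urlunsplit

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

def availability_query(url, timestamp=None):
    """Build the availability-endpoint request URL for a page."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDD or YYYYMMDDhhmmss
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)

def shortened_candidates(url):
    """Return the URL, then versions with trailing path segments dropped."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    for n in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:n])
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates

def closest_snapshot(response_text):
    """Pull the closest snapshot URL out of the JSON reply, or return None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

broken = "https://www.cancer.gov/grants-training/training/idwb/dcd-program"
for candidate in shortened_candidates(broken):
    print(availability_query(candidate))
    # To query for real (requires network access):
    # from urllib.request import urlopen
    # reply = urlopen(availability_query(candidate)).read().decode()
    # print(closest_snapshot(reply))
```

Trying the full URL first and shortening only on a miss mirrors the manual advice in Step 2; a shortened URL may land on a parent page rather than the document itself, so the result still needs a human check.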


4 of 14

Wayback Machine

  • Step 3. Look for the most recent blue snapshots. A blue dot indicates that the page was captured successfully (see more details below). Open the snapshot to confirm it has what you want; some snapshots look promising but turn out to be dead ends. Use your own judgment to identify the most recent snapshot that contains the content you need, and report that one.
  • From Web Archive: “Why are some of the dots on the calendar page different colors? We color the dots, and links, associated with individual web captures, or multiple web captures, for a given day. Blue means the web server result code the crawler got for the related capture was a 2nn (good); Green means the crawlers got a status code 3nn (redirect); Orange means the crawler got a status code 4nn (client error), and Red means the crawler saw a 5nn (server error). Most of the time you will probably want to select the blue dots or links.”
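The colour rule quoted above, and the "most recent blue snapshot" selection in Step 3, can be sketched in a few lines. This is an illustration only: the function names and the capture list are hypothetical, and the list is made-up sample data rather than real Wayback Machine results.

```python
def dot_color(status_code):
    """Map an HTTP status code to the calendar dot colour described in the FAQ."""
    first_digit = str(status_code)[:1]
    colors = {"2": "blue", "3": "green", "4": "orange", "5": "red"}
    return colors.get(first_digit, "unknown")

def most_recent_blue(snapshots):
    """Return the latest (timestamp, status) pair with a 2nn status, or None."""
    blue = [s for s in snapshots if dot_color(s[1]) == "blue"]
    # 14-digit timestamps (YYYYMMDDhhmmss) sort correctly as plain strings.
    return max(blue, key=lambda s: s[0]) if blue else None

# Hypothetical capture list: (14-digit timestamp, status code)
captures = [
    ("20241104120000", "200"),
    ("20250120093000", "200"),
    ("20250125080000", "301"),
    ("20250201150000", "404"),
]
print(most_recent_blue(captures))
```

A blue status only says the server answered successfully at capture time; the page still has to be opened to confirm it holds the report or data being sought.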

5 of 14

Locating Sexual Orientation and Gender Identity (SOGI) data in the US Census

6 of 14

7 of 14

Abortion Surveillance Example

8 of 14

Abortion Surveillance Example

9 of 14

Abortion Surveillance Example

  • https://web.archive.org/web/20250201205120/https://www.cdc.gov/reproductive-health/data-statistics/abortion-surveillance-findings-reports.html
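When recording which snapshot was used, it can help to split an archived link like the one above into its capture time and original address. The sketch below assumes the common form https://web.archive.org/web/<14-digit timestamp>/<original URL>; the function name is hypothetical, and links that carry extra modifiers after the timestamp are not handled.

```python
import re
from datetime import datetime

# Assumed shape: a 14-digit YYYYMMDDhhmmss timestamp, then the original URL.
WAYBACK_PATTERN = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")

def parse_wayback_url(archived_url):
    """Return (capture datetime, original URL) for a snapshot link."""
    match = WAYBACK_PATTERN.match(archived_url)
    if not match:
        raise ValueError("not a recognised Wayback Machine snapshot URL")
    captured_at = datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    return captured_at, match.group(2)

snapshot = (
    "https://web.archive.org/web/20250201205120/"
    "https://www.cdc.gov/reproductive-health/data-statistics/"
    "abortion-surveillance-findings-reports.html"
)
captured_at, original = parse_wayback_url(snapshot)
print(captured_at.isoformat(), original)
```

Keeping the capture time next to the original address makes it easier to cite exactly which version of a removed page was consulted.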

10 of 14

Abortion Surveillance Example

11 of 14

Maternal and Child Health Bureau Strategic Plan Example: https://mchb.hrsa.gov/sites/default/files/mchb/about-us/mchb-strategic-plan.pdf

12 of 14

Maternal and Child Health Bureau Strategic Plan Example

13 of 14

14 of 14

Non-Governmental Data Alternatives

  • As the US government removes health websites and data, a curated list of non-governmental websites with health databases is being developed and will continue to be updated (Naseem S. Miller, February 3, 2025). Note: many of these sources ultimately rely on governmental data.
  • https://journalistsresource.org/home/as-the-us-government-removes-health-websites-and-data-heres-a-list-of-non-government-data-alternatives/
    • ProPublica
    • Association of Health Care Journalists’ Health Journalism Data
    • Kaiser Family Foundation
    • Congressional District Health Dashboard
    • Health Care Cost Institute
    • Pew Research Center
    • Institute for Health Metrics and Evaluation
    • County Health Rankings & Roadmaps
    • Rural Hospital Data
    • The United Network for Organ Sharing
    • American College of Surgeons National Surgical Quality Improvement Program
    • National Cancer Database
    • Harvard Dataverse
    • State Health Department data
    • Investigative Reporters and Editors