Expanding the ICPSR Bibliography of Data-related Literature to Study Communities of Data Reuse
Sara Lafia (Research Fellow, University of Michigan)
Elizabeth Moss (Librarian, ICPSR)
ICPSR Data Fair
September 19-23, 2022
Agenda
Context: The existing ICPSR Bibliography
Database of data-literature links
Citation
Full text
ICPSR Study(s)
We LINK studies to publications that use data from those studies.
We attempt to COMPILE the universe of use and reuse for each study by capturing and linking the “citing” publications.
We DESCRIBE these linkages in resources we create based on how the data are discussed in the literature.
EXTRACT and VISUALIZE relationships
Basics
Manual Searching
Automated Searching
Main sources
New Sources
Platforms
Individual Publishers/Journals
Collection criteria
https://www.icpsr.umich.edu/web/pages/ICPSR/citations/collection-criteria.html
Heuristic to Evaluate Data Use in Publications
1. Is this already in the Bibliography? If no . . .
2. Were the data formally cited in the references? If no . . .
3. Where does the query string appear in the text? (tables, footnotes, methods, supplement, acknowledgements, grant number?)
4. If a good “hit” (data analyzed), go to 5. If a bad “hit” (just mentioned or pub cited), go to 5a.
5. Does the ICPSR catalog have the same years, waves, panel, sample?
5a. Find the cited pubs and check them against #1-#5.
6. Collect the citation and associate it with study numbers from the ICPSR catalog.
7. Was any year/wave/phase clearly analyzed but not cited in the references? Add those.
8. Did the author use something in a series, but didn’t say which specific study? Collect for series.
https://www.icpsr.umich.edu/web/pages/ICPSR/citations/biblio-ref.html
Study DOI citation practice
We evaluated:
We found:
When and what should you cite?
Archive-provided study citation/DOI may be convenient, but . . .
More clarity is needed from various authorities about when to do so:
What deserves a citation? In what form?
Activity: Audience Poll
In my publications, I cite, or recommend that others cite, data when I (select all that apply):
Visualizing and exploring ICPSR’s citation network
Using the ICPSR Bibliography to study scholarly communication and impact
What are researchers' data search needs?
What impact does data curation have on data reuse?
How should we prioritize curation to achieve impact and return on investment?
NSF-funded projects: MICA and RecSys
Source: ACRL (2021)
What can we learn from studying citation networks?
Example from 150 years of Nature: linked articles show the flow of ideas that inspire across disciplines
Modeling data citations in the ICPSR Bibliography as a network
Teenage Attitudes and Practices Survey, 1989: [United States] (ICPSR 9786)
National Survey on Drug Use and Health (NSDUH) Series (1979-2014)
Crime, juvenile delinquency, and dysfunctional behavior (Sherman, 2003)
SERIES/STUDIES: Which data are used together?
PUBLICATIONS: Which disciplines are using data?
Central America and the international trade in drugs (Bunck et al., 2015)
Overview of the ICPSR data co-citation network
Nodes: ICPSR studies or grouped series
Edges: Studies used together in 2+ publications
Edge weight: Number of co-citations
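The node, edge, and weight definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `publications` mapping, the study labels, and the function name `build_cocitation_edges` are hypothetical and are not taken from the ICPSR Bibliography or its code.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each publication maps to the studies/series it uses.
publications = {
    "pub_a": ["GSS", "ANES", "CPS"],
    "pub_b": ["GSS", "ANES"],
    "pub_c": ["GSS", "CPS"],
    "pub_d": ["NHIS"],
}

def build_cocitation_edges(pubs, min_pubs=2):
    """Return {(study_u, study_v): weight} for pairs used together
    in at least `min_pubs` publications (weight = number of co-citations)."""
    counts = Counter()
    for studies in pubs.values():
        # Sort and de-duplicate so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(studies)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_pubs}

edges = build_cocitation_edges(publications)
```

In this toy input, "ANES" and "CPS" appear together in only one publication, so that pair gets no edge under the 2+ rule.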
Finding “hubs” in the ICPSR data co-citation network
General Social Survey Series
American National Election Study (ANES) Series
Current Population Survey Series
Census of Population and Housing, 1790-1950 [United States] Series
National Health Interview Survey Series
Uniform Crime Reporting Program Data Series
Studies
High degree
High betweenness
Both
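To make the two "hub" measures concrete, here is a minimal sketch that computes degree and unnormalized betweenness centrality on a toy graph. It uses Brandes' algorithm for unweighted, undirected graphs; the adjacency data and node names are hypothetical, and the slides do not say which software or settings were used for the actual analysis.

```python
from collections import deque

# Hypothetical toy network: two triangles joined by the bridge edge C-D.
adj = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}

def degree(adj):
    """Number of neighbors per node."""
    return {n: len(nbrs) for n, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted, undirected)."""
    bc = {n: 0.0 for n in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {n: 0 for n in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {n: [] for n in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {n: b / 2 for n, b in bc.items()}

deg = degree(adj)
bc = betweenness(adj)
```

In this toy graph, C and D have the highest degree and also carry every shortest path between the two triangles, so they would fall in the "Both" category.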
Detecting communities of data use
41 communities (minimum size of three studies/series) labeled with most common subject terms
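The slide does not name the community detection algorithm, so the sketch below swaps in a simple, deterministic weighted label propagation as a stand-in. It then applies the minimum size of three and labels each community with its most common subject terms. All data, names, and subject terms in the example are hypothetical.

```python
from collections import Counter, defaultdict

def label_propagation(weighted_edges, max_iter=100):
    """Stand-in community detection: each node repeatedly adopts the label
    with the largest total edge weight among its neighbors."""
    adj = defaultdict(dict)
    for (u, v), w in weighted_edges.items():
        adj[u][v] = w
        adj[v][u] = w
    labels = {n: n for n in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            score = Counter()
            for nbr, w in adj[node].items():
                score[labels[nbr]] += w
            best = max(score.values())
            tied = sorted(lab for lab, s in score.items() if s == best)
            # Keep the current label on ties; otherwise take the smallest.
            new = labels[node] if labels[node] in tied else tied[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels

def communities(labels, min_size=3):
    """Group nodes by label and drop groups smaller than `min_size`."""
    groups = defaultdict(set)
    for node, lab in labels.items():
        groups[lab].add(node)
    return [g for g in groups.values() if len(g) >= min_size]

def top_terms(group, subject_terms, k=2):
    """Most common subject terms across the studies in one community."""
    counts = Counter(t for s in group for t in subject_terms.get(s, []))
    return [t for t, _ in counts.most_common(k)]

# Hypothetical co-citation edges (weight = number of co-citations).
edges = {("A", "B"): 5, ("A", "C"): 5, ("B", "C"): 5, ("C", "D"): 2,
         ("D", "E"): 5, ("D", "F"): 5, ("E", "F"): 5, ("G", "H"): 3}
# Hypothetical subject terms per study.
subject_terms = {"A": ["inmates", "correctional facilities"],
                 "B": ["inmates", "United States"],
                 "C": ["inmates", "correctional facilities"]}

comms = communities(label_propagation(edges))
first_label = top_terms({"A", "B", "C"}, subject_terms)
```

The pair G-H forms a group of two and is dropped by the minimum-size filter, mirroring the "minimum size of three studies/series" rule on the slide.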
User stories: data “subdivisions”
terrorism, terrorists, radicalism
police citizen interactions, police effectiveness, police response
inmates, correctional facilities, United States
e.g., Improving Correctional Classification, New York, 1981-1983 (ICPSR 8437)
Disciplinary resources
Scenario: finding compatible, topical datasets for instruction
User stories: data “crossroads”
multiple (substance abuse treatment, program evaluation)
e.g., Evaluation of the Los Angeles County Juvenile Drug Treatment Boot Camp, 1992-1998 (ICPSR 3157)
multiple (demographic characteristics, education)
e.g., India Human Development Survey (IHDS), 2005 (ICPSR 22626)
multiple (terrorism - demographic characteristics)
Connective resources
Scenario: finding interoperable studies with broad utility for starting a project
Activity: Audience Discussion
Explore the interactive visualization (https://tinyurl.com/icpsr-datasets) with the following questions in mind:
Please type your feedback into the Zoom chat (3 min.)
Resources
Banaeefar, H., Burchart, S., Moss, E., & Palvolgyi-Polyak, E. (2022). Best practice may not be enough: Variation in data citation using DOIs. [Poster]. Presented at IASSIST 2022. https://doi.org/10.7302/4809
Lafia, S., Fan, L., Thomer, A., & Hemphill, L. (accepted). Subdivisions and Crossroads: Identifying Hidden Community Structures in a Data Archive's Citation Network. Quantitative Science Studies (QSS). https://doi.org/10.48550/arXiv.2205.08395
Lafia, S. (2022). ICPSR Bibliography Citation Network (February 2022) [Data set]. Inter-university Consortium for Political and Social Research (ICPSR). https://doi.org/10.3886/E174361V1
Lafia, S. (2022). ICPSR/data-communities (Version v1.0.0) [Code].
Contact us
Sara Lafia
slafia@umich.edu
Elizabeth Moss
eammoss@umich.edu
Support 60 more years of data: https://myumi.ch/ICPSR-60MoreYears