1 of 28

Expanding the ICPSR Bibliography of Data-related Literature to Study Communities of Data Reuse

Sara Lafia (Research Fellow, University of Michigan)

Elizabeth Moss (Librarian, ICPSR)

ICPSR Data Fair

September 19-23, 2022

2 of 28

Agenda

  1. Context: The existing ICPSR Bibliography
  2. Activity: Audience Poll - Data Citation Best Practices (Zoom)
  3. Visualizing and exploring the ICPSR Bibliography
  4. Activity: Audience Discussion - Feedback on Visualization (Zoom)
  5. Resources
  6. Q&A

3 of 28

Context: The existing ICPSR Bibliography

4 of 28

Database of data-literature links

  • Over 100,000 linked citations

  • Meant to help you discover data via the citing literature

  • Curated by four ICPSR Bibliography staff

Citation

Full text

ICPSR Study(s)

5 of 28

6 of 28

7 of 28

We LINK studies to publications that use data from those studies.

We attempt to COMPILE the universe of use and reuse for each study by capturing and linking the “citing” publications.

We DESCRIBE these linkages in resources we create based on how the data are discussed in the literature.

8 of 28

EXTRACT and VISUALIZE relationships

9 of 28

Basics

  • Main sources
  • Collection criteria: data analyzed (a collection based on one main type of data use)

Manual Searching

  • Scale of collection
  • Customized queries and human verification

Automated Searching

  • API querying
  • Non-analysis usage
  • Data citation implications

10 of 28

Main sources

People

  • Depositors

  • Public and restricted data reusers

  • Research teams

New Sources

  • Overton Interdisciplinary

  • Policy Commons

  • DimensionsPlus

Platforms

  • Google Scholar

  • PM/PMC

  • ProQuest

  • Scopus/ScienceDirect

  • EBSCOhost

Individual Publishers/Journals

  • Wiley

  • NCJRS

  • Journal ToCs

11 of 28

Collection criteria

https://www.icpsr.umich.edu/web/pages/ICPSR/citations/collection-criteria.html

12 of 28

Heuristic to Evaluate Data Use in Publications

1. Is this already in the Bibliography? If no . . .

2. Were the data formally cited in the references? If no . . .

3. Where does the query string appear in the text? (tables, footnotes, methods, supplement, acknowledgements, grant number?)

4. If a good “hit” (data analyzed), go to 5. If a bad “hit” (just mentioned or pub cited), go to 5a.

5. Does the ICPSR catalog have the same years, waves, panel, sample?

5a. Find cited pubs & check for #1-#5.

6. Collect citation and associate with study numbers from ICPSR catalog.

7. Was any year/wave/phase clearly analyzed but not cited in the references? Add those.

8. Did the author use something in a series, but didn’t say which specific study? Collect for series.

https://www.icpsr.umich.edu/web/pages/ICPSR/citations/biblio-ref.html
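
The heuristic above can be read as a small decision procedure. The following is a minimal Python sketch, not ICPSR's actual tooling: the `Publication` record, its field names, and the returned labels are all hypothetical, and each boolean stands in for a judgment that Bibliography staff make by reading the publication. The outcome for a publication whose data do not match the catalog is an assumption, since the slide does not state it, and step 7 (adding uncited years/waves) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    # Hypothetical fields; each stands in for a human judgment.
    in_bibliography: bool       # step 1: already collected?
    data_analyzed: bool         # step 4: a good "hit" (data analyzed)?
    catalog_match: bool         # step 5: same years, waves, panel, sample in catalog?
    names_specific_study: bool  # step 8: specific study named, or only a series?


def evaluate(pub: Publication) -> str:
    """Return an illustrative outcome label for one publication."""
    if pub.in_bibliography:
        return "already collected"
    if not pub.data_analyzed:
        # Step 5a: a mention or a cited publication; follow the cited pubs.
        return "check cited publications"
    if not pub.catalog_match:
        # Assumption: the slide does not say what happens in this case.
        return "no catalog match"
    if not pub.names_specific_study:
        return "collect for series"
    return "collect for study"


example = Publication(
    in_bibliography=False,
    data_analyzed=True,
    catalog_match=True,
    names_specific_study=False,
)
print(evaluate(example))
```

Keeping the judgments as explicit fields makes it clear which steps depend on human verification rather than on automated matching.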

13 of 28

Study DOI citation practice

We evaluated:

  • Dimensions API query results
  • DOIs for 11,000+ ICPSR studies
  • Search results examined for collectibility

We found:

  • We could collect 626 publications that met our collection criteria
  • We had to reject 1,031 that did not
  • Why were study citations used if not to cite data analysis?
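
To put the two counts above in proportion, here is a short worked calculation. It assumes that collected and rejected publications together make up everything examined, which the slide does not state explicitly.

```python
collected = 626    # publications that met the collection criteria
rejected = 1031    # publications that did not

# Assumption: every examined result was either collected or rejected.
examined = collected + rejected
share_collected = collected / examined

print(f"{examined} examined, {share_collected:.1%} collectible")
```

Under that assumption, a little over a third of the publications citing a study DOI reflected data analysis.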

14 of 28

When and what should you cite?

15 of 28

Archive-provided study citation/DOI may be convenient, but . . .

More clarity is needed from various authorities about when to do so:

  • Data/object producer
  • Distributors of record
  • Style guides
  • Journals’ author instructions
  • Instructors assigning works containing use of data

What deserves a citation? In what form?

  • Methodological concepts, inspiration
  • Items from survey instruments
  • Mentions of similar sources
  • Mentions of brief data points
  • Constructed measures
  • Codebooks

16 of 28

Activity: Audience Poll

In my publications, I cite, or recommend that others cite, data when I (select all that apply):

  • Describe a study or a feature in a study (e.g., a variable)
  • Refer to study documentation (e.g., a codebook)
  • Use questions from a study in my work
  • Credit a study design or methodology that inspired my approach
  • Reuse the study's data in my analysis
  • Publish data I derived from analyzing an existing study
  • Other

17 of 28

Visualizing and exploring ICPSR’s citation network

18 of 28

Using the ICPSR Bibliography to study scholarly communication and impact

What are researchers' data search needs?

What impact does data curation have on data reuse?

How should we prioritize curation to achieve impact and return on investment?

NSF-funded projects: MICA and RecSys

Source: ACRL (2021)

19 of 28

What can we learn from studying citation networks?

Example from 150 years of Nature: linked articles show the flow of ideas that inspire across disciplines

20 of 28

Modeling data citations in the ICPSR Bibliography as a network

Teenage Attitudes and Practices Survey, 1989: [United States] (ICPSR 9786)

National Survey on Drug Use and Health (NSDUH) Series (1979-2014)

Crime, juvenile delinquency, and dysfunctional behavior (Sherman, 2003)

SERIES/STUDIES: Which data are used together?

PUBLICATIONS: Which disciplines are using data?

Central America and the international trade in drugs (Bunck et al., 2015)

21 of 28

Overview of the ICPSR data co-citation network

Nodes: ICPSR studies or grouped series

Edges: Studies used together in 2+ publications

Edge weight: Number of co-citations
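
The node, edge, and edge-weight definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the ICPSR network: the `pub_to_studies` mapping, the placeholder identifiers, and the `cocitation_edges` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: publication -> studies/series whose data it used.
pub_to_studies = {
    "pub_a": {"S1", "S2"},
    "pub_b": {"S1", "S2", "S3"},
    "pub_c": {"S1", "S3"},
    "pub_d": {"S4"},
}


def cocitation_edges(pub_to_studies, min_pubs=2):
    """Count how often each pair of studies is used together.

    Returns {(study_x, study_y): weight}, keeping only pairs that
    co-occur in at least `min_pubs` publications.
    """
    counts = Counter()
    for studies in pub_to_studies.values():
        # Sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(studies), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_pubs}


edges = cocitation_edges(pub_to_studies)
print(edges)
```

In this toy input, S2 and S3 appear together in only one publication, so no edge is drawn between them, and S4 is never used alongside another study, so it has no edges at all.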

22 of 28

Finding “hubs” in the ICPSR data co-citation network

General Social Survey Series

American National Election Study (ANES) Series

Current Population Survey Series

Census of Population and Housing, 1790-1950 [United States] Series

National Health Interview Survey Series

Uniform Crime Reporting Program Data Series

Studies

High-degree

High betweenness

Both
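
The two hub measures named above, degree and betweenness, can be computed on a small graph as follows. This is a minimal sketch on a hypothetical toy adjacency list: it ignores edge weights and uses Brandes' algorithm for unweighted betweenness, which may differ from how the measures were computed for the ICPSR network.

```python
from collections import deque

# Hypothetical toy graph (undirected adjacency list).
adj = {
    "h": ["a", "b", "c"],
    "a": ["h"],
    "b": ["h"],
    "c": ["h", "x"],
    "x": ["c"],
}


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair is counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


degree = {v: len(neighbors) for v, neighbors in adj.items()}
bc = betweenness(adj)
print(degree)
print(bc)
```

Here node `h` scores highest on both measures, while `c` has a modest degree but non-zero betweenness because it is the only route to `x`.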

23 of 28

Detecting communities of data use

41 communities (minimum size of three studies/series) labeled with most common subject terms
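
The size filter and subject-term labeling described above can be sketched as a post-processing step. The sketch assumes community membership has already been produced by some community detection method, which is not specified here; the `membership` and `subject_terms` inputs and the `label_communities` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy inputs.
membership = {"S1": 0, "S2": 0, "S3": 0, "S4": 1, "S5": 1}
subject_terms = {
    "S1": ["inmates", "correctional facilities"],
    "S2": ["inmates"],
    "S3": ["inmates", "correctional facilities", "sentencing"],
    "S4": ["education"],
    "S5": ["education"],
}


def label_communities(membership, subject_terms, min_size=3, top_n=2):
    """Label each community of at least `min_size` studies with its
    `top_n` most common subject terms."""
    groups = defaultdict(list)
    for study, community in membership.items():
        groups[community].append(study)

    labels = {}
    for community, studies in groups.items():
        if len(studies) < min_size:
            continue  # drop communities below the minimum size
        counts = Counter(
            term for study in studies for term in subject_terms.get(study, [])
        )
        labels[community] = [term for term, _ in counts.most_common(top_n)]
    return labels


labels = label_communities(membership, subject_terms)
print(labels)
```

Community 1 is dropped because it has only two members; community 0 is labeled by the terms its studies share most often.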

24 of 28

User stories: data “subdivisions”

inmates, correctional facilities, United States

e.g., Improving Correctional Classification, New York, 1981-1983 (ICPSR 8437)

Disciplinary resources

Scenario: finding compatible, topical datasets for instruction

25 of 28

User stories: data “crossroads”

multiple (demographic characteristics, education)

e.g., India Human Development Survey (IHDS), 2005 (ICPSR 22626)

Connective resources

Scenario: finding interoperable studies with broad utility for starting a project

26 of 28

Activity: Audience Discussion

Explore the interactive visualization (https://tinyurl.com/icpsr-datasets) with the following questions in mind:

  • Who do you envision using a visualization like this?
  • What other information would you like this visualization to show?

Please type your feedback into the Zoom chat (3 min.)

27 of 28

Resources

Banaeefar, H., Burchart, S., Moss, E., & Palvolgyi-Polyak, E. (2022). Best practice may not be enough: Variation in data citation using DOIs. [Poster]. Presented at IASSIST 2022. https://doi.org/10.7302/4809

Lafia, S., Fan, L., Thomer, A., & Hemphill, L. (accepted). Subdivisions and Crossroads: Identifying Hidden Community Structures in a Data Archive's Citation Network. Quantitative Science Studies (QSS). https://doi.org/10.48550/arXiv.2205.08395

Lafia, S. (2022). ICPSR Bibliography Citation Network (February 2022) [Data set]. Inter-university Consortium for Political and Social Research (ICPSR). https://doi.org/10.3886/E174361V1

Lafia, S. (2022). ICPSR/data-communities (Version v1.0.0) [Code]. https://doi.org/10.5281/zenodo.6799127

28 of 28

Contact us

Sara Lafia

slafia@umich.edu

Elizabeth Moss

eammoss@umich.edu

Support 60 more years of data: https://myumi.ch/ICPSR-60MoreYears