1 of 53

always already computational

collections as data

illustration by adam ferriss

thomas padilla --- unlv

laurie allen --- upenn

stewart varner --- upenn

sarah potvin --- texas a&m

elizabeth russey roke --- emory

hannah frost --- stanford

2 of 53

American Historical Association; American Philosophical Society; British Library; Carnegie Museum of Art; Compute Canada; Cornell University; Deutsches Klimarechenzentrum; Digital Library Federation; Digital Public Library of America; Emory University; Getty Research Institute; HathiTrust Research Center; Haverford College; Indiana University; Indiana University-Purdue University; Internet Archive; James Madison University; Koninklijke Bibliotheek; Library of Congress; Max Planck Computing & Data Facility; McGill University; Michigan State University; Massachusetts Institute of Technology; Museum of Modern Art; National University of Singapore; New York Public Library; New York University; Northeastern University; Open Knowledge Foundation; Penn State University; Stanford University; Swarthmore College; Texas A&M University; Tufts University; University College London; University of British Columbia; University of California Santa Barbara; University of California Berkeley; University of California Los Angeles; University of Canberra; University of Delaware; University of Graz; University of Houston; University of Illinois at Urbana-Champaign; University of Maryland; University of Miami; University of Minnesota; University of North Carolina at Chapel Hill; University of Pennsylvania; University of Toronto; University of Utah; University of Texas Austin; Vanderbilt University; Wellcome Trust; Wheaton College;

York University

3 of 53

goals

  • discuss how we can strengthen the utility of the project for libraries writ large

  • encourage participation in project deliverable development from session participant institutions

4 of 53

5 of 53

  • creation of a . . . collections as data framework that supports . . . collection . . . transformation

  • development of computationally amenable collection use cases and personas

  • functional requirements that support development of technical solutions

6 of 53

liz tatarintseva

7 of 53

liz tatarintseva

8 of 53

9 of 53

10 of 53

11 of 53

The Santa Barbara Statement calls for a rethinking of data documentation . . . our more ethical collection documentation is also an instructional tool and we should see documentation as another opportunity for living out our professional commitments to information literacy.

. . . taking these principles seriously illuminates a path for pedagogy that frames our digital collections as something with which students can critically engage by assessing strengths and gaps, especially in terms of missing narratives.

12 of 53

13 of 53

2016

Predominant digital collection development focuses on replicating traditional ways of interacting with objects in a digital space. This approach does not meet the needs of the researcher, the student, the journalist, and others who would like to use computational methods and tools to work with …

collections as data.

14 of 53

collections as data

… ordered information

… stored digitally

… amenable to computation

15 of 53

16 of 53

procedural

data affords a capacity for computational processing, e.g. term frequency analysis, named entity extraction, and topic modeling

participatory

data affords a capacity for enrichment by a diverse set of users, e.g. crowdsourced transcription

encyclopedic

data affords a capacity for expanded access, e.g. parametric searching by granular features like line length, genre, author gender

spatial

data affords a capacity for spatial characteristics to be surfaced, e.g. place names can be geocoded and mapped

adapted from Janet H. Murray, affordance grid
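
A minimal sketch of two of the affordances above, assuming a tiny invented collection: term frequency analysis (procedural) and parametric searching by a granular feature such as genre (encyclopedic). The records, field names, and the helpers `term_frequencies` and `filter_by` are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter

# Hypothetical records; titles, genres, and text are invented for illustration.
records = [
    {"title": "Letter A", "genre": "letter", "text": "The river rose and the river fell."},
    {"title": "Poem B", "genre": "poem", "text": "A river of light, a field of stars."},
    {"title": "Letter C", "genre": "letter", "text": "We crossed the field at dawn."},
]


def term_frequencies(text):
    """Count lowercase word tokens in a string (procedural affordance)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def filter_by(items, **criteria):
    """Return records whose fields match every criterion (encyclopedic affordance)."""
    return [r for r in items if all(r.get(k) == v for k, v in criteria.items())]


# Aggregate term counts across the whole collection.
corpus_counts = Counter()
for record in records:
    corpus_counts.update(term_frequencies(record["text"]))

# Parametric search: only items whose genre is "letter".
letters = filter_by(records, genre="letter")

print(corpus_counts.most_common(3))
print([r["title"] for r in letters])
```

Neither operation is possible against page images alone; both depend on the collection being ordered, stored digitally, and exposed as text plus structured metadata.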

17 of 53

18 of 53

19 of 53

20 of 53

21 of 53

22 of 53

23 of 53

24 of 53

criticalhandgestures.tumblr.com

25 of 53

in very general terms, an agent is a being with the capacity to act, and ‘agency’ denotes the exercise or manifestation of this capacity.

stanford encyclopedia of philosophy, agency

26 of 53

27 of 53

discovering

annotating

comparing

referring

sampling

illustrating

representing

28 of 53

reuse

reproducibility

integrity

authenticity

permanence

attribution

29 of 53

How do we make our stuff more useful?

30 of 53

Increase fit for purpose

Enhance discoverability

Expand access methods

31 of 53

  • creation of a . . . collections as data framework that supports . . . collection . . . transformation

  • development of computationally amenable collection use cases and personas

  • functional requirements that support development of technical solutions

32 of 53

a social and technical challenge

33 of 53

34 of 53

  • national forum

  • santa barbara statement

  • facets

  • use cases, personas, functional req.

  • workshop/community-palooza

35 of 53

36 of 53

What is the scope of any aspect of this work?

What types of use does this work serve?

Who does this work serve?

What partners can join you in the work?

What approaches to the work already exist?

What ethical considerations should be engaged?

What challenges exist?

What present and future opportunities exist?

37 of 53

  • national forum

  • santa barbara statement

  • facets

  • use cases, personas, functional req.

  • workshop/community-palooza

38 of 53

39 of 53

40 of 53

  • national forum

  • santa barbara statement

  • facets

  • use cases, personas, functional req.

  • workshop/community-palooza

41 of 53

Collections as Data Facets

facet \ˈfa-sət\: one side of something many-sided

=================================================

Collections as Data Facets document collections as data implementations.

An implementation consists of the people, services, practices, technologies, and infrastructure that aim to encourage computational use of cultural heritage collections.

42 of 53

1. Why do it

2. Making the Case

3. How you did it

4. Share the docs

5. Understanding use

6. Who supports use

7. Things people should know

8. What’s next

43 of 53

  • Eleanor Dickson (University of Illinois at Urbana-Champaign)

HathiTrust Research Center Extracted Features Dataset

  • Michael Zarafonetis and Sarah M. Horowitz (Haverford College)

Beyond Penn’s Treaty

  • Brook Lillehaugen and Michael Zarafonetis (Haverford College)

Ticha: A Digital Text Explorer for Colonial Zapotec

  • Veronica Ikeshoji-Orlati (Vanderbilt University)

Vanderbilt Library Legacy Data Projects

  • Jonathan Lill (MoMA Archives)

The Museum of Modern Art Exhibition Index

44 of 53

  • national forum

  • santa barbara statement

  • facets

  • personas, use cases, functional req.

  • workshop/community-palooza

45 of 53

46 of 53

47 of 53

  • national forum

  • santa barbara statement

  • facets

  • use cases, personas, functional req.

  • workshop/community-palooza

48 of 53

on the books

Society of American Archivists, July 27, 2017

Digital Humanities 2017, August 7, 2017

Digital Library Federation, October 25, 2017

CNI Fall Forum, 2017

American Historical Association, 2018

planned

National Forum 2 @ UNLV

NICAR, 2018

Open Repositories, 2018

DLF 2018

49 of 53

project site

collectionsasdata.github.io/

collections as data google group

bit.ly/cadggroup

50 of 53

liz tatarintseva

51 of 53

goals

  • discuss how we can strengthen the utility of the project for libraries writ large

  • encourage participation in project deliverable development from session participant institutions

52 of 53

  1. What are the implications of CAD for repository development?
  2. What are the implications of CAD for all parts of the library (liaison, collections, budgeting, etc.)?
  3. What are the biggest barriers to creating Collections as Data?
  4. Who wants to use collections in these ways?

53 of 53

  1. What would be most useful to your institution to support development of Collections as Data there?
  2. Do you have thoughts about how Collections as Data implementations might be supported at scale?
  3. What parts of the development of collections as data seem most difficult? What seems easiest?