1 of 24

Finding & Working With Humanities Data

Paige Morgan

University of Miami

p.morgan@miami.edu

@paigecmorgan

http://paigecmorgan.youcanbook.me

2 of 24

We can’t cover everything -- make an appointment to talk more!

3 of 24

Goals for today

  • Know where to start looking for data
  • Be able to look at a dataset, and start assessing its value for your goals

4 of 24

Three assertions for working with data

  • Look in several places for data -- creators are still figuring out how to disseminate it.
  • Data is a representation, not the truth.
  • You can only ask questions that you have prepared your data to answer.

(We’ll come back to these later…)

5 of 24

What’s data? What’s a dataset?

Data: can be any kind of information, mixed together or not.

Dataset: implies that the data is meant to go together and be cohesive.

6 of 24

What should you look for when you’re searching for data?

  • Consistency in categories
    • Bad example: Doctor Who Villains
  • Explanations of methodology & provenance
  • Format:
    • CSV, XLS/XLSX, XML, TXT
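
One way to check for consistency in categories is to count the distinct values in a column. The following is a minimal sketch in Python, assuming a small made-up CSV with a hypothetical `species` column; the rows are invented for illustration and do not come from any real dataset.

```python
import csv
import io
from collections import Counter

# Invented sample data; a real file would be read with open("file.csv", newline="")
SAMPLE_CSV = """name,species
Villain A,Dalek
Villain B,dalek
Villain C,Daleks
Villain D,Cyberman
"""

def category_counts(csv_text, column):
    """Count how often each distinct value appears in one column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column].strip() for row in reader)

counts = category_counts(SAMPLE_CSV, "species")
for value, n in sorted(counts.items()):
    print(f"{value!r}: {n}")
```

Three spellings of what is probably one category suggest the column would need cleaning before any counting or grouping.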

7 of 24

Vocabulary

  • Controlled vocabulary: An organized arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching. (Getty)

  • Interpretive layer: A classification system that groups items together to make them more easily discoverable (e.g., music genres; menu sections such as meat, vegetarian, vegan).
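
A controlled vocabulary can be sketched as a fixed set of preferred terms plus a mapping from known variants. The example below is an illustration only: the term list borrows the menu sections mentioned above, and the variant mapping and the function name `to_controlled_term` are hypothetical.

```python
# Hypothetical controlled vocabulary for menu sections
CONTROLLED_TERMS = {"meat", "vegetarian", "vegan"}

# Hypothetical variants mapped to the preferred term
VARIANTS = {
    "veggie": "vegetarian",
    "plant-based": "vegan",
}

def to_controlled_term(raw):
    """Return the preferred term for a raw label, or None if unrecognized."""
    label = raw.strip().lower()
    label = VARIANTS.get(label, label)
    return label if label in CONTROLLED_TERMS else None

labels = ["Vegan", " veggie ", "Meat", "seafood"]
normalized = [to_controlled_term(label) for label in labels]
print(normalized)
```

Labels that fall outside the vocabulary come back as `None`, which makes them easy to find and review by hand.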

8 of 24

Places to look for data and datasets

  • Jeremy Singer-Vine’s Data is Plural newsletter
  • Creative Commons Beta Search
  • Twitter
    • keyword + “dataset” + “available” (optional)
  • Google Search specifying filetype
    • e.g. “filetype:csv”
    • search for both xls and xlsx
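
The filetype searches above can be generated for several formats at once. This is a small sketch, assuming a placeholder keyword ("shipwrecks") and a hypothetical helper named `filetype_queries`; it only builds the search strings and does not contact any search engine.

```python
from urllib.parse import quote_plus

def filetype_queries(keyword, filetypes=("csv", "xls", "xlsx")):
    """Build one search string per file type, e.g. 'shipwrecks filetype:csv'."""
    return [f"{keyword} filetype:{ext}" for ext in filetypes]

queries = filetype_queries("shipwrecks")
for query in queries:
    # quote_plus makes the string safe to use as a URL query parameter
    print(query, "->", quote_plus(query))
```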

9 of 24

Understanding what you’ve found

10 of 24

Vocabulary

  • Structured data: data that has been organized so that software can process it (e.g., according to queries)

  • Data model: a diagram of the way that different parts of your data are connected with each other.
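
To make the difference concrete, here is the same invented information twice: once as free text and once as structured records. The names, years, and the helper `letters_sent_by` are all made up for this sketch. The implied data model is simple: a letter connects a sender, a recipient, and a year.

```python
# The same invented information, first as free text...
unstructured = "Ann wrote to Ben in 1866; Ben wrote to Cal in 1870."

# ...and then as structured records with consistent fields.
letters = [
    {"sender": "Ann", "recipient": "Ben", "year": 1866},
    {"sender": "Ben", "recipient": "Cal", "year": 1870},
]

def letters_sent_by(records, sender):
    """Query the structured records: all letters from one sender."""
    return [r for r in records if r["sender"] == sender]

print(letters_sent_by(letters, "Ben"))
```

Software can answer "who did Ben write to?" from the structured version directly; the free-text version would first have to be parsed.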

11 of 24

12 of 24

13 of 24

Lost friends data model (simplified)

14 of 24

Lost friends data model (more complex)

What questions could we ask of the data if it were structured this way?

15 of 24

Data is a representation;

not the truth.

(Data Management Body of Knowledge, 2nd edition)

16 of 24

Questions for thinking about representation

  • What is the root source (or sources) you are working with?
  • Who are you creating this data for?
  • What are you attempting to represent?

17 of 24

18 of 24

  • On air since 1942
  • Features “notable” figures as “castaways”
  • Each castaway chooses 8 favorite records (songs)
  • One record is designated as their favorite.
  • They also choose one book*, and one luxury item

* (The book is in addition to the complete works of Shakespeare and the Bible or another religious text.)
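
The description above already suggests a data model: a castaway has a list of records, one favorite among them, a book, and a luxury item. One possible sketch uses Python dataclasses. The class names, field names, and sample values are assumptions for illustration, not the structure of any published dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Record:
    title: str
    artist: str

@dataclass
class Castaway:
    name: str
    records: List[Record] = field(default_factory=list)
    favorite: Optional[Record] = None  # should be one of the chosen records
    book: Optional[str] = None
    luxury: Optional[str] = None

    def favorite_is_valid(self):
        """The favorite must be one of the records the castaway chose."""
        return self.favorite is None or self.favorite in self.records

# Placeholder values, not real episode data
song = Record(title="Example Song", artist="Example Artist")
guest = Castaway(name="Example Guest", records=[song], favorite=song,
                 book="Example Book", luxury="Example Luxury")
print(guest.favorite_is_valid())
```

Writing the model down this way forces decisions: for example, whether the favorite is stored separately or as a flag on one of the records.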

19 of 24

Data Conditions

20 of 24

Here’s the data...

21 of 24

What am I trying to represent when I create data about guests’ roles?

22 of 24

What questions can we ask?

What questions can’t we ask?

What would we need to do to be able to ask them?
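
As one illustration of preparing data to answer a question: if guests' roles were recorded as a single free-text field, "how many guests were writers?" cannot be answered until that field is split and normalized. The rows, the semicolon separator, and the helper `split_roles` below are invented for this sketch.

```python
from collections import Counter

# Invented rows: roles recorded as one free-text field per guest
guests = [
    {"name": "Guest A", "roles": "actor; writer"},
    {"name": "Guest B", "roles": "Writer"},
    {"name": "Guest C", "roles": "politician"},
]

def split_roles(text):
    """Turn 'actor; writer' into ['actor', 'writer']."""
    return [part.strip().lower() for part in text.split(";") if part.strip()]

role_counts = Counter(role for g in guests for role in split_roles(g["roles"]))
print(role_counts["writer"])
```

Even this small step involves choices (which separator, whether "Writer" and "writer" are the same role) that shape what the data can represent.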

23 of 24

Three assertions for working with data

  • Look in several places for data -- creators are still figuring out how to disseminate it.
  • Data is a representation, not the truth.
  • You can only ask questions that you have prepared your data to answer.

What are the implications for the projects that you want to develop?

24 of 24

Thank you!

p.morgan@miami.edu

@paigecmorgan

Want more? Consider registering for DHSI 2019: Making Choices About Your Data (June 4-8, 2019)