1 of 54

Introduction to Digital Humanities

Week 2.1 – Data Culture(s)

2 of 54

database <-> narrative

3 of 54

> Lev Manovich, “Database as Symbolic Form” (1999)

  • database = a new symbolic form of the 21st century
  • new form of cultural expression
  • distinct from the narrative
  • … although it will sometimes present itself as a narrative

database <-> narrative

4 of 54

Microsoft Encarta ’95 CD-ROM case

5 of 54

searchers vs. browsers

6 of 54

7 of 54

8 of 54

9 of 54

10 of 54

11 of 54

> searchers

Use the Princeton Postcard Collection to find out how your college (or another part of Princeton, such as a campus building) is represented in the collection.

> browsers

Freely explore the Princeton Postcard Collection without a specific search term. Focus on discovering interesting, unexpected, or surprising items related to Princeton history.

searchers vs. browsers (ca. 10 mins)

12 of 54

> searchers

  • Search for postcards that specifically depict your college (or, e.g., Dillon Gym).
  • Note how the search parameters/fields helped or hindered your ability to find relevant items.

> browsers

  • Navigate the collection without a clear goal.
  • Pay attention to how you’re guided by the categories, tags, or images.
  • Look for serendipitous or unexpected discoveries!

searchers vs. browsers (ca. 10 mins)

13 of 54

> searchers

  • What did you find? How did you find ‘it’?
  • How did ‘search’ shape your experience?

> browsers

  • What is the most surprising or interesting item you discovered?
  • How did ‘browse’ shape your experience?

searchers vs. browsers (ca. 10 mins)

14 of 54

humanities <-> data?

15 of 54

The Lonedale Operator (1911), D.W. Griffith

melodrama in silent film

16 of 54

Chaucer’s vs. Shakespeare’s language

17 of 54

18 of 54

date      | event            | person mentioned
9-22-1970 | asked on a date  | Anthony
9-25-1970 | study in library | Bella, Anthony, Thomas
9-27-1970 | birthday party   | Mila, Joan, Jess, Marc
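Once the diary entries sit in fixed columns, a question like “who comes up most often?” becomes a query rather than a reading. A minimal Python sketch, assuming the three rows above are the whole dataset; the field names mirror the column headers, and splitting the “person mentioned” cell into a list is an illustrative choice, not part of the original table.

```python
from collections import Counter

# The three diary entries from the table, one dict per row.
# Splitting "person mentioned" into a list is an assumption made for querying.
diary = [
    {"date": "9-22-1970", "event": "asked on a date", "people": ["Anthony"]},
    {"date": "9-25-1970", "event": "study in library", "people": ["Bella", "Anthony", "Thomas"]},
    {"date": "9-27-1970", "event": "birthday party", "people": ["Mila", "Joan", "Jess", "Marc"]},
]

# Count how often each person is mentioned across all entries.
mentions = Counter(person for entry in diary for person in entry["people"])

def events_with(person):
    """Return the events (in diary order) in which `person` is mentioned."""
    return [entry["event"] for entry in diary if person in entry["people"]]

print(mentions.most_common(1))  # [('Anthony', 2)]
print(events_with("Anthony"))   # ['asked on a date', 'study in library']
```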

19 of 54

Miriam Posner’s Silent Film & Melodramatic Conventions Dataset

20 of 54

screenshot of just one of my many, many ‘data’ folders

21 of 54

22 of 54

n-gram revolution
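An n-gram is a run of n consecutive words (“last” is a 1-gram, “the last” a 2-gram). Tools such as the Google Books Ngram Viewer, built on the corpus described by Michel et al. (2011), plot an n-gram’s count in each year as a share of all n-grams of the same length in that year. The Python below is a minimal sketch of that calculation under simplifying assumptions: a toy corpus keyed by year, invented for illustration, and naive whitespace tokenization, which is far cruder than Google’s actual processing.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def relative_frequency(corpus_by_year, phrase):
    """For each year, the share of all n-grams (n = words in `phrase`) that equal `phrase`."""
    target = tuple(phrase.split())
    n = len(target)
    shares = {}
    for year, text in corpus_by_year.items():
        grams = ngrams(text.split(), n)  # naive whitespace tokenization (assumption)
        shares[year] = Counter(grams)[target] / len(grams) if grams else 0.0
    return shares

# Toy corpus, invented for illustration only.
corpus = {1800: "the last of the last days", 1850: "at last the end"}
print(relative_frequency(corpus, "the last"))  # {1800: 0.4, 1850: 0.0}
```

Because the result is a share rather than a raw count, years with more books do not automatically produce higher lines; it also means that changes in which books were scanned for a given year can move the line.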

23 of 54

24 of 54

25 of 54

26 of 54

is literature data?

Google Books, in its way, represents an even more profound shift than the printing press, because it ends the relationship to the codex [...]

  • Marche (2012)

27 of 54

is literature data?

Before EEBO [Early English Books Online] arrived, every English scholar of the Renaissance had to spend time at the Bodleian library in Oxford; that’s where one found one’s material. But actually finding the material was only a part of the process of attending the Bodleian, where connections were made at the mother university in the land of the mother tongue. Professors were relics; they had snuffboxes and passed them to the right after dinner, because port is passed left. EEBO ended all that, because the merely practical reason for attending the Bodleian was no longer justifiable when the texts were all available online.

  • Marche (2012)

28 of 54

> you imply . . .

  • that it is composed of distinct, interchangeable elements
  • that it can be processed through computational means
  • that its meaningful qualities can be enumerated in a finite list
  • that replicating the same procedures on the same data (by another person) will yield the same results

when you call something ‘data’

29 of 54

  • Where did you get inspiration from?
  • What did you find?
  • (What) Did you learn?
  • Are you convinced? Can you convince others?
  • What did you find surprising/confusing?
  • What do you need to know about the ‘data’ in order to make your argument?

your n-gram-based research

30 of 54

n-gram revolution

31 of 54

Cora

32 of 54

Todd

33 of 54

Todd

34 of 54

Carl

35 of 54

Theo

36 of 54

my experience

37 of 54

Experiment: Who is mentioned more, Shakespeare or Milton?

PROGRAM CountMentionsInFiles:

DATA txt_files

VARIABLES:
- txt_files_mentioning_milton = 0
- txt_files_mentioning_shakespeare = 0

FOR EACH txt_file in list of txt_files:
    file_content = read_file_content(txt_file)
    IF ' Milton ' is found in file_content:
        ADD 1 to txt_files_mentioning_milton
    IF ' Shakespeare ' is found in file_content:
        ADD 1 to txt_files_mentioning_shakespeare

DISPLAY txt_files_mentioning_milton
DISPLAY txt_files_mentioning_shakespeare

END PROGRAM
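For anyone who wants to rerun the experiment, here is a minimal Python sketch of the same program. It deliberately keeps the space-padded search strings, so it shares their blind spots: “Milton,” “Milton’s”, and a name split across a line break are all missed. The folder name `corpus`, the UTF-8 encoding, and the helper names are assumptions for illustration.

```python
from pathlib import Path

def count_files_mentioning(texts, name):
    """Count the texts that contain `name` with a space on either side,
    exactly like the pseudocode's ' Milton ' and ' Shakespeare ' searches."""
    needle = f" {name} "
    return sum(1 for text in texts if needle in text)

def read_txt_files(folder):
    """Read the content of every .txt file in `folder` (assumed UTF-8)."""
    return [path.read_text(encoding="utf-8", errors="replace")
            for path in sorted(Path(folder).glob("*.txt"))]

def run_experiment(folder):
    texts = read_txt_files(folder)
    milton = count_files_mentioning(texts, "Milton")
    shakespeare = count_files_mentioning(texts, "Shakespeare")
    print(f"Milton is mentioned in {milton} texts.")
    print(f"Shakespeare is mentioned in {shakespeare} texts.")
    if shakespeare:  # avoid dividing by zero on an empty folder
        print(f"Milton : Shakespeare ratio = {milton / shakespeare:.2f}")

if __name__ == "__main__":
    run_experiment("corpus")  # hypothetical folder of .txt files
```

Keeping the literal, padded search is a choice made to mirror the slide: what this kind of matching misses is exactly what the next slides turn to.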

38 of 54

Experiment: Who is mentioned more, Shakespeare or Milton?

Milton is mentioned in 2854 texts.

Shakespeare is mentioned in 2040 texts.

→ Milton : Shakespeare ratio = 1.3990196078

😎

39 of 54

😎

40 of 54

Experiment: Who is mentioned more, Shakespeare or Milton?

the illustrations from Shakespeare in the notes

characteristics mentioned by Milton are found in Vergil

paſſage of Narciſſus probably gave Milton the hint

the great poets,— of even Shakespeare himself

as Shakespeare, Milton and Pope, the writers

such court-fools as Shakespeare might have

works of Shakespeare and John Mil-
ton have no

😬
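Snippets like these are keyword-in-context (KWIC) views: each hit shown with a little text on either side, so a reader can check what a count actually counted (incidental mentions, lists of poets, a name broken by a line-end hyphen). A minimal Python sketch, assuming plain-text OCR output; `kwic`, `dehyphenate`, and the 30-character window are illustrative choices, and the de-hyphenation rule is deliberately naive.

```python
import re

def dehyphenate(text):
    """Rejoin words broken across a line, e.g. 'Mil-' + newline + 'ton' -> 'Milton'.
    Naive assumption: every end-of-line hyphen is a break hyphen, so genuine
    compounds that happen to fall at a line end get joined too."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

def kwic(text, term, width=30):
    """Keyword in context: each whole-word hit of `term` with up to `width`
    characters on either side, line breaks flattened to spaces."""
    snippets = []
    for match in re.finditer(rf"\b{re.escape(term)}\b", text):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append(text[start:end].replace("\n", " "))
    return snippets

sample = "works of Shakespeare and John Mil-\nton have no"
print(kwic(sample, "Milton"))               # []
print(kwic(dehyphenate(sample), "Milton"))  # ['works of Shakespeare and John Milton have no']
```

Run on the raw text, the search finds nothing; only after the line-break hyphen is removed does the “John Mil-/ton” passage surface.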

41 of 54

😰

42 of 54

Shakespeare: 2674

Shakespere: 193

Shakspeare: 1467

Shakespear: 274

Shaksper: 22

Shackspeare: 8

Shackspear: 1

Shaxpere: 5

Shaxberd: 6

Shakspere: 691

Shakesspeare: 0

Shackspere: 4

Shakesphere: 2
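One way to gather figures like these is to count each spelling separately, with or without case sensitivity. A minimal Python sketch, assuming whole-word matches are counted as occurrences (the slides do not say whether their figures count occurrences or files); `VARIANTS` holds only a handful of the spellings listed above, and the sample sentence is invented.

```python
import re
from collections import Counter

# A handful of the spellings listed above (not exhaustive).
VARIANTS = ["Shakespeare", "Shakspeare", "Shakspere", "Shakespear", "Shakespere", "Shaksper"]

def count_variants(text, variants=VARIANTS, ignore_case=False):
    """Count whole-word occurrences of each spelling in `text`.
    The word boundaries keep 'Shakespear' from also matching inside 'Shakespeare'."""
    flags = re.IGNORECASE if ignore_case else 0
    counts = Counter()
    for spelling in variants:
        pattern = rf"\b{re.escape(spelling)}\b"
        counts[spelling] = len(re.findall(pattern, text, flags))
    return counts

sample = "Shakespeare, Shakspeare and shakespeare; also SHAKSPERE."
print(count_variants(sample))                                   # case-sensitive
print(count_variants(sample, ignore_case=True))                 # case-insensitive
print(sum(count_variants(sample, ignore_case=True).values()))   # 4
```

Case sensitivity is one of the choices that changes the totals: in this sample, “shakespeare” and “SHAKSPERE” only count once case is ignored.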

43 of 54


shakespeare: 33

shakespere: 4

shakspeare: 40

shakespear: 24

shaksper: 1

shackspeare: 0

shackspear: 0

shaxpere: 0

shaxberd: 0

shakspere: 12

shakesspeare: 0

shackspere: 0

shakesphere: 0

44 of 54


Shakefpeare: 36

Shakospeare: 4

45 of 54

😭

Shakeſpeare : 125

46 of 54

Incidence of the word forms “laft” and “last” in English documents from 1700 to 1900, according to Google Books Ngram data. The counts come from OCR scans of books, which can misread the long s (ſ) as “f”.
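Some of this damage can be repaired after the fact. A minimal Python sketch under two assumptions: where the OCR did capture ſ correctly (as in “Shakeſpeare”), it can simply be mapped to s; where ſ was misread as “f”, a word is changed only if exactly one f→s reading appears in a reference word list. `LEXICON` and the helper names are hypothetical, and this word-list heuristic is one simple approach, not how the figures on these slides were produced.

```python
from itertools import product

LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S

def normalize_long_s(text):
    """Map every correctly encoded long s to a round s."""
    return text.replace(LONG_S, "s")

def f_to_s_readings(word):
    """All spellings obtained by reading any non-final 'f' as a misread long s.
    The last letter is left alone because the long s was not used at the end of a word."""
    options = [("f", "s") if ch == "f" and i < len(word) - 1 else (ch,)
               for i, ch in enumerate(word)]
    return {"".join(chars) for chars in product(*options)}

def correct_long_s(word, lexicon):
    """Return `word` unchanged if it is in `lexicon`; otherwise return its single
    f->s reading that is in `lexicon`, or `word` itself if there is none or several."""
    if word in lexicon:
        return word
    readings = [r for r in f_to_s_readings(word) if r in lexicon]
    return readings[0] if len(readings) == 1 else word

# Hypothetical mini word list; a real one would need to be far larger.
LEXICON = {"last", "first", "passage", "Shakespeare"}

print(normalize_long_s("Shake\u017fpeare"))    # Shakespeare
print(correct_long_s("laft", LEXICON))         # last
print(correct_long_s("Shakefpeare", LEXICON))  # Shakespeare
print(correct_long_s("off", LEXICON))          # off
```

The word-list check matters because a blanket f→s replacement would turn “of” into “os” and “fire” into “sire”.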

47 of 54

discussion

48 of 54

49 of 54

Data Biography

50 of 54

Data Biography

> Create a Data Biography for a humanities dataset: a narrative about the dataset’s lifecycle, including its creation, use, and milestones.

> Learn more about it here: Krause, Heather. “Data Biographies: Getting to Know Your Data.” Global Investigative Journalism Network, 27 Mar. 2017.

Background check on a dataset:

  • Where did it come from?
  • Who collected it?
  • How was it collected?
  • Why was it collected?

“Getting to know your data can reveal crucial gaps, bias, misinformation, or overlooked details in your story.”

51 of 54

Data Biography

  • Choose a dataset from the listed resources or find one on your own
  • Post your dataset choice in the #data-bio Slack channel

Your Data Biography should tell a story about the dataset that addresses the following key aspects:

  • Introduce the dataset and its contents. What kind of information does it contain? How much data is there?
  • Who collected and processed the data, and who made it available?
  • How was the data collected, processed, and made available?
  • Why was the data collected? What research questions was it meant to answer?
  • Where is the data stored today? How did you access it?
  • When was the data collected?
  • What are its potential limitations, biases, gaps, or ethical issues?

52 of 54

Data Biography

> Some additional notes:

  • Think about the data’s journey from its original historical context to its current digital form.
  • Krause’s template spreadsheet is a useful starting point, but your final Data Biography should extend beyond this, bringing this information together into a coherent narrative.
  • Dig deep. Read through “About” or “Methodology” sections, or document the absence of information.

> Email a 3–5 page paper (1,000–1,500 words) by October 15, 11:59 PM

> See full assignment description here

53 of 54

for next session

> Pre-Class Annotations (no reflection this time!)

  • Make sure your annotations are “in” before 11:59 PM on the day before our class.

54 of 54

references

Krause, Heather. “Data Biographies: Getting to Know Your Data.” Global Investigative Journalism Network, 27 Mar. 2017, https://gijn.org/stories/data-biographies-getting-to-know-your-data/.

Marche, Stephen. “Literature Is Not Data: Against Digital Humanities.” Los Angeles Review of Books, 28 Oct. 2012, https://lareviewofbooks.org/article/literature-is-not-data-against-digital-humanities/.

Michel, Jean-Baptiste, et al. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science, vol. 331, no. 6014, Jan. 2011, pp. 176–82.

Posner, Miriam. “Humanities Data: A Necessary Contradiction.” 25 June 2015.

Ramsay, Stephen. “The Hermeneutics of Screwing Around; or What You Do with a Million Books.” Pastplay: Teaching and Learning History with Technology, edited by Kevin B. Kee, University of Michigan Press, 2014, pp. 111–20.

Rosenberg, Daniel. “Data before the Fact.” “Raw Data” Is an Oxymoron, edited by Lisa Gitelman, MIT Press, 2013, pp. 15–40.