1 of 64

go.cal.msu.edu/dhintroslides

2 of 64

Getting Started with Humanities Data: Beginner Tools

Fall 2015 Digital Humanities Workshop Series

digitalhumanities.msu.edu | #msudh

3 of 64

Points of Contact

Kristen Mapes: kmapes@msu.edu

Devin Higgins, Thomas Padilla, Bobby Smiley: dts@mail.lib.msu.edu

Brandon Locke: blocke@msu.edu

4 of 64

Points of Contact

digitalhumanities.msu.edu

lib.msu.edu/dh

leadr.msu.edu

dh.cal.msu.edu

#msudh

5 of 64

Upcoming Events

September

  • 9/16: Reading Group (Programming for Humanists)
  • 9/23: Workshop (Getting Data: Collections, Documents, and the Web)
  • 9/30: Reading Group (Using R for Large Scale Text Analysis)

October

  • 10/7: Workshop (Introduction to Data Visualization)
  • 10/14: LOCUS (Digital Pedagogy)
  • 10/21: Workshop (Introduction to Open Web Mapping)
  • 10/23: Invited Speaker (Darwin’s Semantic Voyage: Exploration and Exploitation of Victorian Science in the Reading Notebooks)
  • 10/28: Reading Group (Digital Publishing)

November

  • 11/4: Workshop (Introduction to TEI)
  • 11/11: Reading Group (Undergraduate Labor Ethics)
  • 11/18: LOCUS (Visualization and Narration)

6 of 64

7 of 64

Conferences

Web Archiving 2015

Chicago Colloquium on Digital Humanities & Computer Science

9/25/2015

9/26/2015

11/12-13/2015

11/13-15/2015

8 of 64

What is Digital Humanities?

Humanistic Scholarship that is ...

  • presented in digital form(s)
  • enabled by digital methods and tools
  • about digital technology and culture
  • building & experimenting with digital technology
  • critical of its own digitalness

Josh Honn, “Never Neutral: Critical Approaches to Digital Tools & Culture in the Humanities.” Last accessed September 9, 2014

9 of 64

Workshop Outline: 9/9/2015

  • Data Sources (Thomas Padilla / Devin Higgins)
  • Voyant (Kristen Mapes)
  • RAW (Thomas Padilla)
  • Palladio (Brandon Locke)

10 of 64

Workshop 9/9/15: Goals

  • Find humanities data in the library, or know whom to ask for help (Data sources)
  • Load and export text, apply stop lists, and examine word trends (Voyant)
  • Know how to use data types to build visualizations (RAW)
  • Use data to construct basic network and geospatial visualizations (Palladio)

11 of 64

12 of 64

library originated data

purchased data

negotiated data

13 of 64

14 of 64

unstructured text

metadata

CSV

JSON

15 of 64

subsetter

16 of 64

Digital Humanities Data

17 of 64

Top Languages in Google Books Dataset

18 of 64

Unstructured Data

  • Chapters
  • Paragraphs
  • Sentences
  • Words
  • English language
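Even “unstructured” text carries the implicit structure listed above. Here is a minimal Python sketch of recovering paragraphs, sentences, and words. It assumes that paragraphs are separated by blank lines and that sentences end in ., !, or ?. The sample text and the `split_units` helper are hypothetical. These naive, English-oriented rules will mis-split abbreviations such as “Dr.”

```python
import re

# Hypothetical sample: two paragraphs separated by a blank line.
SAMPLE = """The ship left port. The crew was quiet.

By evening the wind had changed."""

def split_units(text):
    """Naively split text into paragraphs, sentences, and words (English-oriented rules)."""
    # Paragraphs: chunks separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences: break after ., !, or ? when followed by whitespace.
    sentences = [s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p) if s]
    # Words: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)
    return paragraphs, sentences, words

paragraphs, sentences, words = split_units(SAMPLE)
print(len(paragraphs), len(sentences), len(words))  # 2 3 14
```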

19 of 64

Born-Digital vs. OCR

Born-Digital

Text typed, copied, or otherwise entered as text into a computer.

OCR (Optical Character Recognition)

Text generated by an automated attempt to “read” each character in an image of text.

20 of 64

21 of 64

unstructured text

metadata

CSV

JSON

22 of 64

Library Catalog Record Page

23 of 64

MARC Record, MARCXML Format

Author (Corporate)

Title

Publication

Subject
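MARCXML stores each MARC field as a `datafield` element with a numeric `tag`, and each piece of that field as a `subfield`. Here is a minimal sketch that uses Python's standard `xml.etree.ElementTree` to pull out the four fields highlighted above. It assumes the common tags 110 (corporate author), 245 (title), 260 (publication), and 650 (topical subject); newer RDA records often use 264 instead of 260. The sample record and the `field_values` helper are hypothetical, and the record is heavily trimmed.

```python
import xml.etree.ElementTree as ET

# MARCXML elements live in the MARC 21 "slim" namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# A hypothetical, heavily trimmed record for illustration only.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="110" ind1="2" ind2=" ">
    <subfield code="a">Example Historical Society.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Annual report /</subfield>
    <subfield code="c">Example Historical Society.</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="a">Lansing, Mich. :</subfield>
    <subfield code="b">The Society,</subfield>
    <subfield code="c">1915.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Local history</subfield>
  </datafield>
</record>"""

# Assumed tag for each field shown on the slide.
FIELD_TAGS = {
    "Author (Corporate)": "110",
    "Title": "245",
    "Publication": "260",
    "Subject": "650",
}

def field_values(record, tag):
    """Join the subfield texts of every datafield with this tag, one string per field."""
    values = []
    for field in record.findall(f"marc:datafield[@tag='{tag}']", MARC_NS):
        parts = [sf.text or "" for sf in field.findall("marc:subfield", MARC_NS)]
        values.append(" ".join(parts))
    return values

record = ET.fromstring(SAMPLE_RECORD)
for label, tag in FIELD_TAGS.items():
    print(f"{label} ({tag}): {field_values(record, tag)}")
```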

24 of 64

Type of Record

Bibliographic Level

...

Date 1

Date 2

Place of Publication

Presence and type of Illustrations

Form of Item

Nature of contents

Government Publication

Literary Form

Language

...
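Most of the fields listed above are fixed-position codes rather than free text. Type of Record and Bibliographic Level sit at positions 06 and 07 of the 24-character leader. The others are character ranges in the 40-character 008 control field. Here is a minimal sketch that assumes a book record, since positions 18-34 mean different things for other material types. The sample leader, the 008 value, and the `parse_fixed_fields` helper are hypothetical.

```python
# Type of Record and Bibliographic Level are single characters in the leader.
LEADER_POSITIONS = {"Type of Record": 6, "Bibliographic Level": 7}

# (start, end) slices of the 40-character 008 field; positions 18-34 assume a book record.
FIELD_008_POSITIONS = {
    "Date 1": (7, 11),
    "Date 2": (11, 15),
    "Place of Publication": (15, 18),
    "Presence and type of Illustrations": (18, 22),
    "Form of Item": (23, 24),
    "Nature of contents": (24, 28),
    "Government Publication": (28, 29),
    "Literary Form": (33, 34),
    "Language": (35, 38),
}

def parse_fixed_fields(leader, field_008):
    """Pull the coded values out of a leader and a book-format 008 field."""
    if len(leader) != 24 or len(field_008) != 40:
        raise ValueError("expected a 24-character leader and a 40-character 008 field")
    values = {name: leader[pos] for name, pos in LEADER_POSITIONS.items()}
    for name, (start, end) in FIELD_008_POSITIONS.items():
        values[name] = field_008[start:end]
    return values

# Hypothetical sample values, assembled piece by piece so each position is visible.
SAMPLE_LEADER = "00000nam a2200000 a 4500"
SAMPLE_008 = (
    "150909"  # 00-05 date entered on file
    "s"       # 06 type of date (single known date)
    "1915"    # 07-10 Date 1
    "    "    # 11-14 Date 2 (blank)
    "miu"     # 15-17 place of publication (Michigan)
    "a   "    # 18-21 illustrations (a = illustrations)
    " "       # 22 target audience
    " "       # 23 form of item (blank = none of the listed forms)
    "    "    # 24-27 nature of contents
    " "       # 28 government publication (blank = not one)
    "000"     # 29-31 conference, festschrift, index
    " "       # 32 undefined
    "0"       # 33 literary form (0 = not fiction)
    " "       # 34 biography
    "eng"     # 35-37 language
    " "       # 38 modified record
    "d"       # 39 cataloging source
)

for name, value in parse_fixed_fields(SAMPLE_LEADER, SAMPLE_008).items():
    print(f"{name}: {value!r}")
```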

25 of 64

26 of 64

unstructured text

metadata

CSV

JSON

27 of 64

CSV and JSON

CSV: Comma-Separated (or Character-Separated) Values

NAME,TYPE,COLOR
cherry,fruit,red
banana,fruit,yellow
carrot,vegetable,orange
eggplant,vegetable,purple
lime,fruit,green

  • Usable by any spreadsheet program
  • Has a simple, readable structure
  • Stored as a plain-text file

JSON: JavaScript Object Notation

{"fruit": [{"color": "green", "name": "lime"},
           {"color": "yellow", "name": "banana"},
           {"color": "red", "name": "cherry"}],
 "vegetable": [{"color": "purple", "name": "eggplant"},
               {"color": "orange", "name": "carrot"}]}

  • Allows for nested structure
  • Can provide “lookup” functionality
  • Can be read by most programming languages; used by many tools
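The CSV and JSON examples above hold the same data. Here is a minimal sketch of converting one to the other with Python's standard `csv` and `json` modules, grouping rows by their TYPE column. The `csv_to_nested` helper is hypothetical. The list entries follow the CSV row order, so they appear in a different order than in the JSON example.

```python
import csv
import io
import json

# The sample CSV from the slide, embedded as a string for a self-contained demo.
CSV_TEXT = """NAME,TYPE,COLOR
cherry,fruit,red
banana,fruit,yellow
carrot,vegetable,orange
eggplant,vegetable,purple
lime,fruit,green
"""

def csv_to_nested(csv_text):
    """Group CSV rows by their TYPE column into {type: [{"color": ..., "name": ...}, ...]}."""
    nested = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = {"color": row["COLOR"], "name": row["NAME"]}
        nested.setdefault(row["TYPE"], []).append(entry)
    return nested

data = csv_to_nested(CSV_TEXT)
print(json.dumps(data, indent=2))

# "Lookup": one key jumps straight to a category instead of scanning every row.
print([item["name"] for item in data["fruit"]])  # ['cherry', 'banana', 'lime']
```

The last line shows the “lookup” advantage. A single key retrieves every fruit at once, while a flat CSV would need a pass over all of its rows.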

28 of 64

Where to Find Data

Humanities Data Page: http://lib.msu.edu/dh/humdata

Alan Liu’s DH Datasets Page: http://bit.ly/YwUt81

Contact Us: https://www.lib.msu.edu/dh/

29 of 64

Voyant

  • Text analysis
    • Word frequencies
    • Keywords in Context (KWIC)
  • Examine a single text or a whole corpus
  • Export data
  • Free and browser-based
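Voyant computes word frequencies and Keywords in Context for you. A minimal Python sketch of the same two ideas may help show what happens underneath. The tiny stop list, the tokenizing regular expression, and the helper names are assumptions for illustration, not Voyant's own implementation.

```python
import re
from collections import Counter

# A tiny, hypothetical stop list; real stop lists (including Voyant's) are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}

SAMPLE = ("It was the best of times, it was the worst of times, "
          "it was the age of wisdom, it was the age of foolishness")

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count each token, skipping anything on the stop list."""
    return Counter(tok for tok in tokenize(text) if tok not in stop_words)

def kwic(text, keyword, window=3):
    """List each occurrence of keyword with up to `window` tokens of context per side."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines

print(word_frequencies(SAMPLE).most_common(2))  # [('times', 2), ('age', 2)]
for line in kwic(SAMPLE, "age", window=2):
    print(line)
# was the [age] of wisdom
# was the [age] of foolishness
```

The KWIC function works on the full token list, stop words included. Context is most readable when nothing is filtered out.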

30 of 64

31 of 64

32 of 64

33 of 64

34 of 64

35 of 64

36 of 64

37 of 64

38 of 64

39 of 64

40 of 64

41 of 64

42 of 64

43 of 64

44 of 64

45 of 64

46 of 64

47 of 64

Voyant 2 Beta: beta.voyant-tools.org

Tutorial for using Voyant 2 Beta: docs.voyant-tools.org/category/workshops

Voyant 1: voyant-tools.org

Tutorials for Voyant 1: docs.voyant-tools.org/start

48 of 64

Visualization

49 of 64

Explore: data

50 of 64

Explore: data

51 of 64

Discover: pattern, connection, structure

52 of 64

Communicate: information

53 of 64

. . . many tools

54 of 64

RAW

  • Easy to use - mitigate head hurts
  • Browser based - nothing to install
  • Basic data formats - data in familiar format
  • Intuitive controls - mitigate head hurts
  • Get data out - easy to share results
  • Bonus: it’s all pretty easy

55 of 64

RAW

56 of 64

RAW

go.cal.msu.edu/data

57 of 64

Palladio: Spatial & Network Viz

58 of 64

Palladio (same strengths as RAW)

  • Easy to use - mitigate head hurts
  • Browser based - nothing to install
  • Basic data formats - data in familiar format
  • Intuitive controls - mitigate head hurts
  • Get data out - easy to share results
  • Bonus: it’s all pretty easy

59 of 64

Palladio: palladio.designhumanities.org

60 of 64

Palladio: palladio.designhumanities.org

61 of 64

Palladio

Pre-loaded sample data available at:

bit.ly/palladiosample

62 of 64

Palladio: people.csv

63 of 64

Palladio: places.csv
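Palladio links columns across tables. A people table can become a network of people and places, and a places table can supply map coordinates. Here is a minimal Python sketch of those two steps. It assumes hypothetical `Name`/`Birthplace` and `Place`/`Latitude`/`Longitude` columns, which are not the actual columns of the sample files. It also combines latitude and longitude into a single "lat,lon" string, the single-field coordinate format we understand Palladio's map view to expect. Check Palladio's own documentation before relying on that detail.

```python
import csv
import io
from collections import Counter

# Hypothetical tables for illustration; NOT the bit.ly/palladiosample files.
PEOPLE_CSV = """Name,Birthplace
Alice,London
Bob,London
Chen,Paris
"""

PLACES_CSV = """Place,Latitude,Longitude
London,51.5074,-0.1278
Paris,48.8566,2.3522
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def build_edges(rows, source, target):
    """One edge per row, linking the source column's value to the target column's value."""
    return [(row[source], row[target]) for row in rows]

def node_degrees(edges):
    """Count how many edges touch each node (its degree in the network)."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees

def coordinate_lookup(rows):
    """Map each place name to a single 'latitude,longitude' string."""
    return {row["Place"]: f"{row['Latitude']},{row['Longitude']}" for row in rows}

people = read_rows(PEOPLE_CSV)
places = read_rows(PLACES_CSV)
edges = build_edges(people, "Name", "Birthplace")
coords = coordinate_lookup(places)

print(node_degrees(edges).most_common(1))  # [('London', 2)]
# Join the two tables: attach coordinates to each person through their birthplace.
for name, place in edges:
    print(name, place, coords[place])
```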

64 of 64

Thanks! Feedback?

go.cal.msu.edu/survey