1 of 57

go.cal.msu.edu/dhintroslides

2 of 57

Getting Started in the Digital Humanities:

Beginner Tools

Fall 2016 Digital Humanities Workshop Series

digitalhumanities.msu.edu | #msudh

3 of 57

What is Digital Humanities?

Humanistic Scholarship that is ...

  • presented in digital form(s)
  • enabled by digital methods and tools
  • about digital technology and culture
  • building & experimenting with digital technology
  • critical of its own digitalness

Josh Honn, “Never Neutral: Critical Approaches to Digital Tools & Culture in the Humanities.” Last accessed September 9, 2014

4 of 57

Points of Contact

Kristen Mapes

kmapes@msu.edu

Devin Higgins

Megan Kudzia

Marco Seiferle-Valencia

Scout Calvert

dsc@mail.lib.msu.edu

Brandon Locke

blocke@msu.edu

Alice Lynn McMichael

mcmich17@msu.edu

6 of 57

People

digitalhumanities.msu.edu/people

Don’t see yourself here? Add yourself: go.cal.msu.edu/dhpeople

#msudh

msudh.slack.com/signup

7 of 57

Upcoming Events

September

  • 9/20: Workshop (Git and GitHub for Humanities)

October

  • 10/5: Workshop (Visualizing Data with Tableau)
  • 10/26: Workshop (3D Modeling with Photogrammetry)

November

  • 11/15: Workshop (Brief Introduction to Topic Modeling)

December

  • 12/1: LOCUS (Modeling)

8 of 57

Conferences

9/30/2016

9/30/2016

10/17 - 10/18/2016

10/20 - 10/22/2016

3/16 - 3/17/2017

9 of 57

Workshop Outline: 9/8/2016

  • What is DH? (Alice Lynn McMichael)
  • Resources at MSU (Megan Kudzia)
  • Voyant (Kristen Mapes)
  • Palladio (Brandon Locke)
  • Data Sources (Devin Higgins)

10 of 57

Workshop 9/8/16: Goals

  • Load and export text, apply stop lists, and examine word trends (Voyant)
  • Use data to construct basic network and geospatial visualizations (Palladio)
  • Find humanities data in the library, or know whom to ask for help (Data sources)

11 of 57

Voyant

  • Text analysis
    • Word frequencies
    • Keywords In Context
  • Examine 1 text or a corpus
  • Export data
  • Free & browser based
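
The two analyses named above, word frequencies and keywords in context, can be sketched in a few lines of Python. This is only an illustration of the ideas, not how Voyant itself is implemented; the stop list, the sample sentence, and the function names are all made up for this example.

```python
import re
from collections import Counter

# A tiny, hypothetical stop list; real stop lists are much longer.
STOP_WORDS = {"the", "a", "and", "was", "at", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count every token that is not on the stop list."""
    return Counter(tok for tok in tokenize(text) if tok not in stop_words)


def keywords_in_context(text, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = tokenize(text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows


# Made-up sample text, used only to exercise the functions.
sample = ("The river was wide and the river was slow. "
          "A boat crossed the river at dawn.")

frequencies = word_frequencies(sample)
kwic_rows = keywords_in_context(sample, "river")

print(frequencies.most_common(3))
for left, word, right in kwic_rows:
    print(f"{left:>12} | {word} | {right}")
```

Applying the stop list before counting is what keeps words such as "the" from dominating the frequency table.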

29 of 57

voyant-tools.org

Find documents at:

tinyurl.com/douglassworkshop

Guide for using Voyant (version 2): voyant-tools.org/docs/#!/guide/about

30 of 57

Palladio: Spatial & Network Viz

31 of 57

Palladio: hdlab.stanford.edu/palladio/

33 of 57

Palladio | When it’s right for you

  • You have structured data (spreadsheets)
  • Your data has lots of attributes and facets
  • You want to visualize time, space, and/or relationships
  • You want to do a fairly quick, preliminary analysis of data

From Miriam Posner: github.com/miriamposner/palladio_workshop

34 of 57

Palladio | When it’s wrong for you

  • You have unstructured data (text, video, images without much metadata)
  • You want to count things
  • You want complex analysis and customization
  • You want interactive visualizations

From Miriam Posner: github.com/miriamposner/palladio_workshop

35 of 57

Palladio: Cushman-Collection-condensed.csv

Data from Miriam Posner (github.com/miriamposner/palladio_workshop) | Collection at Indiana Univ Library (github.com/iulibdcs/cushman_photos)

download this data at bit.ly/msudhcushman

36 of 57

Palladio

Download the Cushman data at:

bit.ly/msudhcushman

then head to

hdlab.stanford.edu/palladio/

37 of 57

unstructured text

metadata

CSV

JSON

38 of 57

Unstructured Data

  • Chapters
  • Paragraphs
  • Sentences
  • Words
  • English language

39 of 57

Born-Digital vs. OCR

Born-Digital

Text typed, copied, or otherwise entered as text into a computer.

OCR (Optical Character Recognition)

Text generated by an automated attempt to “read” each character in an image of text.

42 of 57

library originated data

purchased data

negotiated data

44 of 57

subsetter

45 of 57

Digital Humanities Data

46 of 57

Top Languages in Google Books Dataset

47 of 57

Ask Us About Data

We can work with publishers/vendors on your behalf.

Coming soon, datasets from Gale:

  • Sabin Americana: 1500-1926
  • Selections from Archives Unbound
  • Selections from Slavery and Anti-Slavery: A Transnational Archive

48 of 57

unstructured text

metadata

CSV

JSON

49 of 57

Library Catalog Record Page

50 of 57

MARC Record, MARCXML Format

  • Author (Corporate)
  • Title
  • Publication
  • Subject

51 of 57

Type of Record

Bibliographic Level

...

Date 1

Date 2

Place of Publication

Presence and type of Illustrations

Form of Item

Nature of contents

Government Publication

Literary Form

Language

...
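
The coded values listed above live at fixed character positions in a MARC record: Type of Record and Bibliographic Level in the leader, and the rest in the 008 field. A minimal Python sketch of pulling them out by position follows. It assumes the 008 layout for books (positions 18 to 34 differ for other material types), and the sample leader and 008 strings are invented for illustration, not taken from a real catalog record.

```python
# Character positions (start, end) in the leader, end exclusive.
LEADER_POSITIONS = {
    "type_of_record": (6, 7),
    "bibliographic_level": (7, 8),
}

# Character positions (start, end) in the 008 field, books layout assumed.
FIELD_008_POSITIONS = {
    "date_1": (7, 11),
    "date_2": (11, 15),
    "place_of_publication": (15, 18),
    "illustrations": (18, 22),
    "form_of_item": (23, 24),
    "nature_of_contents": (24, 28),
    "government_publication": (28, 29),
    "literary_form": (33, 34),
    "language": (35, 38),
}


def slice_fixed(value, positions, expected_length):
    """Cut a fixed-length string into named pieces by character position."""
    if len(value) != expected_length:
        raise ValueError(f"expected {expected_length} characters, got {len(value)}")
    return {name: value[start:end] for name, (start, end) in positions.items()}


# Hypothetical sample values, built only to show the slicing.
sample_leader = "00000nam a2200000 a 4500"
sample_008 = "160908s1845    mau" + " " * 10 + " 000 0 eng d"

leader_info = slice_fixed(sample_leader, LEADER_POSITIONS, 24)
field_info = slice_fixed(sample_008, FIELD_008_POSITIONS, 40)

print(leader_info)
print(field_info["date_1"], field_info["place_of_publication"], field_info["language"])
```

Each extracted value is still a code (for example "eng" or "mau") that has to be looked up in the corresponding MARC code list to get a readable label.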

53 of 57

unstructured text

metadata

CSV

JSON

54 of 57

CSV and JSON

CSV: (Comma/Character)-Separated Values

NAME,TYPE,COLOR
cherry,fruit,red
banana,fruit,yellow
carrot,vegetable,orange
eggplant,vegetable,purple
lime,fruit,green

  • Usable by any spreadsheet program
  • Has a simple, readable structure
  • Stored as plain-text file

JSON: JavaScript Object Notation

{"fruit": [{"color": "green", "name": "lime"},
           {"color": "yellow", "name": "banana"},
           {"color": "red", "name": "cherry"}],
 "vegetable": [{"color": "purple", "name": "eggplant"},
               {"color": "orange", "name": "carrot"}]}

  • Allows for nested structure
  • Can provide “lookup” functionality
  • Can be read by all programming languages; used by many tools
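
To see how the same table can move between the two formats, here is a short Python sketch that reads the CSV rows from this slide and groups them by TYPE into a nested structure like the JSON example. It uses only the standard csv and json modules; the function name and the choice to group by TYPE are assumptions made for this illustration.

```python
import csv
import io
import json

# The CSV rows from the slide, held in a string so the example is self-contained.
CSV_TEXT = """NAME,TYPE,COLOR
cherry,fruit,red
banana,fruit,yellow
carrot,vegetable,orange
eggplant,vegetable,purple
lime,fruit,green
"""


def csv_to_nested(csv_text):
    """Group flat CSV rows into a dictionary keyed by the TYPE column."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = {"name": row["NAME"], "color": row["COLOR"]}
        grouped.setdefault(row["TYPE"], []).append(entry)
    return grouped


nested = csv_to_nested(CSV_TEXT)

# Serialize to JSON text; json.dumps writes the double quotes JSON requires.
json_text = json.dumps(nested, indent=2)
print(json_text)

# "Lookup": ask for everything filed under one key.
vegetable_names = [item["name"] for item in nested["vegetable"]]
print(vegetable_names)
```

The flat CSV repeats the type on every row, while the nested version states each type once and lets you look up its members directly by key.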

56 of 57

Where to Find Data

Humanities Data Page: http://lib.msu.edu/dh/humdata

Alan Liu’s DH Datasets Page: http://bit.ly/YwUt81

Contact Us: https://www.lib.msu.edu/dh/

57 of 57

Thanks! Feedback?

go.cal.msu.edu/survey