Basic Text Analysis Workshop: Using Voyant Tools

THATCamp ASECS2012

Seth Denbo

Gmail: sdenbo

twitter: @seth_denbo

In this workshop I will give a short introduction to computational text analysis using Voyant Tools, a web-based set of tools. I will briefly discuss text analysis software and what Voyant does, then show how to get texts into Voyant and guide participants through using the tools on some eighteenth-century texts. The workshop will also demonstrate some of Voyant's capabilities and look at a few different ways of using it. Participants will have the opportunity to work on their own texts using Voyant.

1. What is text analysis software?

Text analysis software enables users to determine the frequency with which words or phrases are used, create concordances, view words in context, and otherwise study patterns in texts.
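To make the operations named above concrete, here is a minimal sketch of word counting and a keyword-in-context (concordance) view in Python. It is not how Voyant works internally; the function names, the simple tokenizing rule and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count how often each word occurs in the text."""
    return Counter(tokenize(text))


def keyword_in_context(text, keyword, width=3):
    """Return every occurrence of keyword with up to `width` words either side."""
    tokens = tokenize(text)
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Invented sample sentence, used only to exercise the functions.
sample = "The letter came late. She read the letter twice, and hid the letter away."

print(word_frequencies(sample).most_common(2))
for line in keyword_in_context(sample, "letter"):
    print(line)
```

A real tool adds stop-word lists, better tokenizing and visualisation on top of counts like these, but the underlying idea is the same.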

2. What is Voyant?

Voyant is a web-based text analysis environment. It is designed to be user-friendly, flexible and powerful. The tools focus primarily on different ways of looking at word occurrence, frequency and, to some extent, context. Voyant provides tables and graphs of word use across a single document or a collection.

What can you do with Voyant?

* use texts in a variety of formats including plain text, HTML, XML, PDF, RTF and MS Word

* use texts from different locations, including URLs and uploaded files

* perform lexical analysis, including the study of frequency and distribution data

* export data into other tools (as XML, tab-separated values, etc.)

Voyant was developed by Stéfan Sinclair and Geoffrey Rockwell.
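As an illustration of the export option listed above, the sketch below reads a small tab-separated table of word counts into Python. The column names "term" and "count" and the rows themselves are hypothetical; check the header of an actual Voyant export before relying on them.

```python
import csv
import io

# Hypothetical export: the column names and values are assumptions,
# not the documented layout of a Voyant export.
exported = "term\tcount\nletter\t3\nvirtue\t2\n"


def read_term_counts(tsv_text):
    """Parse tab-separated text with 'term' and 'count' columns into a dict."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["term"]: int(row["count"]) for row in reader}


counts = read_term_counts(exported)
print(counts)
```

With a file saved from the browser, the same function could be given the contents of that file instead of the inline string.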

3. Using Voyant

3.1 A single tool on a single text

http://voyeurtools.org/tool/Cirrus/?corpus=1332195420983.7806&query=&stopList=stop.en.taporware.txt&docIndex=0&docId=d1332130132258.e484e611-10ab-7866-fba8-16d56a787810
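The link above packs several settings into its query string. As a rough way of seeing what is in it, the sketch below splits the URL with Python's standard library. Reading the last path segment as the tool name and stopList as the stop-word list is an interpretation of the URL, not documented behaviour.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://voyeurtools.org/tool/Cirrus/?corpus=1332195420983.7806&query="
       "&stopList=stop.en.taporware.txt&docIndex=0"
       "&docId=d1332130132258.e484e611-10ab-7866-fba8-16d56a787810")

parts = urlsplit(url)

# The last non-empty path segment appears to name the tool being loaded.
tool = parts.path.strip("/").split("/")[-1]

# keep_blank_values=True keeps the empty "query" parameter in the result.
params = parse_qs(parts.query, keep_blank_values=True)

print("tool:", tool)
for name, values in params.items():
    print(f"{name} = {values[0]!r}")
```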

3.2 Getting texts into Voyant

Go to: voyant-tools.org (Backup: voyeur.hermeneuti.ca)

3.3 An overview of Voyant on a single text

http://www.gutenberg.org/cache/epub/370/pg370.txt

3.4 Comparing texts using Voyant

http://www.gutenberg.org/cache/epub/8409/pg8409.txt

http://www.gutenberg.org/cache/epub/6124/pg6124.txt
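When comparing texts of different lengths, raw counts can mislead, so comparisons are usually made on relative frequencies. The sketch below shows one way to do this in Python; the function names, the per-10,000-words scale and the two tiny sample strings are assumptions for illustration, and this is not a description of Voyant's own calculation.

```python
import re
from collections import Counter


def relative_frequencies(text, per=10_000):
    """Return each word's frequency per `per` words of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {word: count * per / len(tokens) for word, count in counts.items()}


def compare(text_a, text_b, words, per=10_000):
    """Pair up the relative frequency of each word in the two texts."""
    freq_a = relative_frequencies(text_a, per)
    freq_b = relative_frequencies(text_b, per)
    return {w: (freq_a.get(w, 0.0), freq_b.get(w, 0.0)) for w in words}


# Invented stand-ins for two full texts.
text_a = "virtue and honour and virtue"
text_b = "love and love and"

result = compare(text_a, text_b, ["virtue", "love"])
for word, (a, b) in result.items():
    print(f"{word}: {a:.0f} vs {b:.0f} per 10,000 words")
```

To try this on the texts linked above, save each one locally and pass the file contents in place of the sample strings.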

4. Other stuff you can do

Old Bailey Online and Voyant Tools - http://www.oldbaileyonline.org/obapi/

And coming soon: TCP ECCO texts in Voyant Tools via 18thConnect - http://www.18thconnect.org/

Resources:

http://hermeneuti.ca/voyeur - Project webpage for Voyant. Lots of information on the methodologies that underlie Voyant, as well as documentation, workshops etc.

http://tada.mcmaster.ca/Main/WhatTA - ‘What is Text Analysis’, a lengthy essay by Geoffrey Rockwell.

http://dirt.projectbamboo.org/categories/text-mining - Digital Research Tools wiki page that has a long list of tools for text analysis and text mining.

Gutenberg URLs:

Defoe, The Fortunes & Misfortunes of the Famous Moll Flanders &c.

http://www.gutenberg.org/cache/epub/370/pg370.txt

Behn, Love-Letters Between a Nobleman and His Sister

http://www.gutenberg.org/cache/epub/8409/pg8409.txt

Richardson, Pamela, or Virtue Rewarded

http://www.gutenberg.org/cache/epub/6124/pg6124.txt

Equiano, The Interesting Narrative of the Life of Olaudah Equiano

http://www.gutenberg.org/cache/epub/15399/pg15399.txt