Digital Textual Analysis (Voyant, etc.)
Andrew Longman’s Session
Opening: Explanation of Basic Interest
Question to all: Why are we all here? Why do we want to know about these tools?
A little experience with the tools, but some clarity needed for interpretation.
Questions about mechanics: how to load texts into Voyant--large and small corpora
Initial test text for session was Moby Dick. Used to demonstrate basic features of the interface.
Using smaller sections of a work for comparison
Distant reading assignments (and what exactly is distant reading?)
Interesting questions about copyright and how that affects potential for use.
- Doesn’t violate copyright to upload a text for analysis -- the violation is sharing the complete text as a document itself
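The idea of comparing smaller sections of a work, mentioned above, can be sketched outside Voyant in a few lines of Python. This is only an illustration, not how Voyant itself works: the function names, the two-section split, and the short sample passage are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def split_into_sections(tokens, n_sections):
    """Split a token list into up to n_sections consecutive, roughly equal slices."""
    size = max(1, -(-len(tokens) // n_sections))  # ceiling division
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def relative_frequency(tokens, word):
    """Occurrences of word per 1,000 tokens in one section."""
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word] / len(tokens)

# Tiny stand-in passage (an assumption); in practice this would be a full plain-text file.
sample = (
    "Call me Ishmael. The whale was seen at dawn. "
    "The whale sounded, and the sea closed over the whale."
)
sections = split_into_sections(tokenize(sample), 2)
for number, section in enumerate(sections, start=1):
    # Print the section number, its length in tokens, and the rate for "whale".
    print(number, len(section), round(relative_frequency(section, "whale"), 1))
```

Reporting a rate per 1,000 words rather than a raw count keeps sections of different lengths comparable.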
Other tools mentioned:
Sporkforge - www.sporkforge.com
Many Eyes - http://www-958.ibm.com/software/analytics/manyeyes/
Tapor - http://portal.tapor.ca/portal/portal
Google N-Gram - http://books.google.com/ngrams
Juxta - http://www.juxtasoftware.org/
(session proposals included a more detailed look at Juxta)
TAPAS: http://www.tapasproject.org/ (TEI publishing)
TAPAS is the TEI Archiving Publishing and Access Service for scholars and other creators of TEI data who need a place to publish their materials in different forms and ensure they remain accessible over time. TAPAS is also for anyone interested in reading and exploring TEI data, and communicating with those who share that interest.
SEASR is designed to enable digital humanities developers to rapidly design, build, and share software applications that support research and collaboration.
AntConc: A concordance tool: http://www.antlab.sci.waseda.ac.jp/software.html
MorphAdorner - morphadorner.northwestern.edu: a Java command-line program which acts as a pipeline manager for processes performing morphological adornment of words in texts.
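For anyone unsure what a concordance tool such as AntConc produces, here is a minimal keyword-in-context (KWIC) sketch in Python. It is not AntConc's implementation; the function name `kwic`, the `width` parameter, and the sample sentence are assumptions chosen for illustration.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, keyword, right) context tuples for each hit of keyword."""
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            # Up to `width` words on each side of the hit.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits

# Made-up sample sentence (an assumption), not a quotation.
sample = "The whale was seen at dawn. By noon the whale had sounded."
for left, word, right in kwic(sample, "whale", width=2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>15} | {word} | {right}")
```

Lining up every occurrence of a word with its neighbours is what lets a reader move from a frequency count back to the passages themselves.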
What’s so interesting about these digital tools for students?
- Shows them the different tools for analysis itself - that texts are not just self-obvious and coherent entities that cannot be unpacked.
- Show changes in canonicity - nobody talking about Melville, then everyone talking about Melville
- Text mining books ABOUT literature -- which writers are named as “the American poet” or as the major names, and who isn’t named
- Use these tools to trouble the nature of publication itself
- Students see books/texts as stable products -- books were published as one entity and never changed.
- Most texts have lots of versions -- deciding on the authoritative version of a text is a complicated decision and process.
- Shows books as complicated entities with a life history
- Literature has a politics and history to it -- these literary tools make those questions and ideas more visible to them.
- Translation has much potential here as well for comparison
- Compare one text with others --- can show how a text is quite different from other contemporaneous texts, or how it is using similar words / concepts
- Creative Writing & Literature -- students can put their own work into these tools to see trends within the context
- How do you talk about the difference between “close reading” and “distant reading”?
- How do teachers talk to their students about these tools -- how are they different?
- When using tools like this, how do you read for silences? How do you convey the importance of infrequency?
- Tools - won’t help with PLOT
- Be explicit -- don’t assume you’re going to be able to institute tools without talking about the assumptions and abilities of the tools themselves.
- Assignment: early English literature. In the Survey of Brit Lit there is only time for 4 or 5 Shakespeare sonnets. But you could compare what a tool analysis shows of a group of sonnets vs. a close reading of a single sonnet.
- What do we do as literary scholars? We create models of texts. A model of reading and understanding of texts.
- Is this tool, or might it be, used as compensation for a lack of breadth?
- Voyant can also point students to larger, more specific text-analytic tools.
- Examples: mappingtexts.stanford.edu; Mining the Dispatch: http://dsl.richmond.edu/dispatch/
- How useful is Voyant when applied to stylistics, or stylistic questions? The tool seems largely oriented toward thematic questions.
- Some discussion of the collaborative potential of Voyant tools.
- Since tools like Voyant analyse marked data, how do you approach errors in data recording? How do you talk to students about this? One suggestion is that the level of errors is small enough not to subvert the larger trends.
Pedagogy - how do you approach this in class?
- If using these tools, how do you explain/cite these tools in scholarly work?
- Are there journals that are publishing this type of scholarship (DH Quarterly, etc.)? Are traditional journals accepting of this work? (A little too early to determine.)
- What type of research uses such tools or methodology? (Ryan Cordell discussed “Reprint Culture” & Celestial Railroad project, which applies a very specific research question to large datasets.)
- A stop list is an easy tool, but how do you expand the idea to exclude/filter things such as advertisements from newspaper data?
- OCR tools: OmniPage, Acrobat Pro, ABBYY FineReader
- Discussion of the importance of collaboration with someone in Computer Science; the project must appeal to them even if the actual computation required is pedestrian (for example, Natural Language Processing). Also, how do you find these people for collaboration?
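One way to picture the stop-list question raised above: a stop list drops individual words, while filtering advertisements means dropping whole lines or blocks before counting. The Python sketch below shows both steps under loud assumptions. The stop words, the ad marker phrases, and the sample "newspaper page" are all invented for illustration, and a real project would likely need a far more careful way of detecting advertisements.

```python
import re
from collections import Counter

# Hypothetical stop list; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "at", "for", "is"}

# Hypothetical phrases that flag a line as an advertisement.
AD_MARKERS = ("for sale", "per bottle", "apply at")

def is_advertisement(line):
    """Crude heuristic: a line is an ad if it contains any marker phrase."""
    lowered = line.lower()
    return any(marker in lowered for marker in AD_MARKERS)

def content_word_counts(lines):
    """Count words after skipping ad-like lines and removing stop words."""
    counts = Counter()
    for line in lines:
        if is_advertisement(line):
            continue  # drop the whole line, not just individual words
        for word in re.findall(r"[a-z']+", line.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts

# Invented sample lines standing in for OCR'd newspaper text.
page = [
    "The railroad reached the town in the spring.",
    "FOR SALE: fine tonic, one dollar per bottle.",
    "News of the railroad spread to the coast.",
]
counts = content_word_counts(page)
print(counts.most_common(3))
```

The design point is the order of operations: line-level filtering happens first, so vocabulary from the advertisement never reaches the word counts at all.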
Sorry for the listy nature of this--it was the best I could do!