WordSeer is a research project at UC Berkeley's Computer Science Division and School of Information. It's a web-based text analysis and sensemaking environment for humanists and social scientists.

Let's unpack that:

WordSeer is a research project, funded by two successive NEH digital humanities grants. We're computer scientists, and our goal is to figure out how to make advanced computational technologies from the fields of information retrieval, data visualization, and computational linguistics work for scholars trying to deeply understand text.

Web-based. WordSeer is a program that runs in a web browser. In our case, the only browsers in which it works properly are Chrome and Safari (Firefox is currently breaking for mysterious reasons; see "research project" above). WordSeer's main website is http://wordseer.berkeley.edu.

Text Analysis and Sensemaking. Sensemaking is a bit of jargon computer scientists use to describe the complex, drawn-out, iterative process we engage in when we're trying to process and understand information. All scholarly research is a form of sensemaking. We're particularly interested in sensemaking with textual data, because there aren't too many tools out there for this. Sensemaking with text is more difficult than with other kinds of data, because the only really good way to get meaning out of text is to read it. Tables of numbers, on the other hand, don't need to be read in the same way to be meaningful. Numbers are therefore comparatively easier to condense, summarize, spot patterns in, and make predictions from.

For humanists and social scientists. Literature scholars, historians, and many other kinds of humanists and social scientists (not to mention journalists and data analysts) need to do text-based sensemaking very deeply every day. It's a hard problem that needs a good solution.

0 Loading up WordSeer and Signing In

If you're collaborating with us, you'll know your WordSeer URL -- it'll look like wordseer.berkeley.edu/<something>

Using Safari or Chrome, go to the URL and wait a few minutes (sorry, yes, actual minutes, we're working on this) for the sign-in page to load. It isn't working in Firefox right now.
If you've signed up already, enter your username and password.
If you haven't signed up already, make a new username and password. Don't use an important password; use something simple, like 'test' -- we don't have any encryption yet.
Press go -- WordSeer should load.

When you open up WordSeer for the first time, it should show you something like this:

Figure 2: What you see when you open up WordSeer Shakespeare.

It may seem like a lot of information, but it breaks down into simple components. You're seeing a single panel with:

A Word Tree visualization of the most frequent word
For the slice: the entire collection

The next section explains these terms.

1 WordSeer Concepts

WordSeer has a few different types of components that fit together in different ways to help you analyze your texts.

Slice: A set of sentences to analyze

The most important idea in WordSeer is the slice. A slice is very simple: it's just a set of sentences you want to analyze. WordSeer can make completely arbitrary slices, but it's more useful to think of slices as the results of searches and filters.

Examples of slices are

The entire collection (no searches or filters)
The text of Julius Caesar (a filter for the play)
The set of sentences containing the word "good" (a search for the word "good")
The set of sentences spoken by Gertrude in Act 2 Scene 2 of Hamlet (the result of applying filters for play, speaker, and scene)
The set of sentences containing "good" spoken by Juliet in Romeo and Juliet (a combination of a search for "good" and filters for play and speaker)
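
One way to picture a slice is as what's left of a list of sentences after a search and some filters have been applied. The sketch below is only an illustration in Python, not WordSeer's actual code: the Sentence record, its field names, the tiny made-up collection, and the make_slice helper are all assumptions invented for this example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    # Hypothetical record: the real system's fields may differ.
    text: str
    play: str
    speaker: str
    act: int


# A tiny made-up collection standing in for the real corpus.
COLLECTION = [
    Sentence("Good night, sweet prince.", "Hamlet", "Horatio", 5),
    Sentence("Good night, good night!", "Romeo and Juliet", "Juliet", 2),
    Sentence("Thou art a villain.", "Romeo and Juliet", "Tybalt", 3),
    Sentence("The rest is silence.", "Hamlet", "Hamlet", 5),
]


def words(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def make_slice(sentences, word=None, **metadata):
    """Keep sentences that contain `word` (if given) and match every metadata filter."""
    result = []
    for s in sentences:
        if word is not None and word.lower() not in words(s.text):
            continue
        if all(getattr(s, key) == value for key, value in metadata.items()):
            result.append(s)
    return result


# No search, no filters: the entire collection.
whole_collection = make_slice(COLLECTION)
# A search for "good" combined with filters for play and speaker.
good_by_juliet = make_slice(
    COLLECTION, word="good", play="Romeo and Juliet", speaker="Juliet"
)
```

Each example in the list above corresponds to a different call to make_slice: leaving out every argument gives the whole collection, and each extra argument narrows the slice further.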

WordSeer can also make slices through sentence sets, word sets and document sets (which allow you to combine items together from many different searches and filters), but that's a topic for later on.

A slice is an abstract concept. It's just a set of sentences you want to analyze. In WordSeer, data takes concrete visual representations through Panels that contain visualizations (this chapter) and overviews (this chapter).

For example, Figures 3, 4, and 5 below show panels containing visualizations and overviews for some different slices:

Figure 3: A panel showing the List of Search Results visualization for the slice: sentences containing the word "good" in Act 5 of Hamlet.

Figure 4: A panel showing the List of Search Results visualization for the slice: sentences spoken by Caesar in Julius Caesar.

Figure 5: A panel showing the Document Viewer for the slice: all sentences in Julius Caesar.

Panels: Represent the sentences in a slice

Panels are like windows on your desktop. They display information derived from a slice. I say "derived from" because the information can be something as simple as a list of the sentences in the slice (Figures 3 and 4) or a document (Figure 5), or something as complex as a word tree of all the sentences matching a given term (Figure 2).

Panels contain:

At least one visualization (some have two). A visualization is just some kind of representation of the data in your slice, such as a set of bar charts, a list of sentences, or a more fancy visualization.
Filters and overviews. These help you get a sense of the contents of your slice and drill down into finer slices.

For example, let's deconstruct a panel with a simple visualization, the List of Search Results on the slice: all the sentences in Hamlet. Figure 6(a) shows the panel, and Figure 6(b) deconstructs it.

Figure 6 (a) A panel showing the List of Search Results for the slice: all the sentences in Hamlet.

Figure 6(b) The components of the panel showing the List of Search Results for the slice: all sentences in Hamlet.

Multiple Panels: Side-by-side comparison

WordSeer supports opening many panels side by side (as many as you want, but after two or three, most screens get crowded).

Multiple panels allow you to make side-by-side comparisons of different slices and different visualizations.

For example, in Figure 6(c), I have two panels open, comparing Hamlet's speeches in Act 1 and Act 5. The lists of phrases, nouns, verbs, and adjectives in the two panels are different. Each panel shows the overviews for its slice. In the left-hand panel, I have Hamlet's speeches in Act 1, and in the right-hand panel, I have Hamlet's speeches in Act 5.

Figure 6(c): Side-by-side comparisons with multiple panels. The two panels compare Hamlet's speeches in Act 1 (left) to his speeches in Act 5 (right).

2 Overviews and Filters

Whether you're looking at the whole collection, or just a list of sentences in a smaller slice, overviews give you a sense of how the sentences are distributed across several useful types of categories.

Overviews also double as filters, which allow you to select just the subset of sentences that match each category.

WordSeer has four types of filters/overviews:

Metadata categories
Language features
Frequent phrases
Frequent nouns, verbs, and adjectives

Metadata Categories

These are meaningful categories created from annotations in the input.

For example, WordSeer's Shakespeare instance uses the Internet Shakespeare Editions. These are XML files annotated with Title, Act, Scene, Line and Speaker values for each speech, so WordSeer extracts these annotations and makes them available as filter categories.

Figure 7(a): Metadata categories for the slice: all sentences in Hamlet

Overviews double up as filters. For example, clicking on "Act 2, Scene 2" in Figure 7(a) above narrows the slice in the panel. It makes a new, more specific slice, i.e. all sentences in Act 2, scene 2 of Hamlet. The panel's metadata categories change to reflect the smaller slice, with the result shown below:

Figure 7(b): The resulting metadata categories after clicking on "Act 2, Scene 2" in 7(a). Clicking makes a new, more specific slice: Act 2, Scene 2 of Hamlet. In this slice (which contains 300 sentences) the overview shows that Hamlet says 149 sentences, Polonius 70, the King 23, and so on.
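
The way an overview doubles as a filter can be sketched in a few lines of Python. This is a rough illustration under assumptions, not how WordSeer is actually implemented: the dictionary records, their keys, and the overview and apply_filter helpers are made up for this example, and the data is a toy stand-in for the real play.

```python
from collections import Counter

# Made-up sentence records; the field names are assumptions for this example.
hamlet = [
    {"act": 2, "scene": 2, "speaker": "Hamlet"},
    {"act": 2, "scene": 2, "speaker": "Polonius"},
    {"act": 2, "scene": 2, "speaker": "Hamlet"},
    {"act": 3, "scene": 1, "speaker": "Ophelia"},
]


def overview(sentences, category):
    """Count how many sentences fall under each value of a metadata category."""
    return Counter(s[category] for s in sentences)


def apply_filter(sentences, **criteria):
    """Keep only the sentences whose metadata matches every criterion."""
    return [s for s in sentences if all(s[k] == v for k, v in criteria.items())]


# "Clicking" Act 2, Scene 2 makes a smaller slice...
act2_scene2 = apply_filter(hamlet, act=2, scene=2)
# ...and the speaker overview is recomputed for that smaller slice.
speakers = overview(act2_scene2, "speaker")
```

The point of the sketch is the order of operations: the filter produces a new slice first, and every overview is then recounted from that slice, which is why the numbers in the panel change after a click.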

Range-based categories

The above example deals with categorical metadata. But what about data types that are more naturally expressed as continuous ranges, such as time? For these data types, the metadata pane shows a different type of overview-filter, a distribution chart.

In our Shakespeare collection, the only numerical data type we have is "line" for the line number within the scene. This is what the overview-filter looks like:

If I want to look at just lines in a particular range, I can drag the handles and click the "filter" button:

In other collections, these sliders might be used to select date ranges or other more meaningful spans.

Language Features

WordSeer gives you a rough overview of the general complexity of the sentences in the slice by computing the sentence length (number of words) and average word length (average number of characters per word) for each sentence.
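
To make the two measures concrete, here is a minimal Python sketch of how they could be computed for one sentence. The language_features function and its simple tokenizing rule (runs of letters and apostrophes count as words) are assumptions for illustration; WordSeer's own tokenization may well differ.

```python
import re


def language_features(sentence):
    """Return (sentence length in words, average word length in characters)."""
    words = re.findall(r"[A-Za-z']+", sentence)
    length = len(words)
    average = sum(len(w) for w in words) / length if length else 0.0
    return length, average


# "The", "rest", "is", "silence" -> 4 words, 16 letters in total.
features = language_features("The rest is silence.")
```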

These are hidden by default because not every scholar is interested in them, but you can see them by clicking the "Language Features" checkbox:

Frequent Phrases

In WordSeer, "phrases" are sequences of two or more words. Every panel in WordSeer shows you an overview of the most frequent phrases in the slice. For example, if we zoom in on Figure 6 -- the panel showing the list of sentences in Hamlet, we see the most frequent phrases in Hamlet.

Figure 8(a) The most frequent 2-word phrases in Hamlet that don't contain stop words.

There are two options:

Include stop words (false by default).

"Stop words" are extremely common non-content words, such as "the" "a" "an", etc. Figure 8 (a) shows the list without phrases containing stop words. Clicking the checkbox produces Figure 8(b) below which shows phrases that do contain stop words.

Length: (2 by default).

This drop-down menu controls how long the phrases in the list are. WordSeer counts phrases up to length 4.

Figure 8(b): Clicking the checkbox in Figure 8(a) produces this list, the most frequent 2-word phrases, including phrases that contain stop words.
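
The two options can be thought of as parameters to a phrase counter. The Python sketch below is a hedged illustration, not WordSeer's implementation: the frequent_phrases function, its parameter names, and the very short STOP_WORDS list are all invented for this example.

```python
import re
from collections import Counter

# A deliberately tiny stop word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "my", "is"}


def frequent_phrases(sentences, length=2, include_stop_words=False):
    """Count every run of `length` consecutive words in the given sentences."""
    if not 2 <= length <= 4:
        raise ValueError("this sketch only counts phrases of 2 to 4 words")
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        for i in range(len(words) - length + 1):
            phrase = words[i:i + length]
            # By default, skip any phrase that contains a stop word.
            if not include_stop_words and any(w in STOP_WORDS for w in phrase):
                continue
            counts[" ".join(phrase)] += 1
    return counts


sentences = [
    "Good night, my lord.",
    "Good night, sweet prince.",
    "My lord, the king is here.",
]
without_stop_words = frequent_phrases(sentences)
with_stop_words = frequent_phrases(sentences, include_stop_words=True)
```

In this toy data, "my lord" only shows up once the stop word option is switched on, which mirrors the difference between Figures 8(a) and 8(b).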

This frequent phrases overview doubles as a filter. For example, if you want to see all 13 occurrences of "good night" in Hamlet, you can click the table row for "good night", producing Figure 9.

Figure 9: Adding a filter for the phrase "good night" to the slice: all sentences in Hamlet creates a new, smaller slice of just 13 sentences. All the other filters in the panel change to reflect the new slice.

Frequent Nouns, Verbs, and Adjectives

WordSeer uses a computational linguistics technology called part-of-speech tagging to automatically categorize words into their parts of speech.

As part of the overview of a slice, WordSeer calculates the most frequent nouns, verbs, and adjectives in the slice and displays them in a list. Figure 10 zooms in on the lists in Figure 6. The lists show the most frequent nouns, verbs, and adjectives for all sentences in Hamlet.

Figure 10: The overview lists of the most frequent nouns, verbs, and adjectives in Hamlet.

There is an option: "group by stem". A stem is a common root from which different word forms are derived. For example, enabling this option would group together "read", "reading", and "reads" under the single label "read", and show the added-up count for all of them.
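
As a rough picture of what "group by stem" does to the counts, here is a small Python sketch. The toy_stem function is a crude stand-in invented for this example (it only strips "ing" and "s"); it is not the stemmer WordSeer uses, and the counts are made up.

```python
from collections import Counter


def toy_stem(word):
    """Strip a couple of common suffixes. A crude stand-in for a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(word_counts):
    """Add up the counts of all word forms that share a stem."""
    grouped = Counter()
    for word, count in word_counts.items():
        grouped[toy_stem(word)] += count
    return grouped


# Made-up counts for illustration.
counts = {"read": 5, "reading": 3, "reads": 2, "lord": 7}
grouped = group_by_stem(counts)
```

Here "read", "reading", and "reads" end up under the single label "read" with a combined count of 10, while "lord" is left alone.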

Just like the list of phrases, these word lists double as filters. For example, clicking on the word "lord" in the Nouns list above in Figure 10 would filter the "Hamlet" slice to just the sentences containing "lord" in "Hamlet".

3 Searches

In the previous section, we saw how filters can be used to select sentences for analysis. In this section, we'll look at how we can select sentences with WordSeer's search capabilities.

Figure 11(a): WordSeer's search box. Pressing "Go" will show a list of all the sentences that contain the word "heaven".

WordSeer has a pretty complicated search box, but for simple searches, you can ignore most of it.

Keyword Search

WordSeer has two different kinds of search modes: keyword and grammatical. Keyword search works by matching keywords in the text. You type in words or phrases, and get sentences that match those words or phrases.

To perform a keyword search in WordSeer, type words into the search box, and leave the grammatical relation set to anywhere in the text.

There are many different ways of doing keyword searches:

A single word

e.g. heaven

Sentences that match the word "heaven"

Space-separated words or quote-enclosed words

e.g. in heaven or "in heaven"

Sentences that match the exact phrase "in heaven"

Comma-separated words

e.g. hell, heaven, earth

Sentences that match any of the words "hell", "heaven", or "earth"

Adding a wildcard:

e.g. confid* sunshine

Sentences that contain words starting with "confid", e.g. "confident", "confidence", "confide" OR sunshine.

The wildcard matches a string of characters. You can mix and match wildcards with other search terms.

Boolean search -- ands and ors: The full syntax for WordSeer's boolean search is explained here

The + operator is AND -- it means that the word must appear in the sentence.

+heaven +hell
Sentences that contain both "heaven" and "hell"

Without a + operator, a word is an OR -- it is optional.
Mixing and matching. Suppose I want to match "God" AND either heaven or hell:

+god +(heaven hell)
+god means that it must appear, and the + in front of the parentheses means that whatever is inside it must also be matched. Since this is "heaven" and "hell" without any +'s, it's either heaven or hell.
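
To make the + rules concrete, here is a small Python sketch of a matcher for this kind of query. It is only an illustration under stated assumptions and is not WordSeer's real search engine: the tokenize, parse, matches, and search functions are hypothetical, the sketch handles only +, parentheses, and the * wildcard, and it assumes that when nothing is marked with +, at least one of the optional words must appear.

```python
import re
from fnmatch import fnmatchcase


def tokenize(query):
    """Split a query into '+', '(', ')' and bare terms such as heaven or confid*."""
    return re.findall(r"[+()]|[^\s+()]+", query)


def parse(tokens):
    """Turn tokens into (required, node) clauses; a node is a term or a nested list."""
    clauses = []
    while tokens and tokens[0] != ")":
        required = tokens[0] == "+"
        if required:
            tokens.pop(0)
        token = tokens.pop(0)
        if token == "(":
            node = parse(tokens)
            tokens.pop(0)  # drop the closing ')'
        else:
            node = token.lower()
        clauses.append((required, node))
    return clauses


def matches(clauses, words):
    """Check a parsed query against the words of one sentence."""
    def hit(node):
        if isinstance(node, list):
            return matches(node, words)
        return any(fnmatchcase(word, node) for word in words)

    required = [node for is_required, node in clauses if is_required]
    optional = [node for is_required, node in clauses if not is_required]
    if required:
        return all(hit(node) for node in required)
    return any(hit(node) for node in optional)


def search(query, sentence):
    words = re.findall(r"[a-z']+", sentence.lower())
    return matches(parse(tokenize(query)), words)
```

With this sketch, search("+god +(heaven hell)", "God is in heaven") is true because "god" appears and so does one of the two words inside the parentheses, while search("+god +(heaven hell)", "God walks the earth") is false.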

Grammatical Search

WordSeer's other search mode, grammatical search, is more complex. It allows you to search over grammatical relationships between words. These relationships are things like "verb subject", "verb object", "adjective modifier", etc. Grammatical search allows you to ask questions like, "what are all the adjectives that apply to the word 'man'", and "what are all the verbs that 'Hamlet' is the agent of"?

The full list of grammatical relationships supported by WordSeer is described in detail here, in the Stanford Dependencies Manual. It explains all the different kinds of relationships available in WordSeer and gives examples of them in sentences.

Grammatical searches are more complex because there are three pieces of information in a grammatical relationship.

The type of relationship
the first word in the relationship
and the second word in the relationship.

Why aren't the first and second words interchangeable? Consider the two sentences "Look at the poster display, it's really nice" and "Look at the display poster, it's really nice". In both cases, there's a noun compound relationship between "poster" and "display". However, the different word orders give the compounds slightly different meanings. In the first, the "poster display" is a display of posters, and the display is really nice; in the second, the "display poster" is a poster for display, and the poster is really nice. Computational linguistics technology represents the two relationships as noun_compound(display, poster) and noun_compound(poster, display).

Performing a Grammatical Search

You can activate grammatical search mode using the drop-down menu in the top search bar. Selecting any relationship other than "anywhere in the text" will perform a grammatical search with that relationship. Another search box will appear to the right of the relations menu, so you can specify both words.

Figure 11(b): The grammatical relations menu and the second search box.

Just like keyword search, there are a few ways to do a grammatical search. Here, _____ means that you leave the box blank.

______ <relation> <query 2>

e.g. ________ done to "hamlet"
This will return all the sentences containing verbs in which "hamlet" is the subject

<query 1> <relation> ______

e.g. "heaven" described as ___________
This will return all the sentences containing adjectives that are applied to "heaven"

<query 1> <relation> <query 2>

e.g. "heaven" described as "good"
This will return all the sentences in which the adjective "good" is applied to "heaven"

comma-separated lists of words:

e.g. "heaven" described as "good, great"
This will return all the sentences in which the adjective "good" or the adjective "great" is applied to "heaven".
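
The query patterns above can be pictured as matching against a table of (relation, first word, second word) records, where a blank box matches anything. The Python sketch below is a hedged illustration of that idea, not WordSeer's real data model: the Relation record, the grammatical_search function, and the three example rows are all made up.

```python
from collections import namedtuple

# Hypothetical record for one extracted relationship and the sentence it came from.
Relation = namedtuple("Relation", ["type", "first", "second", "sentence"])

# Made-up rows for illustration only.
RELATIONS = [
    Relation("described as", "heaven", "good", "Good heaven, forgive him."),
    Relation("described as", "heaven", "sweet", "O sweet heaven!"),
    Relation("described as", "man", "good", "He is a good man."),
]


def grammatical_search(relations, relation_type, first=None, second=None):
    """Return relations of one type; None means the box was left blank."""
    def accepts(query, word):
        if query is None:  # blank box: any word matches
            return True
        # A comma-separated list matches any one of its words.
        options = {q.strip().lower() for q in query.split(",")}
        return word.lower() in options

    return [
        r for r in relations
        if r.type == relation_type
        and accepts(first, r.first)
        and accepts(second, r.second)
    ]


heaven_adjectives = grammatical_search(RELATIONS, "described as", first="heaven")
good_or_great = grammatical_search(
    RELATIONS, "described as", first="heaven", second="good, great"
)
```

Because the first and second words are stored in separate fields, swapping them gives a different query, just as noun_compound(display, poster) and noun_compound(poster, display) are different relationships.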

Viewing the results of a grammatical search

The Grammatical Search Bar Charts visualization was developed specifically for grammatical search queries. It's like List of Search Results except augmented with bar charts of how many words match the grammatical relationship. Below, Figure 11(c) shows how this visualization can be used to investigate descriptions of facial attributes.

Searching for "face, eyes, hair" described as _____ with the Grammatical Search Bar Charts visualization.

Figure 11(c) The results of using the Grammatical Search Bar Charts for the query "face, eyes, hair" described as _____.

The bar charts show how often each of the words appears in a "described as" relationship, as well as the words that describe them. In 11(c), for "face, eyes, hair" described as _____, the chart shows that "eyes" is the most commonly described feature, at 83 times.

The list of matching sentences is shown below the chart, with the matching words highlighted. The charts are also interactive. Clicking on a word filters the list of sentences to match that word, as shown below in 11(d).

Figure 11(d): I can filter the list of sentences to just the adjectives "sweet", "heavenly" and "fair" by clicking on the chart on the left-hand side. I click on the bars for "sweet", "heavenly" and "fair" (dark blue), which filters the list of sentences to only those in which "face", "eyes", or "hair" are described as "sweet", "heavenly" or "fair".

4 Visualizations

List of Search Results

The list of search results is the simplest visualization WordSeer offers. It's exactly what it sounds like: a list of sentences that matches the results of a slice.

You can either search for a term using the search box, or open up a new blank panel using the "All Sentences" button:

Here is the list of search results for "speaker: Ophelia" "act: 3":

You can sort by line and scene, and filter by any of the words, phrases or metadata categories (collapsed on the side).

Clicking on any of these sentences opens up the sentence in the document viewer:

Clicking on a sentence in the sentence list opens it up in the document viewer and highlights it.

Documents with Matches

This is another simple visualization that shows documents that correspond to a slice. You can open it up from the top bar by clicking "All Documents".

Or search directly using the search box.

For example, here are all the documents in which "rome" is mentioned, sorted by how many times it's mentioned.

Double-clicking on a document in this listing opens it up in the Document Viewer.

The Document Viewer

This "visualization" shows the contents of a document. Like the other visualizations, it responds to filters.

For example, here is the text of Romeo and Juliet, filtered to Romeo's lines:

Word Trees

The Word Tree visualization was developed in 2008 by Martin Wattenberg and Fernanda Viegas. It allows for quick overviews and exploration of the contexts in which a word occurs.

You can make a new word tree panel by clicking on the Word Tree button in the top bar:

But if you already have a word in mind, you can create the word tree for it directly from the search bar:

Word Trees Without a Search Term

If you don't specify a search term WordSeer will just use the most frequent word in the collection (or slice, if you add filters).

In Shakespeare, this is what you get if you open up a word tree with no search: a Word Tree of the word "good", which is the most frequent content word in the collection (excluding stop words such as "the" and "a").

Figure 12: A word tree with no search terms. In the Shakespeare collection, this produces a tree of the word "good".

Clicking on a branch in the word tree expands it to just that context. The contexts are arranged in order from most to least frequent.

Figure 12(b) Clicking on the branch "the" filters the tree to just the sentences containing "the good". More context is now visible. You can now see that "the good duke", "the good queen", and "the good gods" are common constructions.
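
For readers who like to see the data structure, a word tree can be approximated as a nested dictionary in which each branch counts how many sentences continue in that way. The Python sketch below is an assumption-laden illustration and not the actual Word Tree implementation: the build_word_tree and branches functions are invented here, the sentences are made up, and for brevity it only follows the words to the right of the center word.

```python
import re


def build_word_tree(sentences, root):
    """Nest the words that follow `root`; each node stores a count and its children."""
    tree = {"count": 0, "children": {}}
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        for i, word in enumerate(words):
            if word != root:
                continue
            node = tree
            node["count"] += 1
            # Walk down the tree one following word at a time.
            for following in words[i + 1:]:
                node = node["children"].setdefault(
                    following, {"count": 0, "children": {}}
                )
                node["count"] += 1
    return tree


def branches(node):
    """List a node's branches from most to least frequent."""
    return sorted(node["children"].items(), key=lambda item: -item[1]["count"])


# Made-up sentences for illustration.
sentences = [
    "The good duke is here.",
    "The good queen spoke.",
    "A good duke rules.",
]
tree = build_word_tree(sentences, "good")
top_level = [word for word, _ in branches(tree)]
```

Expanding a branch in this picture just means moving down to the child node for that word and listing its branches in turn.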

The branches in grey are individual sentences. Hovering over a sentence branch shows a popup containing more information about the sentence, and the option to open up the document viewer for the play at that sentence:

Figure 12(c) Hovering over an individual sentence (grey text) shows a popup with the full sentence and any associated metadata. Here, we see that this particular sentence is from "The Life of King Henry the Eighth Act 1, Scene 1, line 51" and is said by Norfolk.

Adding filters will change the visualization to reflect the most frequent word in the filtered set.

For example, if we filter to just the speaker "Romeo",

we'll get a word tree of the most frequent word said by Romeo, which, appropriately enough, happens to be "love":

Figure 12(c): Word Trees -- just like overviews -- respond to filters. When we filter to "speaker: Romeo", we get a word tree of the most frequent word in those sentences, which happens to be "love".

Word Trees with a Search Term

If we specify a search term, the Word Tree's word is fixed to that term, even if it isn't the most frequent one in the slice.

For example, Figure 12(c) shows that the most frequent word in Romeo's speeches is "love" -- but what if we want to investigate "love" in other plays?

We can start by typing in "love" in the search bar and selecting the "Word Tree" visualization:

This fixes "love" as the center term in the Word Tree:

Figure 12(d) The word tree for "love" across all of Shakespeare's plays.

We can now apply other filters. For example, 12(c) shows that "love" occurs 130 times in "Two Gentlemen of Verona". To investigate further, we can click to filter to just that play, and explore the tree. Figure 12(e) below shows the tree for "love" in the play, with the "in love" branch expanded.

Figure 12(e): The word tree for "love" in "Two Gentlemen of Verona", with the "in love" branch currently expanded to show all 18 sentences.

Making More Word Trees with the Word Menu

If, while exploring a word tree branch, you want to make a new word tree centered around that branch, use the Word Menu.

Word Frequencies

Word Frequency charts are great for comparing the frequencies of words across different categories. With them, you can answer questions like "How does Gertrude's involvement in the events of Hamlet change over the course of the play?" and "Which characters, plays, and scenes mention "love" the most?"

You can make a new word frequency graph panel by clicking on the top menu bar:

Or, if you have a search in mind, by typing it in to the search bar:

Word Frequency Charts without a Search Term

Sometimes, we're interested in how a particular category relates to other categories. Word Frequency graphs can help investigate such relationships. For example, how are lines by Gertrude (one category of sentences) spread out over the different acts of Hamlet (another category of sentences)?

To examine Gertrude's involvement in the play, we make a new Word Frequencies graph and filter it to just "speaker: Gertrude". Her pattern of involvement is immediately clear. It rises in Acts 1 and 2, peaks in Act 3, and falls in Acts 4 and 5.

Figure 13(a) Frequency graphs for Gertrude's lines across different metadata categories.

Scrolling down, we can see that her activity is concentrated in Act 3, Scene 4, and Act 4, Scenes 5 and 7.

The "Normalize" option converts between showing raw counts and showing percentages. For example, above, we see that 2% of the lines in Act 3, Scene 4 are by Gertrude.

The same chart without normalization looks slightly different, because the scenes are all of different lengths. Act 4, Scene 7 is a shorter scene, so even though there are just 4 sentences, Gertrude is proportionally more involved than in Act 5, Scene 1 and Act 5, Scene 2, which have 7 counts. Sometimes normalization is the right choice, but other times, it may not make sense.
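
The arithmetic behind "Normalize" is simple enough to sketch in Python. This is only an illustration: the normalize function is hypothetical, and while the counts of 4 and 7 come from the example above, the scene totals of 100 and 350 sentences are invented purely to make the arithmetic visible.

```python
def normalize(match_counts, total_counts):
    """Convert raw counts per category into percentages of that category's sentences."""
    return {
        category: 100.0 * match_counts.get(category, 0) / total
        for category, total in total_counts.items()
        if total
    }


# Raw counts of Gertrude's sentences in two scenes.
gertrude = {"Act 4, Scene 7": 4, "Act 5, Scene 1": 7}
# Scene lengths: assumed values, NOT taken from the real play.
all_sentences = {"Act 4, Scene 7": 100, "Act 5, Scene 1": 350}

percentages = normalize(gertrude, all_sentences)
```

With these assumed totals, the scene with the smaller raw count (4) comes out at 4%, ahead of the scene with the larger raw count (7) at 2%, which is the kind of reversal the normalized chart makes visible.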

Taking advantage of Filters

Word Frequency Graphs are interactive. Clicking on a bar, or selecting a range (in the case of continuous values) filters the other graphs to show only matches in that range.

For example, if we wanted to repeat the analysis of Gertrude's involvement in the play for every character in Hamlet, we could take advantage of this filtering function.

First, we could create a word frequency plot for play:"Hamlet" and hide all but the "speaker" and "act_title" graphs:

Then, by clicking the speakers one by one, we see their involvement reflected in the top graph, for example, clicking on "Horatio" reveals his pattern:

Figure 13(b) Horatio's involvement in the play.

Figure 13(c) Filtering works the other way too, for example, in Act 4, the King and Polonius have many lines.

Word Frequency Graphs with Search Terms

Word Frequency graphs also do what they're named for: show word frequencies across categories. You can search for multiple words, and either stack the charts or group them:

For example, Figure 13(d) shows the incidence of the death-related words "dead", "kill", and "die" across all of Shakespeare's plays, split by act. Perhaps unsurprisingly, the later in the plays we progress, the more frequent these words become.

Figure 13(d) The incidence of the words "dead", "kill", and "die" across all of Shakespeare's plays, split by act. Perhaps unsurprisingly, the later in the plays we progress, the more frequent these death-related words become.

Figure 13(e): Filtering to just "Romeo and Juliet" and un-clicking the "Stacked" checkbox reveals the pattern for just that play.

5 Exploring and Navigating with the Word Menu

One of the most powerful ways to get around WordSeer is the Word Menu. When you see something interesting in a visualization, the word menu often gives you a way to follow up on that thought by creating a new visualization, adding something to a group, or exploring related ideas.

Figure 14(a) The word menu for "father"

Navigating a Grammatical Neighborhood with Search Options

The search options in the word menu allow you to click on a word and search for it in a new visualization. More importantly though, they show you the different relations in which that word appears.

Figure 14(b): The Word Menu for "father" showing the different ways in which "father" is used, and the number of times each one appears in the collection. In this example, we can see the predominant ways in which father is described by examining the "adjectival modifier" search option. Fathers in Shakespeare are "good", "dear", "noble", "ghostly", "royal" and "sweet".

Clicking on any of these options does a grammatical search for that relationship.

Related Words

WordSeer also calculates co-occurrence relationships between words, which you can access through the "Related Words" option:

This option pops up two different windows, one of words that occur nearby, and another of words that occur in similar contexts.

Words that occur Nearby

The "Nearby Words" option displays words that occur in the same sentences as the clicked-on word. These words are sensitive to the slice:

For example, if we search for "father" and then look at the nearby words for "son", the list is restricted to the words that occur near "son" in sentences that contain "father".

Clicking any word in this list brings up the option to see where the words co-occur. Clicking on the "show co-occurrences" button above would bring up a List of Search Results pane showing sentences where "son" and "daughter" co-occur.
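
Sentence-level co-occurrence of this kind can be sketched as follows in Python. This is a minimal illustration under assumptions rather than WordSeer's actual computation: the tokens, nearby_words, and co_occurrences helpers are invented names, and the three sentences are a made-up slice standing in for the results of a search for "father".

```python
import re
from collections import Counter


def tokens(sentence):
    """Return the set of lower-cased words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def nearby_words(sentences, target):
    """For every other word, count the sentences it shares with `target`."""
    counts = Counter()
    for sentence in sentences:
        words = tokens(sentence)
        if target in words:
            counts.update(words - {target})
    return counts


def co_occurrences(sentences, first, second):
    """Return the sentences in which both words appear."""
    return [s for s in sentences if {first, second} <= tokens(s)]


# A made-up slice: sentences containing "father".
father_slice = [
    "My father loved his son and his daughter.",
    "The son obeyed his father.",
    "Her father wept.",
]
near_son = nearby_words(father_slice, "son")
son_and_daughter = co_occurrences(father_slice, "son", "daughter")
```

Because nearby_words only ever looks at the sentences it is given, running it on a different slice gives a different list, which is what "sensitive to the slice" means in practice.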

Words in Similar Contexts

If your collection is small enough, WordSeer also computes words that occur in similar contexts. When we click Related Words for "son" in Shakespeare, we get the following popup in addition to the nearby words:

Getting to the word menu

Words appear in a lot of different places in WordSeer -- lists of frequent words, lists of nearby words, in document views, in sentence popups, and in the list of sentences. If a word turns blue when you hover over it, clicking or right-clicking on it will make a word menu.

The only exceptions are the word trees and the overviews, in which clicks have a different meaning. The lists of most frequent nouns, verbs, and adjectives in a slice double as filters: clicking a word filters the slice to just the sentences containing that word. In the word tree, clicking a branch filters the word tree to just sentences matching that branch. In both these places, to make a word menu, you have to right-click instead.

6 Creating Your Own Filters and Groups with Sets

Units of Analysis

When we put a collection into WordSeer, it automatically extracts any annotations in the text and turns them into metadata filters. While these are useful, they're not always enough. Sometimes, we want to define our own units of analysis.

For example, consider the question, "How does the treatment of love in Shakespeare vary between the comedies and tragedies"? Here, "comedies" and "tragedies" are units of analysis that don't come pre-defined.

Another example of such a question is "What are the different characteristics of speeches by male and female speakers?" Here, our units of analysis are "speeches by male speakers" and "speeches by female speakers" -- we don't have those as pre-defined categories either.

Yet another is, "How do concepts of emotion correlate with mentions of people in power?" That is, how often do emotions like "anger", "sadness", "joy", and "hate" correlate with different kinds of people in power? This is more complex. We want to look at "the sentences mentioning different types of people in power" and correlate them with "sentences mentioning different types of emotion".

In this section, I'll show how WordSeer's Document Sets, Sentence Sets, and Word Sets features can help conduct exactly these types of analyses.

Sets aren't just for comparison -- once you make a set, it persists, you don't lose it. You can use them to collect interesting things to look at, or to make conceptual groupings for your own understanding.

Sets of Documents

Document sets are most useful when you're looking at gathering certain types of documents together. You make document sets by searching and filtering in the Documents with Matches view. To make a document set, just select some documents, and click "Add to Group".

For example, if we wanted to follow up on the question of "How does the treatment of love vary between the comedies and tragedies", we could do that in the following way. First, collect all the comedies:

Then, add them to a group by clicking the "Add to group" button:

Name the new group "comedies". This creates a new group: "comedies". We can do the same for the tragedies, and the Document Sets overview now shows two sets, "comedies" and "tragedies".

We can now use these sets as filters, because they appear in the metadata overview:

So, now if we wanted to examine the treatment of "love" across the two sets of documents, we could do a word frequencies comparison. The word frequencies chart automatically uses the new document set categories.

Figure 15(a) Comparison of the normalized word frequencies of "in love" over the document sets "comedies" and "tragedies". We see that about 0.4% of the sentences in the comedies mention "in love", whereas less than half that, around 0.1% of the sentences in the tragedies do the same.

Sentence Sets

You can put sentences into sets from the List of Search Results and Grammatical Search Bar Charts views. Click the checkboxes next to the sentences you want, and then add them to the set.

For example, let's collect speeches by female speakers in "The Merchant of Venice" into a sentence set. First, narrow down the list of sentences to just that play.

Use the auto-suggest box to quickly select "Merchant of Venice".

Here, I've just finished adding the 240 sentences spoken by Portia to the set, and I'm about to add 50 by Nerissa:

After adding all the women's sentences, I get 338 sentences. After doing the same for the men, and ignoring characters with fewer than 5 sentences, I get:

Now I can begin comparing them. I open up two panes, and look at the word frequencies across the acts for the two sets:

The lists of frequent words in the two panels differ slightly from each other, and the characters' patterns of involvement in the play are also very different.

Word Sets

Word Sets are just collections of words. When they act as filters, though, they match all the sentences that contain any word in the set.

Word sets can be used as search terms in the search box. Instead of typing in a long list of words, you can just use the word set instead.

The search will match any of the words in the set.
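As a rough illustration of "match any of the words in the set", here is a small Python sketch. It is not how WordSeer implements search. The tokenizer, the whole-word matching rule, and the sample sentences are assumptions, and the set holds only the two words mentioned in this guide.

```python
import re


def tokenize(sentence):
    """Split a sentence into lowercase word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", sentence.lower())


def matches_word_set(sentence, word_set):
    """True if the sentence contains at least one word from the set."""
    return any(token in word_set for token in tokenize(sentence))


# A tiny word set for illustration.
lord_set = {"lord", "king"}

# Toy sentences invented for illustration.
sentences = [
    "The king is dead.",
    "Farewell, my lord.",
    "Come, let us go.",
    "Kingdoms fall.",
]

matched = [s for s in sentences if matches_word_set(s, lord_set)]
print(matched)
```

In this sketch "Kingdoms" does not match "king", because tokens are compared as whole words. Whether WordSeer also matches other forms of a word is not something this example tries to answer.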

There are two ways to make and edit word sets in WordSeer: through the Word Menu, and through the Word Sets panel.

Word Sets with the Word Menu

As explained in the Word Menu section, clicking or right-clicking on a word almost anywhere in WordSeer opens up the Word Menu.

The word menu has options both to add the word to a word set (you can add it to a new one if you don't have existing ones) and to edit existing word sets:

If you make a new set, it'll automatically be named after the word, and you can add more words to it using the word menu:

Here, I'm adding "king" to the "lord" word set.

If you want to add words directly, you can use the "Edit word set" option:

This brings up the word set in a free-floating window:

Editing Word Sets

Double click anywhere in the window to start editing. I'm going to add some more royal words.

Press "OK" to save, or "Cancel" to discard your edits.

I'm also going to rename this set to "royals". You can do this by clicking on the title:

Word Sets with the Word Sets Overview

You can also make and manage your Word Sets with the "Word Sets" overview:

Clicking "New" creates a new set, and "Delete" deletes the selected set. Double click to open the word set up in a window, or to rename it. Here, I'm creating a new set and naming it "god/supernatural":

Right click or double click on an entry to rename it. Double clicking opens it up in a window, for editing.

In this set, I put "god", "almighty", "heaven", and "spirit".

I can now compare how some emotion-related words co-occur with the two word sets, "royals" and "god/supernatural".

For this, I simply do two searches in the word frequency graph and compare the split across categories. As expected, the comedies have more happiness and the tragedies have more anger, but the comparison between royals and supernaturals is interesting:

It appears, in fact, that the "royals" words are much less associated with the happy search (orange) than the "god/supernatural" words. Only 0.61% of the "royals" sentences have "happy" words, whereas proportionally twice as many of the "god/supernatural" sentences do.
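The "proportionally speaking" comparison can be sketched in Python as follows: of the sentences that contain a word from one word set, what percentage also contain a word from the search? This is only an illustration under stated assumptions, not WordSeer's method. The "god/supernatural" words come from this guide, but the "royals" and "happy" word lists and all the sentences are hypothetical stand-ins.

```python
import re


def contains_any(sentence, words):
    """True if the sentence shares at least one whole word with the given set."""
    tokens = set(re.findall(r"[a-z']+", sentence.lower()))
    return not tokens.isdisjoint(words)


def cooccurrence_percent(sentences, category_words, search_words):
    """Percentage of category sentences that also contain a search word."""
    in_category = [s for s in sentences if contains_any(s, category_words)]
    if not in_category:
        return 0.0
    both = sum(1 for s in in_category if contains_any(s, search_words))
    return 100.0 * both / len(in_category)


god_supernatural = {"god", "almighty", "heaven", "spirit"}  # words listed in this guide
royals = {"lord", "king"}   # hypothetical stand-in for the full "royals" set
happy = {"happy", "joy"}    # hypothetical stand-in for the "happy" search

# Toy sentences invented for illustration.
sentences = [
    "The king is happy today.",
    "My lord, the army waits.",
    "The king is dead.",
    "Farewell, my lord.",
    "Heaven grant us joy.",
    "God is with us.",
]

royal_pct = cooccurrence_percent(sentences, royals, happy)
supernatural_pct = cooccurrence_percent(sentences, god_supernatural, happy)
print(f"royals: {royal_pct:.1f}%  god/supernatural: {supernatural_pct:.1f}%")
```

Each percentage is taken relative to its own word set's sentences, so a set that matches many more sentences overall does not automatically look "happier".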

7 Saving Your Work

You can save and export everything you see in WordSeer into images and data files for use in other places. This includes all the graphs, visualizations, tables of data, and lists of words and sentences.

Exporting data from tables

Every single WordSeer table has a tiny "Save" button in its top-left corner:

Saving images

For image-based visualizations, such as the Grammatical Search Bar Charts, Word Trees, and Word Frequencies, click on the save button at the top of the panel to generate download links to each of the visualizations as an image.

If you want to save the image in a filtered state, just click the save button after performing your operations -- the images generated always reflect the current state of the chart.

History

WordSeer saves your history onto your computer, so don't worry if you accidentally close the page. When you open it up again, your history will be available to you from the pane on the left. Just click a row to open up the panel again. Your history won't be available if you use a different computer or a different username, though.