GLOBAL DIGITAL HUMANITIES
TEXT-MINING MULTILINGUAL LITERARY CORPORA
Computational Criticism has a language problem
Anglo-Centrism
Anglo-Centrism: Hathi Trust
Global Digital Humanities: Aligning Results
Alignment is NOT translation
Global Collaborations
Over the past several years, the Literary Lab has partnered with other Digital Humanities organizations to explore multilingual projects:
Multilingual Alignment: 3 Experiments
Dramatic Structure
1: Networks
Dramatic Structure
Centrality: Betweenness
Amphitryon, Dryden (1690) – Betweenness Centrality
Amphitryon, Hawkesworth (1756) – Betweenness Centrality
Amphitryon, Kleist (1803) – Betweenness Centrality
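Betweenness centrality scores a character by how many shortest paths between other pairs of characters pass through them. Below is a minimal sketch using Brandes' algorithm on an unweighted, undirected network. The rule that links two characters when they share a scene, the helper name graph_from_scenes, and the scene list are all hypothetical illustrations, not the Lab's construction or data.

```python
from collections import deque
from itertools import combinations


def graph_from_scenes(scenes):
    """Link every pair of characters who share a scene (hypothetical rule)."""
    adjacency = {}
    for scene in scenes:
        for name in scene:
            adjacency.setdefault(name, set())
        for a, b in combinations(sorted(scene), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    scores = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search: count shortest paths from `source` to every node.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Every unordered pair of endpoints was visited twice (once from each end).
    return {v: s / 2 for v, s in scores.items()}


# Hypothetical scene list, not taken from any edition of Amphitryon.
scenes = [{"Mercury", "Sosia"}, {"Sosia", "Amphitryon"}, {"Amphitryon", "Alcmena"}]
print(betweenness_centrality(graph_from_scenes(scenes)))
```

In this toy chain, Sosia and Amphitryon each sit on two shortest paths, while the end characters score zero.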
Network Density of Amphitryon (Three Editions)
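Network density is the share of possible character pairs that are actually linked: for an undirected network with n characters and m links, density = m / (n(n - 1)/2). A minimal sketch, assuming the network is stored as a dict of neighbour sets; the two toy networks are hypothetical, not measurements from the three editions.

```python
def network_density(adjacency):
    """Share of possible character pairs that are actually linked (undirected)."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    # Each undirected edge is stored in both endpoints' neighbour sets.
    edges = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return edges / (n * (n - 1) / 2)


# Hypothetical toy networks, not measurements from the three editions.
triangle = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}
path = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(network_density(triangle), network_density(path))
```

A fully connected cast scores 1.0; a cast linked only in a chain scores lower (here 2/3).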
Topicity Between Languages
2: Topic Models
TWO TOPICS FROM A TOPIC MODEL OF ~200 WORKS OF SUSPENSE FICTION
Topic Model: The Ambassadors
Epistemological words
question
learning
understanding
secret
prove
Topic Model: The Ambassadors
Space/time words
reached
hotel
friend
evening
room
Topic Model: The Ambassadors
Topicity: Mono-Topical Paragraph
Topicity: Bi-Topical Paragraph
Topicity in Dickens vs Goethe
Topicity in Dickens vs Goethe: scaled
Topicity in Dickens vs Goethe
Dickens Mean Topicity: 2.55 (Scaled); 2.02 (Unscaled)
Goethe Mean Topicity: 1.08 (Scaled); 1.04 (Unscaled)
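The slides do not define how topicity is computed. One plausible reading, offered here only as an assumption, is the "effective number of topics" in a paragraph: the exponential of the Shannon entropy of its topic proportions. A mono-topical paragraph then scores 1 and an evenly bi-topical one scores 2. The function names are hypothetical, and the deck's scaled/unscaled distinction is not reproduced.

```python
import math


def topicity(topic_proportions):
    """Effective number of topics in one paragraph: exp(Shannon entropy)."""
    total = sum(topic_proportions)
    entropy = 0.0
    for weight in topic_proportions:
        if weight > 0:
            p = weight / total
            entropy -= p * math.log(p)
    return math.exp(entropy)


def mean_topicity(paragraphs):
    """Average topicity over a list of per-paragraph topic distributions."""
    return sum(topicity(p) for p in paragraphs) / len(paragraphs)


# Hypothetical topic proportions for two paragraphs over three topics.
mono = [1.0, 0.0, 0.0]
bi = [0.5, 0.5, 0.0]
print(topicity(mono), topicity(bi), mean_topicity([mono, bi]))
```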
Confounding Effects:
A set of protocols or methods for aligning these results would allow us to interpret the difference.
History of Literature / Histoire de la littérature
3: Vector Models
CURRENT CORPUS (FRENCH)
CURRENT CORPUS (ENGLISH)
Corpus Statistics
CORPORA: KNOWN BIASES
Global Vectors for Word Representation: GloVe
(Pennington, Socher and Manning, 2014)
Log bi-linear model with a weighted least-squares objective
Less computationally expensive than the neural networks used by word2vec, but produces similar results at high token counts (n > 5M)
Supports distance scores via cosine similarity as well as vector arithmetic on the resulting word vectors
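The weighted least-squares objective referred to above, as given by Pennington, Socher and Manning (2014). V is the vocabulary size, X_ij is the (distance-weighted) count of word j appearing in the context of word i, w_i and w̃_j are word and context vectors (150-dimensional in the models here), and b_i, b̃_j are bias terms. The paper reports x_max = 100 and α = 3/4.

$$
J = \sum_{i,j=1}^{V} f(X_{ij})\,\bigl(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^2,
\qquad
f(x) = \begin{cases} (x/x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
$$

The weighting f caps the influence of very frequent pairs, and pairs that never co-occur (X_ij = 0) contribute nothing, since f(0) = 0.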
Vector Models
All vectors calculated at 150 dimensions using 5-token skip-grams
Vector model uses shared contextual similarity to assign distance:
Two words “close” to each other may never appear in the same sentence.
Synonym/antonym relationships are captured by the model
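GloVe is trained on a word-word co-occurrence matrix rather than on raw text. A minimal sketch of building that matrix, assuming the deck's "5 token" setting means a symmetric window of 5 tokens on each side. The 1/d weighting (a pair d tokens apart adds 1/d) follows the GloVe paper. The function name is hypothetical, and a real corpus would need sparse storage and streaming.

```python
from collections import defaultdict


def cooccurrence_counts(tokens, window=5):
    """Symmetric co-occurrence counts X[(word, context)] with 1/d weighting."""
    counts = defaultdict(float)
    for position, word in enumerate(tokens):
        # Look back up to `window` tokens; record each pair in both directions.
        for distance in range(1, window + 1):
            other = position - distance
            if other < 0:
                break
            context = tokens[other]
            counts[(word, context)] += 1.0 / distance
            counts[(context, word)] += 1.0 / distance
    return dict(counts)


X = cooccurrence_counts("the critic read the poem".split(), window=5)
print(X[("critic", "poem")])
```

Two words can end up close in the trained model without co-occurring, because similarity comes from sharing the same rows of context counts.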
Vector Models
Closest words (English) | Cosine similarity | Closest words (French) | Cosine similarity |
literature | 1 | littérature | 1 |
modern | 0.752334704 | française | 0.8171132 |
poetry | 0.743134099 | poésie | 0.795609667 |
history | 0.740588306 | moderne | 0.763611367 |
english | 0.731722264 | contemporaine | 0.747738682 |
fiction | 0.722667968 | histoire | 0.742839262 |
art | 0.694622969 | littéraire | 0.727554381 |
literary | 0.67289793 | langue | 0.707563985 |
american | 0.657313228 | allemande | 0.690306253 |
german | 0.651461481 | philosophie | 0.689387309 |
science | 0.64739467 | anglaise | 0.679375838 |
especially | 0.644995867 | critique | 0.678499313 |
england | 0.63924782 | france | 0.6757804 |
french | 0.63682407 | époque | 0.651479153 |
european | 0.630711079 | dramatique | 0.650263723 |
writers | 0.626208597 | surtout | 0.643750204 |
philosophy | 0.6254931 | poétique | 0.64152213 |
studies | 0.614163419 | siècle | 0.637453337 |
century | 0.610898707 | art | 0.636201112 |
Terms closest to “literature/littérature”
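The table ranks vocabulary by cosine similarity to the query word, which is why the query itself heads each column with a score of 1. A minimal sketch of that ranking. The three-dimensional vectors are invented toy values, not the 150-dimensional English or French models, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def closest_words(vectors, query, top_n=5):
    """Rank the whole vocabulary (query included) by similarity to `query`."""
    target = vectors[query]
    scored = [(word, cosine_similarity(target, vec)) for word, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Invented toy vectors for illustration only.
vectors = {
    "literature": [1.0, 0.2, 0.0],
    "poetry": [0.9, 0.3, 0.1],
    "hotel": [0.0, 0.1, 1.0],
}
print(closest_words(vectors, "literature", top_n=2))
```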
Literature-Nationality (English) | Cosine similarity | Littérature-Nationalité (French) | Cosine similarity |
literature | 0.776436357 | littérature | 0.736426392 |
art | 0.63578764 | histoire | 0.601281709 |
fiction | 0.612817902 | poésie | 0.561197857 |
poetry | 0.609578411 | essais | 0.558929646 |
history | 0.606552679 | livre | 0.558598607 |
modern | 0.59033175 | ouvrages | 0.555483853 |
english | 0.586196852 | siècle | 0.553174961 |
especially | 0.578234135 | critique | 0.548915533 |
works | 0.575399373 | moderne | 0.547627737 |
books | 0.572133177 | surtout | 0.543034782 |
since | 0.561425571 | littéraire | 0.533972948 |
science | 0.557590999 | genre | 0.533355506 |
studies | 0.542464494 | art | 0.530875284 |
england | 0.542227282 | philosophie | 0.528667599 |
now | 0.532075515 | chapitre | 0.522507932 |
also | 0.529803747 | roman | 0.508692744 |
philosophy | 0.527389281 | française | 0.508401501 |
present | 0.523795619 | contemporaine | 0.507069903 |
writing | 0.519118682 | temps | 0.506582681 |
Terms closest to “literature/littérature – nationality/nationalité”
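Assuming "literature – nationality" denotes vector subtraction, the ranking can be sketched by subtracting one word vector from another and ranking the vocabulary against the result. This removes the shared "national" direction from "literature". The toy vectors and function names are hypothetical and chosen only to make the effect visible.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def closest_to_offset(vectors, positive, negative, top_n=5):
    """Rank words by similarity to vectors[positive] - vectors[negative]."""
    target = [p - n for p, n in zip(vectors[positive], vectors[negative])]
    scored = [(word, cosine_similarity(target, vec)) for word, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Invented toy vectors: the second dimension stands in for "nationality".
vectors = {
    "literature": [1.0, 1.0, 0.0],
    "nationality": [0.0, 1.0, 0.0],
    "poetry": [1.0, 0.0, 0.1],
    "english": [0.5, 1.0, 0.0],
}
print(closest_to_offset(vectors, "literature", "nationality", top_n=3))
```

With the national component subtracted, "poetry" outranks "english" in this toy space.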
Terms closest to “literature”
Terms closest to “literature” (detail)
Terms closest to “littérature”
Terms closest to “littérature” (detail)
Nationality as a function of history
Nationalité as a function of history
Art vs Science (English)
Art vs Science (French)
Arts vs Sciences (French)
Art-Science vector model
Arts-Sciences vector model
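One way to build a two-pole plot like "Art vs Science" is to place each term by its cosine similarity to each pole word and read the difference as a lean. This is only an assumption about how such a figure could be produced. The toy vectors and function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def two_pole_coordinates(vectors, terms, pole_x="art", pole_y="science"):
    """Place each term at (similarity to pole_x, similarity to pole_y)."""
    return {
        term: (cosine_similarity(vectors[term], vectors[pole_x]),
               cosine_similarity(vectors[term], vectors[pole_y]))
        for term in terms
    }


def pole_lean(coordinates):
    """Positive values lean toward pole_x, negative values toward pole_y."""
    return {term: x - y for term, (x, y) in coordinates.items()}


# Invented toy vectors for illustration only.
vectors = {
    "art": [1.0, 0.0],
    "science": [0.0, 1.0],
    "poetry": [1.0, 0.2],
    "chemistry": [0.1, 1.0],
}
coords = two_pole_coordinates(vectors, ["poetry", "chemistry"])
print(pole_lean(coords))
```

The same sketch applies to the Romantic-Classical models below by swapping the pole words.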
Philosophie vs Philosophy
Decreasing terms of function (English)
Increasing literary/criticism terms (English)
Decreasing disciplinary terms (French)
Increasing professional terms (French)
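Labelling terms as "increasing" or "decreasing" implies a change across periods. A minimal sketch, assuming one similarity score per term per period (for example, from separate models trained on each period's texts, which the slides do not confirm): fit an ordinary least-squares slope and sort terms by its sign. The terms, periods, and scores are hypothetical.

```python
def trend_slope(periods, scores):
    """Ordinary least-squares slope of similarity score against period."""
    n = len(periods)
    mean_x = sum(periods) / n
    mean_y = sum(scores) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, scores))
    variance = sum((x - mean_x) ** 2 for x in periods)
    return covariance / variance


def split_by_trend(term_scores, periods):
    """Return (rising, falling) term lists, steepest trend first."""
    slopes = {term: trend_slope(periods, s) for term, s in term_scores.items()}
    rising = sorted((t for t, m in slopes.items() if m > 0), key=lambda t: -slopes[t])
    falling = sorted((t for t, m in slopes.items() if m < 0), key=lambda t: slopes[t])
    return rising, falling


# Hypothetical scores for two terms across three periods.
periods = [1800, 1850, 1900]
term_scores = {"criticism": [0.30, 0.40, 0.50], "use": [0.50, 0.45, 0.30]}
print(split_by_trend(term_scores, periods))
```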
Romantic-Classical vector model
Romantique-Classique vector model