1 of 24

A Computational Approach to the Cultural Evolution of Cognitive Metaphors in Historical Texts (1517-1716)

(Computational Humanities Research 2023, Paris)

Vojtěch Kaše & Petr Pavlas

(kase@flu.cas.cz)

2 of 24

About TOME

  • a cultural evolutionary approach to the history of metaphors, 1517-1716
  • metaphors of knowledge in conjunction with the emergence of early modern encyclopedism
  • combining intellectual history (close reading), computational text analysis (CTA; distant reading), and cultural evolution (theoretical models, esp. cultural attraction theory)
  • CTA goal: automatic detection, classification, and analysis of metaphors, anchored in distributional-semantics approaches to semantic change
  • Pilot study: diachronic word embeddings for Noscemus

3 of 24

Pilot Study: Materials

4 of 24

Pilot Study: Semantic change in Noscemus

5 of 24

Noscemus overview

994 works, 106,535,061 tokens, distributed over 9(8) overlapping Discipline/Content categories

6 of 24

7 of 24

Preprocessing & Preliminary analyses

  • all preprocessing & analyses in Python
  • LatinCy: spaCy-based NLP pipeline for Latin (tokenization, POS tagging, lemmatization, NER)
    • BUT: far from perfect; trained mainly on classical Latin, and the source texts contain OCR errors
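
A minimal sketch of how such a preprocessing step could look in Python. The model name `la_core_web_lg`, the helper names, and the set of retained POS tags are assumptions for illustration (the tags mirror the nouns, verbs, adjectives, and proper names kept later in the deck); the slide does not show the actual code. The demo uses stand-in tokens so that it runs without spaCy or a LatinCy model installed.

```python
from collections import namedtuple

# Universal Dependencies POS tags kept for analysis (assumed selection).
CONTENT_POS = {"NOUN", "VERB", "ADJ", "PROPN"}


def load_latincy(model_name="la_core_web_lg"):
    """Load a LatinCy pipeline. The model name is an assumption."""
    import spacy  # imported lazily so the helper below works without spaCy

    return spacy.load(model_name)


def content_lemmata(tokens):
    """Return the lemmata of tokens whose POS tag is in CONTENT_POS.

    `tokens` can be a spaCy Doc or any iterable of objects exposing
    the attributes `lemma_` and `pos_`.
    """
    return [t.lemma_ for t in tokens if t.pos_ in CONTENT_POS]


# Stand-in tokens carrying the two attributes used above (toy input).
Token = namedtuple("Token", ["text", "lemma_", "pos_"])
demo = [
    Token("Scientia", "scientia", "NOUN"),
    Token("est", "sum", "AUX"),
    Token("potentia", "potentia", "NOUN"),
]
demo_lemmata = content_lemmata(demo)
```

With a real pipeline, the same helper would be called as `content_lemmata(load_latincy()(text))`.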

8 of 24

Pilot Study: Methods

9 of 24

Distributional semantic models (DSM) and semantic change

Hamilton, W. L., Leskovec, J., & Jurafsky, D. (2016). Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change. In ArXiv [cs.CL]. http://arxiv.org/abs/1605.09096

10 of 24

Vectors training

  • FastText model with the same parametrization as Sprugnoli et al. (2020):
    • vector size: 100
    • window: 10
    • negative samples: 25
    • number of iterations (epochs): 15
  • Sprugnoli, R., Moretti, G., & Passarotti, M. (2020). Building and Comparing Lemma Embeddings for Latin. Classical Latin versus Thomas Aquinas. Italian Journal of Computational Linguistics, 6(1). https://doi.org/10.5281/ZENODO.4618000
  • Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5, 135–146. https://doi.org/10.1162/tacl_a_00051
  • cf. Ehrmanntraut, A., Hagen, T., Konle, L., & Jannidis, F. (2021). Type- and Token-based Word Embeddings in the Digital Humanities. CHR 2021: Computational Humanities Research 2021, 2989, 23.
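
The parametrization above could be passed to a FastText implementation roughly as follows. This is a sketch only: the slide does not name the training library, so gensim is an assumption, as are the `sg` and `min_count` values and the function name `train_fasttext`. Only the four values in `FASTTEXT_PARAMS` come from the slide.

```python
# Hyperparameters stated on the slide, using gensim's argument names.
FASTTEXT_PARAMS = {
    "vector_size": 100,  # vector size
    "window": 10,        # context window
    "negative": 25,      # negative samples
    "epochs": 15,        # number of iterations
}

# Values NOT given on the slide; chosen here only so the call is complete.
ASSUMED_PARAMS = {"sg": 1, "min_count": 5}


def train_fasttext(sentences, **overrides):
    """Train a FastText model on an iterable of lemma lists.

    gensim is imported lazily, so this module can be loaded without it.
    """
    from gensim.models import FastText

    params = {**ASSUMED_PARAMS, **FASTTEXT_PARAMS, **overrides}
    return FastText(sentences=sentences, **params)
```

One model per subcorpus would be trained by calling `train_fasttext` on each subcorpus's lemmatized sentences.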

11 of 24

Subcorpora vocabulary

  • corpus filtered to keep only lemmata of nouns, verbs, adjectives, and proper names
  • 2,000 most frequent words from each subcorpus (5,468 words in total)
  • adding 3,950 words used for training the LASLA and Opera Maiora vectors
  • filtering for words appearing in each subcorpus at least 5 times (= 6,332 words)
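
The selection steps above can be sketched as a small function. The function name, argument names, and toy data are illustrative assumptions; the demo uses tiny thresholds so that the result is easy to check by hand.

```python
from collections import Counter


def build_vocabulary(subcorpora, top_n=2000, extra_words=(), min_count=5):
    """Build a shared vocabulary across subcorpora.

    subcorpora: dict mapping a subcorpus name to a list of lemmata.
    1. take the top_n most frequent lemmata of each subcorpus,
    2. add an external word list (extra_words),
    3. keep only words occurring at least min_count times in EVERY subcorpus.
    """
    counts = {name: Counter(lemmata) for name, lemmata in subcorpora.items()}
    candidates = set(extra_words)
    for counter in counts.values():
        candidates.update(word for word, _ in counter.most_common(top_n))
    return sorted(
        word
        for word in candidates
        if all(counter[word] >= min_count for counter in counts.values())
    )


# Toy data (invented): two subcorpora, tiny thresholds.
toy = {
    "a": ["scientia"] * 4 + ["equus"] * 3 + ["liber"] * 2 + ["natura"] * 2,
    "b": ["liber"] * 4 + ["scientia"] * 3 + ["natura"] * 2 + ["equus"],
}
toy_vocab = build_vocabulary(toy, top_n=2, extra_words=["natura"], min_count=2)
```

In the toy run, "equus" is dropped because it occurs only once in subcorpus "b", while "natura" survives because it comes from the external list and meets the threshold in both subcorpora.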

12 of 24

Pilot Study: Results

13 of 24

Comparing Nearest Neighbors: “equus”

14 of 24

Comparing Nearest Neighbors: “scientia”
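
A nearest-neighbor comparison of this kind can be sketched in a few lines. The slides do not state how the neighbor lists were computed or compared, so cosine similarity and the Jaccard overlap below are assumptions, and the two-dimensional vectors are invented toy values rather than results from Noscemus.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(vectors, target, k=10):
    """Return the k words most similar to `target` in one embedding space."""
    sims = [(w, cosine(vectors[target], v)) for w, v in vectors.items() if w != target]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in sims[:k]]


def neighbor_overlap(vectors_a, vectors_b, target, k=10):
    """Jaccard overlap of the target's neighbor sets in two spaces."""
    a = set(nearest_neighbors(vectors_a, target, k))
    b = set(nearest_neighbors(vectors_b, target, k))
    return len(a & b) / len(a | b) if a | b else 0.0


# Invented toy vectors for two periods.
period_a = {"scientia": [1.0, 0.0], "ars": [0.9, 0.1],
            "doctrina": [0.8, 0.3], "equus": [0.0, 1.0]}
period_b = {"scientia": [1.0, 0.0], "ars": [0.1, 0.9], "doctrina": [0.9, 0.2],
            "equus": [0.0, 1.0], "experimentum": [0.95, 0.1]}

nn_a = nearest_neighbors(period_a, "scientia", k=2)
nn_b = nearest_neighbors(period_b, "scientia", k=2)
overlap = neighbor_overlap(period_a, period_b, "scientia", k=2)
```

Because each neighbor list is computed within its own space, this comparison does not require the two embedding spaces to be aligned with each other.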

15 of 24

16 of 24

17 of 24

Co-occurrences via PPMI2

example: “scientia”
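
For reference, a minimal sketch of plain positive pointwise mutual information, PPMI(w, c) = max(0, log2(p(w, c) / (p(w) p(c)))), computed from co-occurrence counts. The slide does not define the "2" in its title, so the sketch covers only standard PPMI; the function name and the counts are invented for illustration.

```python
import math
from collections import Counter


def ppmi(pair_counts):
    """Compute PPMI for each (word, context) pair from raw co-occurrence counts."""
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n
    scores = {}
    for (word, context), n in pair_counts.items():
        # p(w,c) / (p(w) p(c)) simplifies to n * total / (n_w * n_c)
        pmi = math.log2(n * total / (word_totals[word] * context_totals[context]))
        scores[(word, context)] = max(0.0, pmi)
    return scores


# Invented toy counts.
toy_counts = {
    ("scientia", "ars"): 4,
    ("scientia", "equus"): 1,
    ("bellum", "equus"): 4,
    ("bellum", "ars"): 1,
}
toy_scores = ppmi(toy_counts)
```

Pairs that co-occur less often than chance would predict, such as ("scientia", "equus") in the toy counts, get a score of zero.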

18 of 24

Next steps

  • Turn to contextual token-based embeddings and metaphor detection algorithms based on BERT (Latin BERT?)
  • Capture changes in multi-word phrases
  • Create our own additional corpora for works of other genres, first alchemy, later theosophy, even later learned magic…
  • Closely trace the history of usage of certain words and “branches” of meaning (phylogenetic analysis?)

19 of 24

Thank you for your attention!

Vojtěch Kaše & Petr Pavlas

(kase@flu.cas.cz)

20 of 24

Back-up slides

21 of 24

Corpus Corporum

7,819 works extracted, 470M words

706 works & 62M words from early modern period (1501-1800)

22 of 24

Metaphors of Knowledge

Heuristic phase

What metaphors do we look for? Metaphors of knowledge.

  • What is knowledge? In terms of a goal, it is a (both creative and adequate) representation of phenomena, facts, objects, processes, and events. In terms of means, it consists of invention (data, hypotheses, induction, abduction, inspiration, etc.), justification (theories, axioms, postulates, logical and mathematical rules, methods), preservation, transmission and transformation in space and time (archives, libraries, storages, textbooks, media in general). Metaphors enter the invention, preservation, transmission and transformation of knowledge, while being undesirable in the representation and justification of knowledge.

23 of 24

Metaphor identification procedure (MIP)

Decision Making (an adaptation of the MIP by the Pragglejaz Group, 2007)

  • 1. Determine the relevant syntagms to look for in the corpus: what linguistic forms are they represented by?
  • 2. For each item found:
      • 2.1 Establish its meaning in context (contextual meaning).
      • 2.2 Determine whether it has a more basic contemporary meaning in contexts other than the given one. For our purposes, basic meanings tend to be:
          • more concrete (what they evoke is easier to imagine, see, hear, feel, smell, and taste)
          • related to bodily action
          • more precise (as opposed to vague)
          • historically older
      • 2.3 If yes, decide whether the contextual meaning contrasts with the basic meaning but can be understood in comparison with it.
  • 3. If yes, mark the syntagm as an instance of the cognitive metaphor in question.

24 of 24

Methodology

  • Hans Blumenberg and the school of Metaphorologie (metaphorology)
  • George Lakoff, Mark Johnson, and the school of Conceptual/Cognitive Metaphor Theory (CMT)
  • Cultural evolution theory, mainly Dan Sperber, Olivier Morin, and the Paris school of Cultural Attraction Theory (CAT)
  • Computational/Digital humanities (DH)
  • Corpus-Assisted Discourse Study (CADS) by close and distant reading
  • Early modern intellectual historiography (traditional)
  • Historical epistemology, specifically history of modern encyclopaedism