Darwin’s Semantic Voyage

Exploration & Exploitation of Victorian Science in Darwin’s Reading Notebooks

Jaimie Murdock, Colin Allen, Simon DeDeo

Indiana University

Paper: http://arxiv.org/abs/1509.07175

Slides: http://jamr.am/CCS15/

Information Foraging

  • Scientific innovation occurs against a cultural background of accumulating ideas
  • An individual scientist must balance exploration of novel ideas against exploitation of existing domains of expertise
  • Information foraging:
    • Has been studied at timescales of seconds to minutes at individual level in lab experiments
    • Has been studied at decadal timescales at collective level using, e.g., patent databases
    • Has not (heretofore) been studied in an individual at decadal timescale using “biographically plausible” datasets

Darwin’s Reading Notebooks

Records kept from spring 1838 until early 1860

  • 1,248 titles mentioned
  • 915 marked read
  • post-1840: 560 marked as scientific
  • includes 140 in French, German & Latin

Locating and modeling full texts

  • 669/688 English-language, non-fiction texts located in HathiTrust, Internet Archive, or Project Gutenberg (97.2%)
  • Applied LDA topic modeling

Page 3a of Darwin's first notebook (DAR 119), on which he began to record exact dates. Note the reading of Malthus's On Population on October 3, 1838.

“Probabilistic Topic Models” from Blei (2012), CACM

Latent Dirichlet Allocation (LDA)

Generative Model

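  • Choose $\theta_i \sim \mathrm{Dir}(\alpha)$ ($i$ is a document)
  • Choose $\Phi_k \sim \mathrm{Dir}(\beta)$ ($k$ is a topic)
  • For each word position $j$ in document $i$:
    • Choose a topic $z_{i,j} \sim \mathrm{Multinomial}(\theta_i)$
    • Choose a word $w_{i,j} \sim \mathrm{Multinomial}(\Phi_{z_{i,j}})$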
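To illustrate the generative process above, here is a minimal Python sketch that samples a toy corpus. It assumes symmetric scalar priors `alpha` and `beta`, a fixed document length, and word ids `0..vocab_size-1`; the function names are hypothetical, and this is not the authors' pipeline (they fit LDA to the located texts rather than sampling from it). Dirichlet draws use the standard normalized-Gamma construction.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(a, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(num_docs, num_topics, vocab_size, doc_length, alpha, beta, seed=0):
    """Sample a toy corpus from the LDA generative process.

    ASSUMPTION: symmetric priors and a fixed document length, for illustration only.
    Returns (theta, phi, docs); docs[i] is a list of (topic, word) pairs.
    """
    rng = random.Random(seed)
    # Phi_k ~ Dir(beta): one word distribution per topic
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    theta, docs = [], []
    for _ in range(num_docs):
        # theta_i ~ Dir(alpha): topic mixture for document i
        theta_i = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_length):
            # z_ij ~ Multinomial(theta_i), then w_ij ~ Multinomial(Phi_{z_ij})
            z = rng.choices(range(num_topics), weights=theta_i)[0]
            w = rng.choices(range(vocab_size), weights=phi[z])[0]
            doc.append((z, w))
        theta.append(theta_i)
        docs.append(doc)
    return theta, phi, docs
```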


Training on a Corpus
Bayesian inference on θ and Φ

Training on a New Document (d)
Fix P(w|z) to infer P(z|d)
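One common way to "fix P(w|z) to infer P(z|d)" is to Gibbs-sample the topic assignments of the new document while holding the topic-word matrix fixed. The sketch below assumes that approach (the slides do not name the inference method), a symmetric prior `alpha`, and a hypothetical function name.

```python
import random

def infer_doc_topics(words, phi, alpha, iters=200, seed=0):
    """Estimate P(z|d) for a new document with phi[k][w] = P(w|z=k) held fixed.

    ASSUMPTION: Gibbs sampling over this document's topic assignments only.
    words: list of word ids; alpha: symmetric Dirichlet prior on the doc's topics.
    """
    rng = random.Random(seed)
    num_topics = len(phi)
    # random initial topic for each token, plus per-topic counts for this doc
    z = [rng.randrange(num_topics) for _ in words]
    counts = [0] * num_topics
    for k in z:
        counts[k] += 1
    for _ in range(iters):
        for j, w in enumerate(words):
            counts[z[j]] -= 1  # remove token j from the counts
            # P(z_j = k | rest) is proportional to (n_dk + alpha) * P(w | z = k)
            weights = [(counts[k] + alpha) * phi[k][w] for k in range(num_topics)]
            z[j] = rng.choices(range(num_topics), weights=weights)[0]
            counts[z[j]] += 1
    # smoothed estimate of theta from the final sample
    total = len(words) + num_topics * alpha
    return [(counts[k] + alpha) / total for k in range(num_topics)]
```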

Plate Notation for Smoothed LDA
$\alpha$ – Dirichlet prior for the per-document topic distributions
$\beta$ – Dirichlet prior for the per-topic word distributions
$\theta_i$ – topic distribution for document $i$
$\Phi_k$ – word distribution for topic $k$
$z_{i,j}$ – topic for the $j$th word in document $i$
$w_{i,j}$ – the observed word

K = number of topics, M = number of documents, N = number of words in a document

Kullback-Leibler Divergence

(for discrete distributions)

  • Measures “surprise” (in bits) of an optimal learner trained on q upon encountering p
  • Has many conceptually distinct but consistent interpretations

  • Empirically validated in studies of cognitive load due to novelty (Hale 2001); attention attractors (Itti & Baldi 2009); language learning (Martin et al. 2013; Calamaro & Jarosz 2015); word selection (Resnik 1993; Light & Greiff 2002); syntactic processing and comprehension (Demberg & Keller 2008; Levy 2008)

Not a metric: it is not symmetric and does not satisfy the triangle inequality
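For discrete distributions over the same support, the divergence in bits is $D_{KL}(p \| q) = \sum_i p_i \log_2 \frac{p_i}{q_i}$. A minimal Python sketch follows; it assumes `p` and `q` are equal-length lists of probabilities, and the function name is illustrative.

```python
import math

def kl_divergence_bits(p, q):
    """D_KL(p || q) in bits for two discrete distributions over the same support.

    Terms with p_i = 0 contribute 0 (convention: 0 log 0 = 0).
    If q_i = 0 where p_i > 0, the divergence is infinite.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i <= 0:
                return math.inf
            total += p_i * math.log2(p_i / q_i)
    return total
```

Swapping the arguments generally changes the result, which is the asymmetry noted above.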

Text-to-text and past-to-text surprise

  • Text-to-text surprise measures the divergence of a new book from the previous book
  • Past-to-text surprise measures the divergence of a new book from all previous books
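Both surprise measures can be sketched over a reading sequence of topic distributions. This sketch assumes, hypothetically, that the "all previous books" reference is the unweighted mean of the earlier topic distributions; the paper may construct the past differently, and the names are illustrative.

```python
import math

def kl_bits(p, q):
    """D_KL(p || q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def surprise_series(docs):
    """Return (text_to_text, past_to_text) surprise for each book after the first.

    docs: topic distributions (lists summing to 1) in reading order.
    ASSUMPTION: the "past" is the unweighted mean of all earlier distributions.
    """
    text_to_text, past_to_text = [], []
    running_sum = list(docs[0])
    for t in range(1, len(docs)):
        current = docs[t]
        # surprise of a learner trained only on the previous book
        text_to_text.append(kl_bits(current, docs[t - 1]))
        # surprise of a learner trained on everything read so far (t books)
        past = [s / t for s in running_sum]
        past_to_text.append(kl_bits(current, past))
        running_sum = [s + c for s, c in zip(running_sum, current)]
    return text_to_text, past_to_text
```

Re-reading the same kind of book twice gives zero text-to-text surprise but can still be surprising relative to the whole past, which is why the two series can diverge.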

Interpretation

Exploration is marked by high surprise.

Exploitation is marked by low surprise.

Q1: Efficiency of reading path

Are reading pathways surprise-reducing?

  • Greedy shortest path: average 0.72 bits
  • Darwin, text-to-text: 10.65 bits
  • Random order, text-to-text: 11.42 bits (p ≪ 10⁻³)
  • Past-to-text: no significant difference
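To make the Q1 comparison concrete, here is a hypothetical sketch of a greedy shortest path (always read next the unread book that is least surprising given the current one) and a random-order baseline, both scored by mean text-to-text surprise. The start book, the number of random trials, and all names are assumptions for illustration.

```python
import math
import random

def kl_bits(p, q):
    """D_KL(p || q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_text_to_text(order, docs):
    """Average surprise of each book relative to the one read just before it."""
    steps = [kl_bits(docs[b], docs[a]) for a, b in zip(order, order[1:])]
    return sum(steps) / len(steps)

def greedy_path(docs, start=0):
    """Visit every book once, always moving to the least surprising unread book.

    ASSUMPTION: the path starts from a given book (e.g. the first one actually read).
    """
    order = [start]
    unread = set(range(len(docs))) - {start}
    while unread:
        current = docs[order[-1]]
        nxt = min(unread, key=lambda i: kl_bits(docs[i], current))
        order.append(nxt)
        unread.remove(nxt)
    return order

def random_baseline(docs, trials=1000, seed=0):
    """Mean text-to-text surprise averaged over random permutations of the books."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        order = list(range(len(docs)))
        rng.shuffle(order)
        totals.append(mean_text_to_text(order, docs))
    return sum(totals) / trials
```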

Q2 and Q3: Cultural and Biographical

Building a Null Reading Model

  • Available(t) = items published before time t
  • Create a null permutation of the readings:
    • For each reading date t, select from Available(t), without replacement
  • Null model at t is the mean KL of all null permutations at t
  • Measures expected surprise, given the changing cultural landscape
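The null model above can be sketched as follows. Assumptions: dates are comparable numbers, the candidate pool is the set of books in the reading list with known publication dates, and only text-to-text surprise is shown; all names are hypothetical.

```python
import math
import random

def kl_bits(p, q):
    """D_KL(p || q) in bits; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def null_permutation(read_dates, pub_dates, rng):
    """One null reading order: at each (sorted) reading date t, draw without
    replacement a book from Available(t), i.e. published before t.

    ASSUMPTION: pub_dates[i] is the publication date of book i in the pool.
    """
    drawn = set()
    order = []
    for t in read_dates:
        available = [i for i, p in enumerate(pub_dates) if p < t and i not in drawn]
        if not available:
            # cannot happen with real data where each book is read once, after publication
            raise ValueError(f"no unread book published before {t}")
        choice = rng.choice(available)
        drawn.add(choice)
        order.append(choice)
    return order

def null_text_to_text(read_dates, pub_dates, docs, samples=500, seed=0):
    """Expected text-to-text KL at each reading step, averaged over null permutations."""
    rng = random.Random(seed)
    sums = [0.0] * (len(read_dates) - 1)
    for _ in range(samples):
        order = null_permutation(read_dates, pub_dates, rng)
        for step, (a, b) in enumerate(zip(order, order[1:])):
            sums[step] += kl_bits(docs[b], docs[a])
    return [s / samples for s in sums]
```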

cultural or publication?

Q2: “The Interplay Between Individual and Collective” (Tria et al., 2014)

Individual consumption

Cultural production

Individual and Collective

Darwin is more surprised than the culture, relative to publication-order null (population)

Null: top edge. Publication-order null: dashed line. Darwin: solid line.

Key texts shown:

  • Malthus, On Population (1824)
  • Lyell, Principles of Geology, 3rd ed. (1834)
  • Chambers, Vestiges of the Natural History of Creation (1844)

Key texts identified by Darwin.

Q3: Biographical

X-axis: from ordinal to temporal

Interaction of Text-to-Text & Past-to-Text

Note that the two lines do not always correlate!

Discovered via Bayesian epoch estimation:

  • Bisect text-to-text: find the split $e_i$ that maximizes the Gaussian likelihood of the observed text-to-text data
  • Bisect past-to-text: find the split $e_i$ that maximizes the Gaussian likelihood of the observed past-to-text data

Independent choices!

see Simon for details :-)
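The bisection step can be sketched as a single change-point search: try every split, fit a Gaussian by maximum likelihood to each side, and keep the split with the highest total log-likelihood. This is a plain maximum-likelihood sketch with an assumed minimum segment size, not the Bayesian epoch-estimation procedure named on the slide; names are hypothetical.

```python
import math

def gaussian_loglik(xs):
    """Log-likelihood of xs under a Gaussian with its own MLE mean and variance."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    var = max(var, 1e-12)  # floor avoids log(0) for constant segments
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def best_split(series, min_size=2):
    """Return (split_index, loglik) for the single split maximizing the
    two-segment Gaussian log-likelihood of series[:split] and series[split:].

    ASSUMPTION: each segment has at least min_size points.
    """
    best = None
    for s in range(min_size, len(series) - min_size + 1):
        ll = gaussian_loglik(series[:s]) + gaussian_loglik(series[s:])
        if best is None or ll > best[1]:
            best = (s, ll)
    return best
```

Running `best_split` separately on the text-to-text and past-to-text series yields two independent splits, and `gaussian_loglik(series)` gives the no-split comparison.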

  • Bisect text-to-text: $e_i$ = 507 (September 16, 1854); $\mu$ = 10.68 → 10.48; $\sigma^2$ = 24.76 → 31.46
  • Bisect past-to-text: $e_i$ = 300 (February 28, 1846); $\mu$ = 3.05 → 3.00; $\sigma^2$ = 1.39 → 1.85

L(300, 507) = -2765.84; AIC = 5545.34
L(no split) = -2775.34; AIC = 5558.68

Relative likelihood of null = 0.03

see Simon for details :-)

Epoch I: 1837-1846

Local and global exploitation

Publication of 2 editions of Voyage of the Beagle

Wrote first 2 drafts of speciation theory

L↓G↓ = focused revisiting

Epoch II: 1846-1854

Local exploitation, global exploration

Barnacles, barnacles, barnacles!

Exploration of zoology, taxonomy, paleontology

L↓G↑ = focused search to expanding horizon

Epoch III: 1854-1860

Local and global exploration

Synthesis of the Origin of Species

L↑G↑ = unfocused revisiting

Summary

Three main findings:

  • Darwin’s reading patterns switch between exploration and exploitation on multiple timescales, in contrast to an
    “efficient” surprise-minimization strategy
  • Compared to the order in which texts were published,
    Darwin’s reading order shows higher cumulative surprise, indicating that society at large accumulates innovation more gradually than an individual consumes those same innovations
  • On the longest timescale, mode switching falls into three epochs that are biographically significant

Our Semantic Voyage

Additional Thanks

Robert Rose

+ Undergrad Assistants

  • Doori Lee
  • Tom Murphy

+ Funding support

  • IU Cognitive Science Program
  • IU OVPR
  • IU SoIC
  • Santa Fe Institute

+ Comments & Suggestions

  • Peter Todd

jammurdo@indiana.edu

http://arxiv.org/abs/1509.07175