1 of 18

Measuring Conversational Uptake

A Case Study on Student-Teacher Interactions (Demszky et al.)

CS6742 | Presented by BW | May 7, 2024

2 of 18

Conversational Uptake

3 of 18

Dataset

Source

  • Transcripts of 45-60 minute 4th and 5th grade elementary math classroom observations
  • Collected by the National Center for Teacher Effectiveness (NCTE) between 2010 and 2013
  • 317 teachers across 4 school districts in New England
  • Schools serving largely low-income, historically marginalized students
  • Transcripts are anonymized

4 of 18

Dataset

A New Educational Uptake Dataset

  • A dataset of student-teacher utterance pairs (S, T)
  • Only pairs where S contains at least 5 tokens and (S, T) relates to math are kept
  • ~55k (S, T) pairs in total; 2,246 are sampled for annotation (see the sketch below)
  • Annotators select among three labels: “low”, “mid” and “high”
  • Expert raters "whose demographics were representative of US K-12 teacher population"
  • Inter-rater agreement for uptake is Spearman ρ = .474
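An illustrative sketch (not the authors' code) of the filtering and sampling step described above; whitespace tokenization, uniform random sampling, and the helper name build_uptake_pairs are assumptions here:

    import random

    def build_uptake_pairs(exchanges, n_annotate=2246, min_tokens=5, seed=0):
        """exchanges: list of (student_utterance, teacher_utterance) tuples."""
        # Keep only pairs whose student utterance S has at least 5 tokens.
        kept = [(s, t) for s, t in exchanges if len(s.split()) >= min_tokens]
        # Sample a subset of the ~55k retained pairs for expert annotation.
        random.seed(seed)
        to_annotate = random.sample(kept, k=min(n_annotate, len(kept)))
        return kept, to_annotate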

5 of 18

6 of 18

7 of 18

Similarity-based Uptake

Metrics (the original table also marks, per metric, whether punctuation removal ♠, stopword removal ⊕, and stemming † are applied):

  • LCS: Longest Common Subsequence.
  • %-IN-T: Fraction of tokens from S that are also in T.
  • %-IN-S: Fraction of tokens from T that are also in S.
  • JACCARD: Jaccard similarity.
  • BLEU: BLEU score for up to 4-grams.
  • GLOVE [ALIGNED]: Average pairwise cosine similarity of word embeddings between tokens from S and T.
  • GLOVE [UTT]: Cosine similarity of utterance vectors representing S and T.
  • SENTENCE-BERT: Cosine similarity of utterance vectors representing S and T, using Sentence-BERT.
  • UNIVERSAL SENTENCE ENCODER: Inner product of utterance vectors representing S and T, using Universal Sentence Encoder.
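For illustration, a small sketch of two of the simplest overlap metrics above, assuming naive lowercased whitespace tokenization (the ♠/⊕/† preprocessing variants are omitted):

    def tokens(utterance):
        # Naive tokenization: lowercase and split on whitespace.
        return utterance.lower().split()

    def pct_in_t(s, t):
        # %-IN-T: fraction of tokens from S that also appear in T.
        s_tok, t_set = tokens(s), set(tokens(t))
        return sum(w in t_set for w in s_tok) / len(s_tok)

    def jaccard(s, t):
        # JACCARD: overlap of the token sets of S and T.
        a, b = set(tokens(s)), set(tokens(t))
        return len(a & b) / len(a | b)

    # Example: a teacher revoicing part of the student's contribution.
    print(pct_in_t("I added the two numerators", "why did you add the numerators"))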

8 of 18

Formalize Dependence

pJSD

  • Uptake is formalized as the dependence of the teacher reply T on the student utterance S
  • This dependence is captured by the Jensen-Shannon Divergence (JSD; see below)
  • A pointwise variant (pJSD) scores each individual (S, T) pair
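For reference, the standard Jensen-Shannon divergence between two distributions P and Q is

    \mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2}\,\mathrm{KL}(P \,\|\, M) + \tfrac{1}{2}\,\mathrm{KL}(Q \,\|\, M), \qquad M = \tfrac{1}{2}(P + Q)

In this framing, P is the distribution over teacher replies conditioned on S and Q is the unconditioned reply distribution; as I read the pointwise variant, the log-ratio terms are evaluated at the single observed reply T rather than averaged over all possible replies.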

9 of 18

10 of 18

Estimating pJSD

Fine-tuning with a Next-Utterance-Classification Loss

  • t is the true teacher utterance,
  • s is the preceding student utterance,
  • t̃ is a randomly sampled teacher utterance,
  • p_θ(t, s) is the probability that t is a true reply to s, as predicted by the model parameterized by θ. In this paper this is a BERT model (a minimal sketch follows).
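A minimal sketch (not the authors' code) of this next-utterance-classification objective using Hugging Face's BertForSequenceClassification; the helper make_batch and the example utterances are hypothetical:

    import random
    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    def make_batch(pairs):
        """pairs: list of (student utterance s, true teacher reply t)."""
        s_texts, t_texts, labels = [], [], []
        all_t = [t for _, t in pairs]
        for s, t in pairs:
            s_texts.append(s); t_texts.append(t); labels.append(1)                     # true reply
            s_texts.append(s); t_texts.append(random.choice(all_t)); labels.append(0)  # random reply
        enc = tokenizer(s_texts, t_texts, padding=True, truncation=True, return_tensors="pt")
        enc["labels"] = torch.tensor(labels)
        return enc

    batch = make_batch([("so I added the two denominators", "why did you add them?"),
                        ("I got seven eighths", "how did you get that?")])
    out = model(**batch)   # cross-entropy loss over true vs. random replies
    out.loss.backward()    # p_theta(t, s) is softmax(out.logits, dim=-1)[:, 1]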

11 of 18

Additional Dataset

  • Switchboard (1997)
    • 2,400 two-sided telephone conversations among 543 speakers (302 male, 241 female) from all areas of the United States.
    • Five uptake phenomena are identified among its labels: acknowledgment, answer, collaborative completion, reformulation, and repetition
  • 1-on-1 online tutoring dataset (2019)
    • On-demand text-based tutoring for math and science
    • Two outcome measures: (1) student satisfaction scores (1-5 scale) and (2) a rating by the tutor manager based on an evaluation rubric (0-1 scale)

12 of 18

Additional Dataset

  • SimTeacher dataset
    • A mixed-reality simulation platform where novice teachers practice key classroom skills
    • The student avatars are controlled remotely by a trained actor
    • Collected in Fall 2019, with 338 sessions representing 117 teachers
    • All sessions are based on the same scenario (discussed text, leading questions, avatar scripts)

13 of 18

14 of 18

15 of 18

16 of 18

17 of 18

18 of 18

Takeaway

  • (1) A dataset of student-teacher exchanges extracted from US math classroom transcripts, annotated for uptake by experts
  • (2) Formalizing uptake as pointwise Jensen-Shannon Divergence (pJSD), estimated via next-utterance classification
  • (3) A linguistically motivated comparison of different unsupervised measures
    • Measures focused on word overlap perform well
    • pJSD performs best, scoring in a similar range as human agreement
    • pJSD captures a broader range of uptake phenomena beyond mere repetition
  • (4) These measures are shown to correlate well with educational outcomes rated by humans