1 of 30

Language Modeling is fundamental to NLP

BERT

GPT-2

RoBERTa

T5

Models

Language Model

I love to go ___

hiking

LM Pretraining

me gustaría ir de excursión

Translation

Sentiment

Assistants

Target Tasks

2 of 30

Ecological Fallacy

Individual observations that are part of a group are treated as independent.

Robinson, 1950 (American Sociological Review)

3 of 30

Motivation: Ecological Fallacy in Language Modeling

I spend my weekends hiking.

I love the serenity of the mountains.

I could watch anime all day…!!

Hiking is the best

Yeah, right -_-

Did you watch Haikyuu!!

Input Text Sequences.

4 of 30

Motivation: Ecological Fallacy in Language Modeling

I spend my weekends hiking.

I love the serenity of the mountains.

I could watch anime all day…!!

Hiking is the best

Yeah, right -_-

Did you watch Haikyuu!!

Computes loss on independent text sequences.

5 of 30

Motivation: Ecological Fallacy in Language Modeling

Text sequences written by the same author (part of a group).

I spend my weekends hiking.

I love the serenity of the mountains.

I could watch anime all day…!!

Hiking is the best

Yeah, right -_-

Did you watch Haikyuu!!

Computes loss on independent text sequences.

6 of 30

Motivation: Ecological Fallacy in Language Modeling

Large Human Language Models

Oral Session on June 18, 9:00-10:30am (queued at the end)

7 of 30

Human Language Modeling (HuLM): User State

(Washington Outsider, 2014)

Human states are somewhat stable but also change over time.

8 of 30

Human Language Modeling (HuLM): User State

Commitment. Maybe anxious about new beginnings.

Carefree. Living in the moment.

9 of 30

Human Language Modeling (HuLM): User State

Condition on a dynamic user state

(Washington Outsider, 2014)

Human states are somewhat stable but also change over time.

Latent variable capturing the distribution of human states over time through the user’s language

Soni et al., 2022

10 of 30

Human Language Modeling (HuLM): Problem Definition

HuLM Paper Code

Soni et al., 2022
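The formal problem statement did not survive extraction here; the following is a hedged reconstruction of the HuLM objective (notation is illustrative, not copied verbatim from Soni et al., 2022): each token is conditioned on its within-document history and on a dynamic user state estimated from the user's earlier documents.

```latex
% Hedged sketch of the HuLM objective; notation is illustrative.
% w_{i,t}: t-th token of a user's i-th (temporally ordered) document.
% U_{i-1}: latent user state inferred from documents 1..i-1.
\Pr\!\left(w_{i,t} \;\middle|\; w_{i,1:t-1},\, U_{i-1}\right)
```

Ordinary language modeling is the special case that drops $U_{i-1}$ and treats every document as independent.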

11 of 30

HaRT: Human-aware Recurrent Transformers

[Architecture figure: a 12-layer Transformer over the user's temporally ordered input messages. An insert layer conditions self-attention on the previous user state $U_{i-1}$ (user-state-based self-attention, with queries $Q = W_{QU}^{\top}\,[H^{(1)}; U_{i-1}]$); an extract layer feeds the user-state recurrence $U_i = \tanh(W_U U_{i-1} + W_H H^{(11)})$, producing the next user state $U_i$.]

Soni et al., 2022
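As a rough illustration of the two components named in the figure, here is a minimal PyTorch sketch of the user-state recurrence and the user-state-based queries (module names, shapes, and the token pooling are assumptions for illustration; the code released with Soni et al., 2022 is the actual implementation).

```python
# Minimal sketch of the two HaRT ideas on this slide; not the released code.
import torch
import torch.nn as nn


class UserStateRecurrence(nn.Module):
    """U_i = tanh(W_U U_{i-1} + W_H H^(extract)), one step per message block."""

    def __init__(self, d_model: int):
        super().__init__()
        self.w_u = nn.Linear(d_model, d_model, bias=False)
        self.w_h = nn.Linear(d_model, d_model, bias=False)

    def forward(self, prev_user_state: torch.Tensor, extract_hidden: torch.Tensor) -> torch.Tensor:
        # extract_hidden: (batch, seq, d_model) hidden states from the extract
        # layer; mean-pooling over tokens is an assumption for this sketch.
        h_bar = extract_hidden.mean(dim=1)  # (batch, d_model)
        return torch.tanh(self.w_u(prev_user_state) + self.w_h(h_bar))


class UserStateQuery(nn.Module):
    """Q = W_QU [H^(insert); U_{i-1}]: queries conditioned on the previous user state."""

    def __init__(self, d_model: int):
        super().__init__()
        self.w_qu = nn.Linear(2 * d_model, d_model, bias=False)

    def forward(self, insert_hidden: torch.Tensor, prev_user_state: torch.Tensor) -> torch.Tensor:
        # Broadcast the user state across the token dimension, then project.
        u = prev_user_state.unsqueeze(1).expand(-1, insert_hidden.size(1), -1)
        return self.w_qu(torch.cat([insert_hidden, u], dim=-1))  # (batch, seq, d_model)
```

A full model would apply the query module inside the insert layer's self-attention and call the recurrence once per message block to carry $U_i$ forward in time.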

12 of 30

Soni et al., 2022

13 of 30

Soni et al., 2022

14 of 30

Soni et al., 2022

Personality

Stance Detection

15 of 30

Colab

  • Perplexity
    • Human Language Modeling with HaRT
    • Language Modeling with GPT2-HLC

bit.ly/text2hulm
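For readers without the Colab at hand, a minimal sketch of the perplexity computation it walks through, using Hugging Face transformers; "gpt2" is a stand-in checkpoint, and the HaRT / GPT2-HLC models from the tutorial would be loaded analogously (this is not the Colab's actual code).

```python
# Hedged sketch: per-token perplexity of a causal LM over a few messages.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; swap in the HaRT or GPT2-HLC checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

messages = [
    "I spend my weekends hiking.",
    "I love the serenity of the mountains.",
]

nlls, n_tokens = [], 0
with torch.no_grad():
    for text in messages:
        enc = tokenizer(text, return_tensors="pt")
        # labels == input_ids: the model shifts internally and returns mean CE.
        out = model(**enc, labels=enc["input_ids"])
        n_scored = enc["input_ids"].size(1) - 1  # first token is not predicted
        nlls.append(out.loss * n_scored)
        n_tokens += n_scored

ppl = math.exp(torch.stack(nlls).sum().item() / n_tokens)
print(f"Perplexity: {ppl:.2f}")
```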

16 of 30

Selected Further Reading

Personalized Language Models

  • LMs with User Embeddings (Li et al., 2015; Benton et al., 2016; Wu et al., 2020)
  • Continued training on user language (Wen et al., 2013; King and Cook, 2020)
  • LMs trained with latent author representation (Delasalles et al., 2019)
  • LLM trained with author representation from historical language (Soni et al., 2022)

Personalized, Application-Focused, and Debiasing Models

  • User specific feature vectors (Jaech and Ostendorf, 2018; Seyler et al., 2020)
  • Prefixed static or learnt user identifiers (Zhong et al., 2021; Li et al., 2021; Mireshghallah et al., 2022)
  • Hierarchical modeling of user’s historical text (Lynn et al., 2020; Matero et al., 2021)
  • Eliminating word-vector subspaces associated with particular biases such as gender (Bolukbasi et al., 2016; Wang et al., 2020; Ravfogel et al., 2020) or religion (Liang et al., 2020).

Large Human Language Models: A Need and the Challenges

17 of 30

Human Context for/in Dialog Agents

18 of 30

What is Human Context for Dialog Systems?

Personality

I am going to ___.

Modes of communication

Occupation

Demographics

Large Human Language Models

Soni et al., 2024

19 of 30

Human-Level Agent Modeling

PARRY (Kenneth Colby 1972)

20 of 30

Human-Level Agent Modeling

PARRY (Kenneth Colby 1972)

Speaker-Addressee Model (Jiwei Li et al., 2017)

21 of 30

Human-Level Agent Modeling

PARRY (Kenneth Colby 1972)

Speaker-Addressee Model (Li et al., 2017)

PersonalityChat (Lotfi et al., 2024)

22 of 30

Dialog Agents Understanding the Human

You Impress Me: Dialogue Generation via Mutual Persona Perception, Liu et al., 2020, ACL

23 of 30

Psychological Metrics

Human-Centered Metrics for Dialog System Evaluation, Giorgi et al., arXiv 2023

[Diagram: metrics computed over a dialog system at the level of agents, dialogues, and turns.]
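A toy sketch of those levels: a per-turn score rolled up into dialogue-level and agent-level metrics (the scoring function and field names here are hypothetical, not from Giorgi et al., 2023).

```python
# Hypothetical sketch: aggregate a turn-level score to dialogue and agent levels.
from collections import defaultdict
from statistics import mean

def score_turn(text: str) -> float:
    """Placeholder metric; a real system would use a trained psychological scorer."""
    return float(len(text.split()))  # stand-in value

turns = [
    {"agent": "bot_A", "dialogue": "d1", "text": "Hi! How was your hike?"},
    {"agent": "bot_A", "dialogue": "d1", "text": "That sounds peaceful."},
    {"agent": "bot_A", "dialogue": "d2", "text": "Which anime are you watching?"},
]

dialogue_scores = defaultdict(list)
for t in turns:
    dialogue_scores[(t["agent"], t["dialogue"])].append(score_turn(t["text"]))

# Dialogue level: average over its turns; agent level: average over its dialogues.
per_dialogue = {k: mean(v) for k, v in dialogue_scores.items()}
agent_scores = defaultdict(list)
for (agent, _), s in per_dialogue.items():
    agent_scores[agent].append(s)
per_agent = {a: mean(v) for a, v in agent_scores.items()}
print(per_dialogue, per_agent)
```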

24 of 30

Psychological Metrics

25 of 30

Psychological Metrics

26 of 30

Selected Further Readings

  • Artificial paranoia: A computer simulation of paranoid processes, K. M. Colby, 1972
  • Assigning Personality/Identity to a Chatting Machine for Coherent Conversation Generation, Q. Qian, M. Huang, H. Zhao, J. Xu, X. Zhu, 2017
  • A Persona-based Neural Conversation Model, J. Li, M. Galley, C. Brockett, G. P. Spithourakis, J. Gao, B. Dolan, 2017
  • PersonalityChat: Conversation Distillation for Personalized Dialog Modeling with Facts and Traits, E. Lotfi, M. De Bruyn, J. Buhmann, W. Daelemans, 2024
  • You Impress Me: Dialogue Generation via Mutual Persona Perception, Q. Liu, Y. Chen, B. Chen, J.-G. Lou, Z. Chen, B. Zhou, D. Zhang, 2020
  • The Power of Personalization: A Systematic Review of Personality-Adaptive Chatbots, T. Ait Baha, et al., 2023

27 of 30

Colab

  • Generation w/ and w/o human context
    • GPT-4

https://bit.ly/text2agents
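A minimal sketch of what the Colab contrasts: generation with and without human context via the OpenAI chat API. The prompts and persona string are illustrative, not the Colab's actual code.

```python
# Hedged sketch: compare generations with and without human context (openai>=1.0).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

human_context = (
    "The user spends weekends hiking, loves the serenity of the mountains, "
    "and is an avid anime fan."
)
question = "Suggest a weekend plan for me."

def generate(with_context: bool) -> str:
    messages = []
    if with_context:
        # Human context supplied as a system message; one simple way to condition.
        messages.append({"role": "system", "content": f"Known about the user: {human_context}"})
    messages.append({"role": "user", "content": question})
    resp = client.chat.completions.create(model="gpt-4", messages=messages)
    return resp.choices[0].message.content

print("Without human context:\n", generate(False))
print("With human context:\n", generate(True))
```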

28 of 30

Ethical Considerations

Responsible release strategy.

Careful with profiling and stereotyping.

Unintended harms.

Malicious exploitation and targeted content without users' consent.

Laws and policies for user privacy and data consent.

More representationally diverse data and models, covering a wider world population.

29 of 30

Motivation: Ecological Fallacy in Language Modeling

Text sequences written by many authors.

Large Human Language Models

30 of 30

Motivation: Ecological Fallacy in Language Modeling

Universal Author?

Large Human Language Models