Irina Rish
Canada Excellence Research Chair in Autonomous AI
University of Montreal & Mila
AI 4 Psychology and Psychology 4 AI: Towards Better Alignment Among Humans and Machines
“Psychiatric research is in crisis” - Wiecki, Poland, Frank, 2015
“Imagine going to a doctor because of chest pain that has been bothering you for a couple of weeks. The doctor would sit down with you, listen carefully to your description of symptoms, and prescribe medication to lower blood pressure in case you have a heart condition. After a couple of weeks, your pain has not subsided. The doctor now prescribes medication against reflux, which finally seems to help. In this scenario, not a single medical analysis (e.g., electrocardiogram, blood work, or a gastroscopy) was performed, and medication with potentially severe side effects was prescribed on a trial-and-error basis.
…This scenario resembles much of contemporary psychiatric diagnosis and treatment.”
Psychiatry lacks objective clinical tests routinely used in other medical fields!
Goal: augment current approaches to diagnosis and treatment of mental disorders with objective measurements and AI
Multi-modal data:
COMPUTATIONAL PSYCHIATRY?
AI 4 NEURO: NEUROIMAGING DATA ANALYSIS
[Cecchi et al., NIPS 2009], [Rish et al., PLOS One 2013], [Gheiratmand et al., npj Schizophrenia 2017], [Carroll et al., NeuroImage 2009], [Scheinberg & Rish, ECML 2010]
Schizophrenia classification: 74% to 93% accuracy; symptom severity prediction

[Rish et al., Brain Informatics 2010], [Rish et al., SPIE Med. Imaging 2012], [Cecchi et al., PLOS Comp Bio 2012]
Mental states in videogames: sparse regression, 70-95% accuracy
Pain perception: sparse regression, 70-80% accuracy, “holographic” patterns

[Honorio et al., AISTATS 2012], [Rish et al., SPIE Med. Imaging 2016]
Cocaine addiction: sparse Markov net biomarkers; MPH effects analysis (“stimulant 4 stimulant”)

[Bashivan et al., ICLR 2016]
Cognitive load prediction: 91% accuracy with recurrent ConvNets
“Statistical biomarkers”: a predictive model is trained on labeled examples (+: mental disorder, -: healthy) to discriminate patients from controls.
Nonlinear dynamical models of calcium imaging (CaI) and fMRI data [Abrevaya et al., 2018]
Beyond the Scanner: Using ‘Cheaper’ Sensors?
NeuroSky, Muse EEG: EEG, accelerometer
Hexoskin: heart rate, respiration, heart-rate variability
Jawbone UP3: heart rate, respiration, galvanic skin response (GSR), skin temperature, ambient temperature, accelerometer
Can we detect mental states using wearables?
Other cheap sensors: speech, transcribed text, video, etc.
QUIZ: CAN YOU DIAGNOSE?
The dream, I was on my way there, I tripped on a fence and fell into the water; I was struggling, then I could get out, I could get out. I got out by myself.
I had to, to, I dreamed about my neighbor, I looked in the box, I told you, didn’t I? I went, I went there, when I remembered, I saw it was not at home. It was because I was at home, there came a neighbor, we started talking, I started talking I said I'm out of time, I'll have to wash the house. Then when I said “make a point" I thought: oh my God, I'm not at home, no.
With my evangelic daughter. She is crying.
No, I only saw Jesus. She sometimes appears to me laughing, or sometimes she appears crying.
SPEECH GRAPHS
Mota, Natalia, et al. “Speech graphs provide a quantitative measure of thought disorder in psychosis.” PLoS One 2012; 7.
Speech graph: each word becomes a node, and consecutive words are linked by directed edges. Nodes in the example: I, walked, place, found, grandma, hugged, strongly, woke up.
Transcribed speech (description of a recent dream): “I walked into a place, and I found my grandma. I hugged her strongly. I woke up.”
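The construction above can be sketched in a few lines of Python. This is a minimal illustration, assuming simple regex tokenization rather than the exact preprocessing of Mota et al. (2012):

```python
# Sketch of speech-graph construction in the spirit of Mota et al. (2012):
# each word is a node, each pair of consecutive words is a directed edge.
# Tokenization here is an illustrative assumption, not the authors' pipeline.
import re

def speech_graph(text):
    words = re.findall(r"[a-z']+", text.lower())
    nodes = set(words)
    edges = set(zip(words, words[1:]))  # directed edges between consecutive words
    return nodes, edges

text = "I walked into a place, and I found my grandma. I hugged her strongly. I woke up."
nodes, edges = speech_graph(text)
# Graph-theoretic features (node count, edge count, loops, connectivity)
# serve as quantitative markers of speech connectedness.
print(len(nodes), len(edges))  # 14 16
```

Features such as the edge-to-node ratio then quantify how repetitive or disconnected the speech is.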
Image courtesy of Mota, Natalia, et al. “Speech graphs provide a quantitative measure of thought disorder in psychosis.” PLoS One 2012; 7.
COMPUTING COHERENCE
Pipeline for automated extraction of the semantic coherence features: split the transcript into sentences, map each sentence to a semantic vector (Latent Semantic Analysis embeddings in Bedi et al., 2015), and measure coherence as the similarity between vectors of consecutive sentences.
Example: “I cannot think of them all offhand. They were the ones I always considered my best songs. They were the ones I really wrote from experience.” → [sentence]1 [sentence]2 [sentence]3 → vectors v1, v2, v3 → coherence = similarity(vi, vi+1).
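A toy end-to-end version of this pipeline, using bag-of-words vectors as a simplifying stand-in for the LSA embeddings used by Bedi et al. (2015):

```python
# Toy coherence pipeline: split text into sentences, embed each one, and
# score coherence as cosine similarity of consecutive sentence vectors.
# Bag-of-words vectors are an assumption; real systems use LSA embeddings.
import math
import re
from collections import Counter

def sentence_vectors(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [Counter(re.findall(r"[a-z']+", s.lower())) for s in sentences]

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def coherence_scores(text):
    vecs = sentence_vectors(text)
    return [cosine(a, b) for a, b in zip(vecs, vecs[1:])]

text = ("I cannot think of them all offhand. "
        "They were the ones I always considered my best songs. "
        "They were the ones I really wrote from experience.")
scores = coherence_scores(text)
print(scores)  # one first-order coherence score per adjacent sentence pair
```

Low minimum or average coherence across a transcript is the kind of feature used downstream as a predictor of psychosis onset.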
RESULTS
Bedi, Gillinder et al. "Automated analysis of free speech predicts
psychosis onset in high-risk youths." npj Schizophrenia 1 (2015): 15030
Text features differ noticeably between controls and subjects who later developed psychosis.
Using these features, 100% accurate classification was achieved in predicting the first psychotic episode 1-2 years in advance from transcribed patient interviews.
Subjects who did not develop psychosis: blue; subjects who developed psychosis: red.
OTHER EXAMPLES
AI FOR PSYCHOTHERAPY?
S. Garg et al, Infogain-Driven Dialogue Modeling via Hash Functions (submitted)
Irina Rish, S. Garg @USC, Guillermo Cecchi
Slide credit: Sahil Garg
INFOGAIN-DRIVEN DIALOGUE VIA HASHCODE REPRESENTATIONS
Construct hash codes of patient and therapist responses;
optimize the hashing model to maximize mutual information between patient and therapist hashcodes;
learn a predictive model to infer the therapist’s response to the patient.
Outperforms deep net systems, including cases where deep nets failed.
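To make the mutual-information objective concrete, here is a heavily simplified sketch: utterance vectors are binarized with random hyperplane hashing, and a plug-in estimate of mutual information is computed between one patient hash bit and one therapist hash bit. The random projections and toy data are illustrative assumptions; Garg et al.'s actual model learns the hash functions to maximize this quantity.

```python
# Illustrative sketch only: random-hyperplane hashing plus a plug-in MI
# estimate between binary hash bits of "patient" and "therapist" vectors.
import math
import random

random.seed(0)

def hash_bit(vec, hyperplane):
    # Sign of a random projection gives one bit of the hash code.
    return int(sum(v * h for v, h in zip(vec, hyperplane)) >= 0)

def mutual_information(xs, ys):
    # Plug-in MI estimate (in nats) for two binary sequences.
    n = len(xs)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = sum(1 for x, y in zip(xs, ys) if x == a and y == b) / n
            px = sum(1 for x in xs if x == a) / n
            py = sum(1 for y in ys if y == b) / n
            if pxy > 0:
                mi += pxy * math.log(pxy / (px * py))
    return mi

# Toy "patient" vectors; "therapist" vectors are correlated copies.
patients = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]
therapists = [[p + random.gauss(0, 0.1) for p in vec] for vec in patients]
h = [random.gauss(0, 1) for _ in range(5)]
px_bits = [hash_bit(v, h) for v in patients]
py_bits = [hash_bit(v, h) for v in therapists]
mi = mutual_information(px_bits, py_bits)
print(mi)  # high MI: the hash codes carry shared information
```

In the real system, maximizing this MI over learnable hash functions yields compact codes that are predictive across the patient-therapist exchange.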
FUTURE: VIRTUAL THERAPIST?
A virtual AI assistant on a smartphone implementing four main steps: (1) data collection; (2) mental-state recognition; (3) taking action to improve the mental state; (4) receiving feedback from the person to improve future actions.
Sensor data: text, audio, video, EEG signal, temperature, heart rate.
AI algorithms: classification of mental states; detection of emotional and cognitive changes.
Decision-making: choosing the best feedback or another action (call a friend? tell a joke? send a reminder?).
Take an action; obtain feedback from the person.
Roles: 24/7 personal coach, assistant, therapist, caretaker, or just a “digital friend”.
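The four-step loop could be sketched as follows; every component here (the sensor threshold, the action list, the feedback-averaging rule) is a hypothetical stand-in, not a proposed design:

```python
# Minimal sketch of the loop: (1) collect data, (2) recognize mental state,
# (3) act, (4) learn from user feedback. All rules are illustrative stand-ins.
ACTIONS = ["call a friend", "tell a joke", "send a reminder"]

class VirtualAssistant:
    def __init__(self):
        self.value = {}   # average feedback per (state, action) pair
        self.counts = {}

    def recognize_state(self, sensors):
        # Step 2: trivial threshold rule in place of a real classifier.
        return "stressed" if sensors["heart_rate"] > 90 else "calm"

    def choose_action(self, state):
        # Step 3: greedily pick the action with the highest learned value.
        return max(ACTIONS, key=lambda a: self.value.get((state, a), 0.0))

    def update(self, state, action, feedback):
        # Step 4: incremental average of user feedback (+1 good, -1 bad).
        key = (state, action)
        self.counts[key] = self.counts.get(key, 0) + 1
        old = self.value.get(key, 0.0)
        self.value[key] = old + (feedback - old) / self.counts[key]

assistant = VirtualAssistant()
state = assistant.recognize_state({"heart_rate": 105})  # step 1: sensor data
action = assistant.choose_action(state)
assistant.update(state, action, feedback=1.0)
print(state, action)
```

The feedback-averaging step makes this a simple bandit-style learner: over time, actions that users rate well are chosen more often for the corresponding mental state.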
FUTURE RESEARCH
The Holy Grail of AI: Generalization
AGI ⇔ “General” AI ⇔ Multi-task, “Broad” AI
“Highly autonomous systems that outperform humans at most economically valuable work” (OpenAI definition)
“Cambrian Explosion” of Large-Scale Models
Foundation Models: Jump Towards AGI?
“Train one model on a huge amount of data and adapt it to many applications.
We call such a model a foundation model.”
CRFM: Stanford’s Center for Research on Foundation Models
“On the Opportunities and Risks of Foundation Models”
Application example: healthcare
Scaling Laws as “Investment Tools”
An example: Vision Transformers are dominated by ConvNets in lower-data regimes but outperform them given more data: https://arxiv.org/pdf/2010.11929.pdf
Brief History of Neural Scaling Laws
1994: Cortes et al., “Learning curves: Asymptotic values and rate of convergence,” NIPS. First to observe power-law scaling of ANNs: x = dataset size, y = test error.
2017: Hestness et al., “Deep Learning Scaling is Predictable, Empirically.” Showed that data-size-dependent scaling laws given by power laws hold over many orders of magnitude.
2019: Rosenfeld et al., “A constructive prediction of the generalization error across scales.” Applied power laws to model-size-dependent scaling, i.e., when x = number of parameters.
2020: Kaplan et al., “Scaling Laws for Neural Language Models.” Showed that the power law also applies when x = compute, besides x = data and x = model size; this paper brought “neural” scaling laws to the mainstream, in the context of GPT-3 training.
Neural Scaling Laws: Kaplan et al
Jared Kaplan et al, Scaling Laws for Neural Language Models, 2020.
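Fitting such a power law L(x) = a * x^(-b) amounts to ordinary least squares in log-log space. A minimal sketch with synthetic data (the scale 5 and exponent 0.3 are arbitrary assumptions, standing in for measured test errors):

```python
# Fit a power law L(x) = a * x**(-b) by linear regression of log(y) on log(x).
import math

def fit_power_law(xs, ys):
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n
    my = sum(ly) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
             / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - slope * mx)
    return a, -slope  # so that L(x) = a * x**(-b)

# Synthetic "test error vs. dataset size" following L(x) = 5 * x**(-0.3).
xs = [10 ** k for k in range(3, 9)]
ys = [5.0 * x ** -0.3 for x in xs]
a, b = fit_power_law(xs, ys)
print(round(a, 3), round(b, 3))  # recovers a ≈ 5, b ≈ 0.3
```

Extrapolating the fitted line to larger x is exactly how scaling laws are used as “investment tools”: predicting the payoff of more data, parameters, or compute before spending it.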
Scale and Inductive Biases
More Complex Scaling Behavior: “Phase Transitions”, Emergent Phenomena
Broken Neural Scaling Laws:
A Universal Functional Form for Neural Scaling Laws?
Ethan Caballero et al, 2022
https://arxiv.org/abs/2210.14891
BNSL accurately fits and extrapolates a very wide range of scaling behaviors
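For reference, the broken neural scaling law proposed in Caballero et al. (2022) can be written, up to notation, as a power law modulated by smooth “breaks”:

```latex
y = a + b\, x^{-c_0} \prod_{i=1}^{n} \left( 1 + \left( \tfrac{x}{d_i} \right)^{1/f_i} \right)^{-c_i f_i}
```

Here x is the scaled quantity (data, parameters, or compute), y is the performance metric, n is the number of breaks, the d_i locate the breaks, and the c_i and f_i control the slope change and sharpness of each transition; with n = 0 it reduces to an ordinary power law.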
Training Foundation Models
“We think the most benefits will go to whoever has the biggest computer.” - Greg Brockman, OpenAI’s CTO, Financial Times
Most compute is owned by AI companies (Google, OpenAI, etc.), not academia and nonprofit research; this “compute gap” continues to widen.
We need to “democratize AI”!
INCITE award to Train Open Foundation Models
5.9M V100 GPU hrs on Summit
Supercomputers: Summit and Frontier
Growing International Collaboration
nolano.org
Farama
Ongoing Projects
Language Models: Pretraining and Continual Learning
Aligned Multimodal Language-Vision Models
Time-series Transformers
Multimodal “Generalist” Agent
Ultimate goal:
Interactive, Continually Learning “Open ChatX” model
Should Pretraining be Continual?
Standard pre-training:
multiple datasets available at once; mixed into one dataset
(or, sampled uniformly into each minibatch)
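The standard setup above can be sketched as follows; the dataset names and sizes are purely illustrative:

```python
# Standard pre-training: several datasets available at once, with examples
# sampled uniformly into each minibatch (in contrast to continual pretraining,
# where datasets arrive sequentially). Names and sizes are assumptions.
import random

random.seed(0)

datasets = {
    "web": [f"web_{i}" for i in range(1000)],
    "code": [f"code_{i}" for i in range(500)],
    "books": [f"books_{i}" for i in range(250)],
}

def sample_minibatch(datasets, batch_size):
    # Uniform over examples: pool everything, then draw without replacement.
    pool = [ex for data in datasets.values() for ex in data]
    return random.sample(pool, batch_size)

batch = sample_minibatch(datasets, 8)
print(len(batch))  # 8 examples drawn from the mixed pool
```

In continual pretraining the pool is not available all at once, so the question becomes how to keep performance on earlier datasets while training on new ones.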
Example: A Generalist Agent
Aligning Vision-Language Models
???
Aligning Multimodal Models with Human Values
JC Layoun, A Roger, I Rish. Aligning MAGMA by Few-Shot Learning and Finetuning. Montreal AI Symp 2022, arXiv:2210.14161
A Roger, E Aïmeur, I Rish, Towards Ethical Multimodal Systems, arXiv:2304.13765
We evaluate “commonsense morality” (Hendrycks et al., 2020) either (1) manually or (2) using a RoBERTa-large commonsense classifier trained only on the text of the ETHICS dataset by Hendrycks et al. (2020). However, the latter tends to be less reliable than the former.
Promising preliminary result: fine-tuning on just 30 hand-made “good” samples, for only 4 epochs, improves the morality score by 10%.
Ongoing work:
Project Direction 1: AI for Psychology
Project Direction 2: Psychology for AI
Apply existing psycho-evaluation tests (e.g., PsychoBench and others) to evaluate the output of an LLM or a vision-language model (e.g., Robin).
Which variables affecting the output can we vary?
Which metrics can we evaluate?
Thank you!