1 of 210

Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding

Subba Reddy Oota (1), Manish Gupta (2, 3), Raju S. Bapi (2), Mariya Toneva (4)

(1) Inria Bordeaux, France; (2) IIIT Hyderabad, India; (3) Microsoft, India; (4) MPI for Software Systems, Germany

subba-reddy.oota@inria.fr, gmanish@microsoft.com, raju.bapi@iiit.ac.in, mtoneva@mpi-sws.org

2 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
  • Stimulus Representations [1 hour]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 30 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]

IJCAI 2023: DL for Brain Encoding and Decoding



4 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
    • Brain Encoding/Decoding: Techniques and Research Goals
    • Introduction to popular datasets
      • Text, Visual, Audio, Multi-modal
  • Stimulus Representations [1 hour]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 30 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]



6 of 210

Neuroscience

  • The field of science that studies the structure and function of the nervous system across species.
  • It seeks to answer questions such as:
    • How does learning occur during adolescence, and how does it differ from the way adults learn and form memories?
    • Which specific cells in the brain, and which connections among them, play a role in forming memories?
    • How do animals filter out irrelevant sensory information and focus only on the information that matters?
    • How do humans make decisions?
    • How do humans develop speech and learn languages?
  • Neuroscientists study diverse topics that help us understand how the brain and nervous system work.


7 of 210

Brain encoding and decoding in cognitive neuroscience


8 of 210

Brain encoding and decoding


9 of 210

Techniques for studying brain function

  • fMRI: high spatial but low temporal resolution.
    • Good for studying a specific location in the brain.
    • Unsuitable for sentence-level analysis: one scan takes about two seconds, which is far slower than the speed at which humans process language.
    • Cannot capture syntactic information (Gauthier and Levy, 2019).
  • EEG: high temporal but low spatial resolution.
    • Can preserve rich syntactic information (Hale et al., 2018).
    • But cannot be used for source analysis.
  • fNIRS: a compromise option.
    • Temporal resolution better than fMRI.
    • Spatial resolution better than EEG.
    • The balance of spatial and temporal resolution may not be enough to compensate for the loss in both.
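
The gap between the roughly two-second scan time and the pace of language is often bridged by pooling word-level features within each scan. The sketch below is only an illustration under stated assumptions: a 2-second TR, made-up word onset times, and plain averaging. The function name `average_features_per_tr` is hypothetical, and other pooling schemes are possible.

```python
import numpy as np

def average_features_per_tr(onsets_s, features, tr_s=2.0, n_trs=None):
    """Average word-level feature vectors that fall inside each TR window."""
    onsets_s = np.asarray(onsets_s, dtype=float)
    features = np.asarray(features, dtype=float)
    tr_index = np.floor(onsets_s / tr_s).astype(int)  # TR each word falls in
    if n_trs is None:
        n_trs = int(tr_index.max()) + 1
    pooled = np.zeros((n_trs, features.shape[1]))
    counts = np.zeros(n_trs)
    for idx, vec in zip(tr_index, features):
        if 0 <= idx < n_trs:
            pooled[idx] += vec
            counts[idx] += 1
    nonempty = counts > 0
    pooled[nonempty] /= counts[nonempty, None]  # TRs with no words stay zero
    return pooled, counts

# Hypothetical example: six words shown 0.5 s apart, 3-dimensional features.
word_onsets = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
word_features = np.arange(18, dtype=float).reshape(6, 3)
tr_features, words_per_tr = average_features_per_tr(word_onsets, word_features)
```

Here four words land in the first TR and two in the second, so each row of `tr_features` summarizes several words at once, which is the information loss the fMRI bullets describe.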


Single micro-electrode (ME), micro-electrode array (MEA), electrocorticography (ECoG), positron emission tomography (PET), functional MRI (fMRI), magnetoencephalography (MEG), electroencephalography (EEG), near-infrared spectroscopy (NIRS)

10 of 210

fMRI

  • Requires no injections, surgery, ingestion of substances, or exposure to ionizing radiation.
  • The primary form of fMRI uses blood-oxygen-level-dependent (BOLD) contrast, discovered by Seiji Ogawa in 1990.
    • Measures brain activity by detecting changes associated with blood flow.
    • When an area of the brain is in use, blood flow to that region increases.
  • Hemodynamic response function (HRF)
    • The vascular system takes time to respond to the brain's need for glucose.
    • Blood flow lags the neuronal events that trigger it by about 5 seconds.
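
The lag of about 5 seconds can be illustrated by convolving a brief neural event with a hemodynamic response function. The sketch below uses a double-gamma shape. The shape parameters (6 and 16) and the undershoot ratio (1/6) are assumptions chosen for illustration, not values taken from the slides, and `double_gamma_hrf` is a hypothetical helper name.

```python
import math
import numpy as np

def double_gamma_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1.0 / 6.0):
    """Double-gamma HRF built from two unit-scale gamma densities (assumed parameters)."""
    t = np.asarray(t, dtype=float)
    peak = t ** (peak_shape - 1) * np.exp(-t) / math.gamma(peak_shape)
    undershoot = t ** (undershoot_shape - 1) * np.exp(-t) / math.gamma(undershoot_shape)
    return peak - undershoot_ratio * undershoot

dt = 0.1                                  # time step in seconds
t = np.arange(0.0, 30.0, dt)              # 30 s of HRF support
hrf = double_gamma_hrf(t)
hrf_peak_time = t[np.argmax(hrf)]         # time at which the response peaks

# A single brief neural event at t = 10 s, convolved with the HRF.
neural = np.zeros(600)                    # 60 s sampled every 0.1 s
event_index = 100
neural[event_index] = 1.0
bold = np.convolve(neural, hrf)[: neural.size]
lag_s = (np.argmax(bold) - event_index) * dt   # delay between event and BOLD peak
```

With these assumed parameters the simulated BOLD peak trails the event by roughly the same few seconds the slide mentions, and the curve dips below baseline afterwards.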


An fMRI image with yellow areas showing increased activity compared with a control condition

11 of 210

Computational Cognitive Science Research goals

  • Predictive Accuracy
    • Compare feature sets: Which feature set provides the most faithful reflection of the neural representational space?
    • Test feature decodability: “Does neural data Y contain information about features X?”
    • Build accurate models of brain data: Aim is to enable simulations of neuroscience experiments.
  • Interpretability
    • Examine individual features: Which features contribute the most to neural activity?
    • Test correspondences between representational spaces
      • “CNNs vs ventral visual stream” or “Two text representations”
    • Interpret feature sets
      • Do features X, generated by a known process, accurately describe the space of neural responses Y?
      • Do voxels respond to a single feature or exhibit mixed selectivity?
    • How does the mapping relate to other models or theories of brain function?
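
The predictive-accuracy goals above are commonly pursued with a regularized linear map from features X to neural data Y, scored by correlation on held-out data. The sketch below is a minimal example on synthetic data. The data sizes, the ridge penalty `alpha`, and the helper names `fit_ridge` and `columnwise_pearson` are assumptions made for illustration, not the method of any particular study.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def columnwise_pearson(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0))

# Synthetic stand-in for real data: 200 time points, 10 features, 5 voxels.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
W_true = rng.standard_normal((10, 5))
Y = X @ W_true + 0.1 * rng.standard_normal((200, 5))

X_train, X_test = X[:150], X[150:]
Y_train, Y_test = Y[:150], Y[150:]

W = fit_ridge(X_train, Y_train, alpha=1.0)             # encoding: features -> voxels
voxel_scores = columnwise_pearson(X_test @ W, Y_test)  # one score per voxel
```

Comparing feature sets then amounts to repeating this with different X and comparing `voxel_scores`. Swapping the roles of X and Y gives the decoding direction, which tests whether Y contains information about X.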


12 of 210

Computational Cognitive Science Research goals

  • Biological plausibility
    • Simulate linear readout
      • If the features can be extracted with a linear mapping model, it means that they require few additional computations in order to be used downstream.
    • Incorporate measurement-related considerations
      • Rather than assuming a fixed HRF across voxels and/or conditions, what are better ways?
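
One commonly used alternative to a fixed HRF is a finite impulse response (FIR) design: the stimulus features are copied at several delays, and a linear model learns a separate weight per delay for every voxel. The sketch below shows only the delay-stacking step. The helper name `make_delayed` and the choice of delays (1 to 4 TRs) are assumptions for illustration.

```python
import numpy as np

def make_delayed(features, delays):
    """Stack time-shifted copies of a (time x feature) matrix, one copy per delay in TRs."""
    n_time, n_feat = features.shape
    delayed = np.zeros((n_time, n_feat * len(delays)))
    for i, d in enumerate(delays):
        block = slice(i * n_feat, (i + 1) * n_feat)
        if d == 0:
            delayed[:, block] = features
        elif d > 0:
            delayed[d:, block] = features[:-d]   # row t holds features from time t - d
        else:
            raise ValueError("delays must be non-negative")
    return delayed

# Hypothetical example: 6 TRs, 2 features, delays of 1 to 4 TRs.
stim = np.arange(12, dtype=float).reshape(6, 2)
stim_delayed = make_delayed(stim, delays=[1, 2, 3, 4])
```

Because each delay gets its own weights, the fitted model can express a different response shape for each voxel, at the cost of multiplying the number of parameters by the number of delays.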


13 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
    • Introduction to Brain Encoding/Decoding and applications
    • Introduction to popular datasets
      • Text, Visual, Audio, Multi-modal
  • Stimulus Representations [1 hour]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 30 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]


14 of 210

Types of stimuli and popular datasets

  • Text (Words, Sentences, Paragraphs): Harry Potter Story, ZuCo EEG, Question-Answering MEG.
  • Visual: Binary visual patterns, Natural Images (Vim-1), BOLD5000, Algonauts and SS-fMRI.
  • Audio: Alice’s Adventures in Wonderland, Narratives, The Moth Radio Hour, Audio stories.
  • Videos: BBC’s Doctor Who, Japanese Ads, Pippi Langkous, Algonauts.
  • Other Multimodal Stimuli: Words + line drawing of concept named by each word, Pereira.


15 of 210

Forms of stimulus presentation and data collection

  • Type: fMRI, EEG, MEG, …
  • TR (repetition time): the sampling interval between successive scans.
  • Fixation points: location, color, shape.
  • Form of stimuli presentation: text, video, audio, images.
  • Task: question answering, property generation, understanding, …
  • Time given to participants: 1 minute to list properties, …
  • Type of participants: males/females, sighted/blind, …
  • Number of times the response to stimuli was recorded.
  • Language


16 of 210

Text Stimulus Datasets

| Dataset | Type | Language | Stimulus | #Subjects | Paradigm | Size | Task |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Wehbe et al., 2014 | fMRI | English | Chapter 9 of Harry Potter and the Sorcerer's Stone | 9 | Reading stories | 5,000-word chapter presented in 45 minutes | Story understanding |
| Handjaras et al., 2016 | fMRI | Italian | Verbal, pictorial or auditory presentation of 40 concrete nouns | 20 | Reading, viewing or listening | 40 nouns × 4 times | Property generation |
| Anderson et al., 2017 | fMRI | Italian | 70 concrete and abstract nouns from law/music | 7 | Reading | 70 nouns × 5 times | Imagine a situation that they personally associate with the noun |
| Zurich Cognitive Language Processing Corpus (ZuCo): Hollenstein et al., 2018 | EEG and eye-tracking | English | Sentences from movie reviews or Wikipedia | 12 | Reading natural sentences | 21,629 words in 1,107 sentences and 154,173 fixations | Rate movie quality, answer control questions, check for existence of a relation |
| Anderson et al., 2019 | fMRI | English | 240 active voice sentences describing everyday situations | 14 | Reading | 240 sentences seen 12 times (by 10 subjects) and 6 times (by 4 subjects) | Passive reading |
| BCCWJ-EEG: Oseki and Asahara, 2020 | EEG | Japanese | 20 newspaper articles | 40 | Reading | Read once, for ~30-40 minutes | Passive reading |
| Deniz et al., 2019 | fMRI | English | Subset of Moth Radio Hour: 11 stories | 9 | Reading | 11 stories of 10 to 15 min each, presented twice, word by word | Passive reading and listening |

17 of 210

Data for concrete nouns from sighted/blind subjects

  • Participants were asked to verbally enumerate in one minute the properties (features) that describe the entities the words refer to.
  • 4 groups of participants
    • 5 sighted individuals were presented with a pictorial form of the nouns
    • 5 sighted individuals with a verbal visual (i.e., written Italian words) form
    • 5 sighted individuals with a verbal auditory (i.e., spoken Italian words) form
    • 5 congenitally blind with a verbal auditory form.


18 of 210

fMRI data for 70 Italian word stimuli

  • Taxonomic categories in the law and music domains
    • Ur-abstract: Nouns that are classified as abstract in WordNet
    • Attribute: A construct whereby objects or individuals can be distinguished
    • Communication: Something that is communicated by, to or between groups
    • Event/action: Something that happens at a given place and time
    • Person/Social role: Individual, someone, somebody, mortal
    • Location: Points or extents in space
    • Object/Tool: A class of unambiguously concrete nouns


19 of 210

Zurich Cognitive Language Processing Corpus (ZuCo)

  • Personal reading speed.
    • Sentences were presented to the subjects in a naturalistic reading scenario
    • Complete sentence is presented on the screen
    • Subjects read each sentence at their own speed, i.e., the reader determines for how long each word is fixated and which word to fixate next.


20 of 210

Visual Stimulus Datasets

| Dataset | Type | Stimulus | #Subjects | Paradigm | Size | Task |
| --- | --- | --- | --- | --- | --- | --- |
| Thirion et al., 2006 | fMRI | Rotating wedges, expanding/contracting rings, rotating Gabor filters, grid | 9 | Viewing visual patterns | Wedges/rings 8 times, 36 Gabor filters 4 times, grid 36 times | Passive viewing; imagine one of the 6 domino stimuli when prompted |
| Vim-1: Kay et al., 2008 | fMRI | Sequences of natural photos | 2 | Viewing natural images | Each subject viewed 1,750 (Stage 1) + 120 (Stage 2) novel natural images | Passive viewing |
| Horikawa et al., 2017 | fMRI | Object images | 5 | Viewing and reading | Each subject: (1) Image presentation: 1,200 images from 150 object categories and 50 images from 50 object categories; (2) Imagery: 10 times | One-back repetition detection task; imagine object images pertaining to the category |
| BOLD5000: Chang et al., 2019 | fMRI | 5,254 images depicting real-world scenes | 4 | Viewing natural images | ∼20 hours of MRI scans for each of the four participants | Passive viewing |
| Algonauts: Cichy et al., 2019 | fMRI (EVC and IT)/MEG (early and late in time) | Object images | 15 | Viewing object images | 92 silhouette object images and 118 images of objects on natural backgrounds | Passive viewing |
| Natural Scenes Dataset: Allen et al., 2022 | fMRI | 73,000 natural scenes | 8 | Viewing natural scenes | ~73,000 distinct natural scene images from MS COCO | Passive viewing |
| THINGS: Hebart et al., 2023 | fMRI/MEG | 31,188 natural images across 1,854 object concepts | 7 (3 fMRI, 4 MEG) | Viewing natural images | fMRI: 3 participants, 8,740 unique images, 720 objects. MEG: 4 participants, 22,448 unique images, 1,854 objects | Oddball detection task (synthetic image) |

21 of 210

Visual Binary Patterns

  1. Retinotopic mapping experiment: flickering rotating wedges and expanding/contracting rings.
  2. Domino experiment: groups of quickly rotating Gabor filters in an event-related design. Disks appeared simultaneously on the left and right side of the visual field.
  3. 6 different patterns in each hemifield.
  4. Subject was presented with the same grid. When the central fixation cross (left) became a left arrow (middle) or a right arrow (right), the subject had to imagine one of the 6 patterns presented previously, either in the left or right hemifield.


22 of 210

Seen and imagined objects

  • Two fMRI experiments: An image presentation experiment, and an imagery experiment.
  • Image presentation experiment
    • Subjects performed a one-back repetition detection task on the images, responding with a button press for each repetition.
  • Imagery experiment
    • Cue stimuli composed of an array of object names were visually presented.
    • The onset and the end of the imagery periods were signalled by auditory beeps.
    • After the first beep, the subjects were instructed to imagine as many object images as possible pertaining to the category indicated by red letters.
    • They continued imagining with their eyes closed (15 s) until the second beep.
    • Subjects were then instructed to evaluate the vividness of their mental imagery (3 s).


23 of 210

BOLD5000

  • ∼20 hours of MRI scans for each of the four participants.
  • 4,916 unique images were used as stimuli from 3 image sources


24 of 210

Algonauts


Training and Testing Material.

  1. There are two sets of training data, each consisting of an image set and brain activity in RDM format (for fMRI and MEG). Training set 1 has 92 silhouette object images, and training set 2 has 118 object images with natural backgrounds.
  2. Testing data consists of 78 images of objects on natural backgrounds.
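
A representational dissimilarity matrix (RDM) stores one dissimilarity value per pair of stimuli, so brain and model responses can be compared without a voxel-to-unit mapping. The sketch below is an illustration on synthetic data. The dissimilarity measure (1 minus Pearson correlation), the 50 response channels, and the helper names are assumptions, and the exact RDM construction used in the challenge may differ.

```python
import numpy as np

def correlation_rdm(responses):
    """RDM with entries 1 - Pearson r between rows (one row per stimulus)."""
    return 1.0 - np.corrcoef(responses)

def upper_triangle(rdm):
    """Off-diagonal upper-triangular entries, used when comparing two RDMs."""
    rows, cols = np.triu_indices(rdm.shape[0], k=1)
    return rdm[rows, cols]

# Synthetic stand-in: 92 stimuli (as in training set 1) by 50 hypothetical response channels.
rng = np.random.default_rng(0)
brain = rng.standard_normal((92, 50))
model = brain + 0.5 * rng.standard_normal((92, 50))   # a noisy "model" of the same responses

brain_rdm = correlation_rdm(brain)
model_rdm = correlation_rdm(model)
similarity = np.corrcoef(upper_triangle(brain_rdm), upper_triangle(model_rdm))[0, 1]
```

The comparison here uses Pearson correlation between the upper triangles for brevity; rank correlation is a common alternative when only the ordering of dissimilarities is trusted.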

25 of 210

Audio Stimulus Datasets

| Dataset | Type | Language | Stimulus | #Subjects | Paradigm | Size | Task |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Handjaras et al., 2016 | fMRI | Italian | Verbal, pictorial or auditory presentation of 40 concrete nouns | 20 | Reading, viewing or listening | 40 nouns × 4 times | Property generation |
| Huth et al., 2016 | fMRI | English | Eleven 10-minute stories | 7 | Listening | 2 hours of stories from The Moth Radio Hour | Passive listening |
| Brennan and Hale, 2019 | EEG | English | Chapter one of Alice’s Adventures in Wonderland as read by Kristen McQuillan | 33 | Listening | 2,129 words in 84 sentences. The entire experimental session lasted 1–1.5 h (including QA) | 8 multiple-choice questions about the contents of the story |
| Anderson et al., 2020 | fMRI | English | One of 20 scenario names | 26 | Listening to scenario name | 20 scenario prompts displayed 5 times | Imagine themselves personally experiencing common scenarios |
| Narratives: Nastase et al., 2021 | fMRI | English | 27 diverse naturalistic spoken stories | 345 | Listening | 891 functional scans, totaling ~4.6 hours of unique stimuli (~43,000 words) | Passive listening |
| Natural Stories: Zhang et al., 2020 | fMRI | English | Moth Radio Hour naturalistic spoken stories | 19 | Listening | 5 h 33 m (repeated twice). Each story averages 6 m 48 s or 2,492 words | Passive listening |
| The Little Prince: Li et al., 2021 | fMRI | English, Chinese, French | Audiobook | 112 | Listening | English audiobook: 94 min. Chinese: 99 min. French: 97 min | Passive listening; 4 quiz questions |
| MEG-MASC: Gwilliams et al., 2022 | MEG | English | 4 English fictional stories: Cable spool boy, LW1, Black willow, Easy money | 27 | Listening | Two hours of naturalistic stories. 208 MEG sensors | Passive listening |

26 of 210

Imagining common scenarios

  • Participants underwent fMRI as they reimagined the scenarios when prompted by standardized cues.
  • 20 Scenarios: resting, reading, writing, bathing, cooking, housework, exercising, internet, telephoning, driving, shopping, movie, museum, restaurant, barbecue, party, dancing, wedding, funeral, festival.
  • 20 attributes: bright, color, motion, touch, audition, music, speech, taste, head, upperlimb, lowerlimb, body, path, landmark, time, social, communication, cognition, pleasant, unpleasant.


27 of 210

Narratives


28 of 210

Video Stimulus Datasets

| Dataset | Type | Language | Stimulus | #Subjects | Paradigm | Size | Task |
| --- | --- | --- | --- | --- | --- | --- | --- |
| BBC’s Doctor Who: Seeliger et al., 2019 | fMRI | English | Spatiotemporal visual and auditory naturalistic stimuli (30 episodes of BBC’s Doctor Who) | 1 | Viewing episode videos | 120,830 whole-brain volumes (approx. 23 h) of single-presentation data, and 1,178 volumes (11 min) of repeated narrative short episodes (22 repetitions) | Passive viewing |
| Japanese Ads: Nishida et al., 2020 | fMRI | Japanese | 368 web and 2,452 TV Japanese ad movies (15-30 s) | 40 for web ads and 28 for TV ads; 16 took part in both | Viewing ads | 7,200 train and 1,200 test fMRIs for web; fMRIs from 420 ads | Passive viewing |
| Pippi Langkous: Berezutskaya et al., 2020 | ECoG | Dutch (dubbed; the movie was originally in Swedish) | 30 s excerpts of a feature film (in total, 6.5 min long), edited together for a coherent story | 37 patients | Viewing | 6.5 min movie | Passive viewing |
| Algonauts: Cichy et al., 2021 | fMRI | English | 1,000 short video clips | 10 | Viewing video clips | 1,000 short video clips (3 s each) | Passive viewing |
| Natural Short Clips: Huth et al., 2022 | fMRI | English | Natural short movie clips | 5 | Watching natural short movie clips | 3,870 responses per subject | Passive viewing |

29 of 210

Japanese Ads

  • Two sets of movies were provided by NTT DATA Corp: web and TV ads.
  • Four types of cognitive labels associated with the movie datasets
    • Scene descriptions
      • Human judges create scene descriptions with 50+ words per 1s scene.
    • Impression ratings
      • Human rating on 30 factors for every 2s clip on a scale of 0-4.
    • Ad effectiveness indices
      • Click rate: fraction of viewers who clicked the frame of a movie and jumped to a linked web page
      • View completion rate: fraction of viewers who continued to watch an ad movie until the end without choosing a skip option.
    • Ad preference votes
      • Each tester was asked to freely recall a small number of favorite TV ads from among those recently broadcast.
      • The total number of recalls of an ad was regarded as its preference value.


30 of 210

Algonauts 2021

  • fMRI from 10 human subjects that watched over 1,000 short (3 sec) video clips.


31 of 210

Other Multimodal Stimulus Datasets

| Dataset | Type | Language | Stimulus | #Subjects | Paradigm | Size | Task |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mitchell et al., 2008 | fMRI | English | 60 different word-picture pairs from 12 categories | 9 | Viewing word-picture pairs | 60 different word-picture pairs presented six times each | Passive viewing |
| Sudre et al., 2012 | MEG | English | 60 concrete nouns along with line drawings | 9 | Reading | 60 stimuli × 20 questions = 1,200 examples | Question answering |
| Zinszer et al., 2017 | fNIRS | English | 8 concrete nouns (audiovisual word and picture stimuli): bunny, bear, kitty, dog, mouth, foot, hand, and nose | 24 | Viewing and listening | 12 blocks with the 8 stimuli per subject | Passive viewing and listening |
| Pereira et al., 2018 | fMRI | English | 180 words with picture (WP), sentences (S), word clouds (WC); 96 text passages; 72 passages | 16 | Viewing WP, sentences or word clouds | 180 WP, S and WC per subject; 96+72 passages shown 3 times | Passive viewing |
| Cao et al., 2021 | fNIRS | Chinese | 50 concrete nouns from 10 semantic categories | 7 | Viewing and listening | Each stimulus is presented 7 times | Passive viewing and listening |
| Courtois Neuromod | fMRI | | Full-length movies and TV show | 6 | Viewing and listening | ~100 hours of data per participant | Passive viewing |

32 of 210

Concrete nouns with line drawings

  • Subjects were asked to perform a QA task, while their brain activity was recorded using MEG.
  • Subjects were first presented with a question (e.g., “Is it manmade?”), followed by 60 concrete nouns, along with their line drawings, in a random order.
  • Each stimulus was presented until the subject pressed a button to respond “yes” or “no” to the initial question.
  • Once all 60 stimuli are presented, a new question is shown for a total of 20 questions.


33 of 210

Word+Picture, Sentences, Word Clouds, Passages

  • Experiment 1: 180 words (128 nouns, 22 verbs, 29 adjectives and adverbs, and 1 function word). 3 paradigms.
  • Experiment 2: 96 text passages, each with 4 sentences from 24 broad topics (e.g., professions, clothing, birds, musical instruments, natural disasters, crimes, etc.)
  • Experiment 3: 72 passages, each with 3-4 sentences from another 24 topics.


34 of 210

fNIRS with audio-visual stimuli

  • Stimuli are pictures and audio recordings of 50 objects from 10 categories.
  • Visual presentation lasts for 3s, with audio presented immediately at the onset, followed by a 10s rest period.
  • During rest period, participants are instructed to fixate on an X displayed in the center of the screen.
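
The timing described above (3 s of stimulus followed by 10 s of rest) fixes the trial onsets once the number of trials is known. The sketch below works through that arithmetic. It assumes trials run back to back with no additional gaps, and `trial_schedule` is a hypothetical helper, not part of the original protocol.

```python
def trial_schedule(n_trials, stim_s=3.0, rest_s=10.0):
    """Return (stimulus_onset, rest_onset, trial_end) in seconds for each trial."""
    schedule = []
    t = 0.0
    for _ in range(n_trials):
        schedule.append((t, t + stim_s, t + stim_s + rest_s))
        t += stim_s + rest_s
    return schedule

# One pass over the 50 objects, assuming no extra gaps between trials.
one_pass = trial_schedule(50)
total_minutes = one_pass[-1][2] / 60.0
```

Under these assumptions each trial occupies 13 s, so a single pass over all 50 objects takes a little under 11 minutes.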


35 of 210

Text Stimulus Datasets References

  • Wehbe, Leila, Brian Murphy, Partha Talukdar, Alona Fyshe, Aaditya Ramdas, and Tom Mitchell. "Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses." PLoS ONE 9, no. 11 (2014): e112575.
  • Hollenstein, Nora, Jonathan Rotsztejn, Marius Troendle, Andreas Pedroni, Ce Zhang, and Nicolas Langer. "ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading." Scientific Data 5, no. 1 (2018): 1-13.
  • Handjaras, Giacomo, Emiliano Ricciardi, Andrea Leo, Alessandro Lenci, Luca Cecchetti, Mirco Cosottini, Giovanna Marotta, and Pietro Pietrini. "How concepts are encoded in the human brain: a modality independent, category-based cortical organization of semantic knowledge." NeuroImage 135 (2016): 232-242.
  • Anderson, Andrew J., Douwe Kiela, Stephen Clark, and Massimo Poesio. "Visually grounded and textual semantic models differentially decode brain activity associated with concrete and abstract nouns." Transactions of the Association for Computational Linguistics 5 (2017): 17-30.
  • Anderson, Andrew James, Jeffrey R. Binder, Leonardo Fernandino, Colin J. Humphries, Lisa L. Conant, Rajeev DS Raizada, Feng Lin, and Edmund C. Lalor. "An integrated neural decoder of linguistic and experiential meaning." Journal of Neuroscience 39, no. 45 (2019): 8969-8987.
  • Oseki, Yohei, and Masayuki Asahara. "Design of BCCWJ-EEG: Balanced corpus with human electroencephalography." In Proceedings of the 12th Language Resources and Evaluation Conference, pp. 189-194. 2020.
  • Deniz, Fatma, Anwar O. Nunez-Elizalde, Alexander G. Huth, and Jack L. Gallant. "The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality." Journal of Neuroscience 39, no. 39 (2019): 7722-7736.


36 of 210

Visual Stimulus Datasets References

  • Thirion, Bertrand, Edouard Duchesnay, Edward Hubbard, Jessica Dubois, Jean-Baptiste Poline, Denis Lebihan, and Stanislas Dehaene. "Inverse retinotopy: inferring the visual content of images from brain activation patterns." NeuroImage 33, no. 4 (2006): 1104-1116.
  • Kay, Kendrick N., Thomas Naselaris, Ryan J. Prenger, and Jack L. Gallant. "Identifying natural images from human brain activity." Nature 452, no. 7185 (2008): 352-355.
  • Horikawa, Tomoyasu, and Yukiyasu Kamitani. "Generic decoding of seen and imagined objects using hierarchical visual features." Nature Communications 8, no. 1 (2017): 1-15.
  • Chang, Nadine, John A. Pyles, Austin Marcus, Abhinav Gupta, Michael J. Tarr, and Elissa M. Aminoff. "BOLD5000, a public fMRI dataset while viewing 5000 visual images." Scientific Data 6, no. 1 (2019): 1-18.
  • Cichy, Radoslaw Martin, Gemma Roig, Alex Andonian, Kshitij Dwivedi, Benjamin Lahner, Alex Lascelles, Yalda Mohsenzadeh, Kandan Ramakrishnan, and Aude Oliva. "The Algonauts Project: A platform for communication between the sciences of biological and artificial intelligence." arXiv preprint arXiv:1905.05675 (2019).
  • Allen, Emily J., Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Jacob S. Prince, Logan T. Dowdle, Matthias Nau et al. "A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence." Nature Neuroscience 25, no. 1 (2022): 116-126.
  • Hebart, Martin N., Oliver Contier, Lina Teichmann, Adam H. Rockter, Charles Y. Zheng, Alexis Kidder, Anna Corriveau, Maryam Vaziri-Pashkam, and Chris I. Baker. "THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior." eLife 12 (2023): e82580.


37 of 210

Audio Stimulus Datasets References

  • Handjaras, Giacomo, Emiliano Ricciardi, Andrea Leo, Alessandro Lenci, Luca Cecchetti, Mirco Cosottini, Giovanna Marotta, and Pietro Pietrini. "How concepts are encoded in the human brain: a modality independent, category-based cortical organization of semantic knowledge." NeuroImage 135 (2016): 232-242.
  • Huth, Alexander G., Wendy A. De Heer, Thomas L. Griffiths, Frédéric E. Theunissen, and Jack L. Gallant. "Natural speech reveals the semantic maps that tile human cerebral cortex." Nature 532, no. 7600 (2016): 453-458.
  • Jain, Shailee, and Alexander Huth. "Incorporating context into language encoding models for fMRI." Advances in Neural Information Processing Systems 31 (2018).
  • Brennan, Jonathan R., and John T. Hale. "Hierarchical structure guides rapid linguistic predictions during naturalistic listening." PLoS ONE 14, no. 1 (2019): e0207741.
  • Anderson, Andrew James, Kelsey McDermott, Brian Rooks, Kathi L. Heffner, David Dodell-Feder, and Feng V. Lin. "Decoding individual identity from brain activity elicited in imagining common experiences." Nature Communications 11, no. 1 (2020): 1-14.
  • Nastase, Samuel A., Yun-Fei Liu, Hanna Hillman, Asieh Zadbood, Liat Hasenfratz, Neggin Keshavarzian, Janice Chen et al. "The “Narratives” fMRI dataset for evaluating models of naturalistic language comprehension." Scientific Data 8, no. 1 (2021): 1-22.
  • Zhang, Yizhen, Kuan Han, Robert Worth, and Zhongming Liu. "Connecting concepts in the brain by mapping cortical representations of semantic relations." Nature Communications 11, no. 1 (2020): 1877.
  • Li, Jixing, Shohini Bhattasali, Shulin Zhang, Berta Franzluebbers, Wen-Ming Luh, R. Nathan Spreng, Jonathan R. Brennan, Yiming Yang, Christophe Pallier, and John Hale. "Le Petit Prince: A multilingual fMRI corpus using ecological stimuli." bioRxiv (2021): 2021-10.
  • Gwilliams, Laura, Graham Flick, Alec Marantz, Liina Pylkkanen, David Poeppel, and Jean-Remi King. "MEG-MASC: a high-quality magneto-encephalography dataset for evaluating natural speech processing." arXiv preprint arXiv:2208.11488 (2022).


38 of 210

Video Stimulus Datasets References

  • Seeliger, K., R. P. Sommers, Umut Güçlü, Sander E. Bosch, and M. A. J. Van Gerven. "A large single-participant fMRI dataset for probing brain responses to naturalistic stimuli in space and time." bioRxiv (2019): 687681.
  • Nishida, Satoshi, Yusuke Nakano, Antoine Blanc, Naoya Maeda, Masataka Kado, and Shinji Nishimoto. "Brain-mediated transfer learning of convolutional neural networks." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 04, pp. 5281-5288. 2020.
  • Cichy, Radoslaw Martin, Kshitij Dwivedi, Benjamin Lahner, Alex Lascelles, Polina Iamshchinina, M. Graumann, A. Andonian et al. "The Algonauts Project 2021 Challenge: How the Human Brain Makes Sense of a World in Motion." arXiv preprint arXiv:2104.13714 (2021).
  • Berezutskaya, Julia, Zachary V. Freudenburg, Luca Ambrogioni, Umut Güçlü, Marcel AJ van Gerven, and Nick F. Ramsey. "Cortical network responses map onto data-driven features that capture visual semantics of movie fragments." Scientific Reports 10, no. 1 (2020): 1-21.
  • Huth, Alexander G., Shinji Nishimoto, An T. Vu, and T. Dupre La Tour. "Gallant lab natural short clips 3T fMRI data." 50 GiB (2022).


39 of 210

Multimodal Stimulus Datasets References

  • Mitchell, Tom M., Svetlana V. Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L. Malave, Robert A. Mason, and Marcel Adam Just. "Predicting human brain activity associated with the meanings of nouns." Science 320, no. 5880 (2008): 1191-1195.
  • Sudre, Gustavo, Dean Pomerleau, Mark Palatucci, Leila Wehbe, Alona Fyshe, Riitta Salmelin, and Tom Mitchell. "Tracking neural coding of perceptual and semantic features of concrete nouns." NeuroImage 62, no. 1 (2012): 451-463.
  • Zinszer, Benjamin D., Laurie Bayet, Lauren L. Emberson, Rajeev DS Raizada, and Richard N. Aslin. "Decoding semantic representations from functional near-infrared spectroscopy signals." Neurophotonics 5, no. 1 (2017): 011003.
  • Pereira, Francisco, Bin Lou, Brianna Pritchett, Samuel Ritter, Samuel J. Gershman, Nancy Kanwisher, Matthew Botvinick, and Evelina Fedorenko. "Toward a universal decoder of linguistic meaning from brain activation." Nature Communications 9, no. 1 (2018): 1-13.
  • Cao, Lu, Dandan Huang, Yue Zhang, Xiaowei Jiang, and Yanan Chen. "Brain decoding using fNIRS." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 14, pp. 12602-12611. 2021.
  • Boyle, Julie A., Basile Pinsard, A. Boukhdhir, S. Belleville, S. Brambatti, J. Chen, J. Cohen-Adad et al. "The Courtois project on neuronal modelling: 2020 data release." In Presented at the 26th annual meeting of the Organization for Human Brain Mapping. 2020.



41 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
  • Stimulus Representations [1 hour]
    • Text Stimulus Representations
    • Visual Stimulus Representations
    • Audio Stimulus Representations
    • Multimodal Stimulus Representations
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 30 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]

42 of 210

Stimulus Representations

  • Text Stimuli
    • Basic NLP Representations: Corpus co-occurrence counts, topic models, Linguistic (POS, dependencies, roles)
    • Discourse features.
    • Semantic: word embedding methods, sentence representation models, recurrent neural networks and Transformer methods.
    • Experiential attributes: Rated on 0-6 scale or binary.
  • Visual Stimuli
    • Visual field filter banks
    • Gabor wavelet pyramid
    • HMAX model
    • Convolutional neural networks
  • Audio Stimuli
    • Phoneme rate and presence of phonemes.
  • Multimodal Stimuli

44 of 210

Text Stimulus Representations

  • Basic NLP Representations
    • Corpus co-occurrence counts
    • Topic models
    • Linguistic: POS, dependencies, roles.
  • Discourse
    • Characters, motion, speech, emotions, non-motion verbs
  • Deep Learning based Representations
    • Embeddings
    • Longer context using LSTMs
    • Transformers
  • Experiential attributes
    • Rated on 0-6 scale
    • Binary

45 of 210

Basic NLP Representations for Word Stimuli

  • Corpus co-occurrence counts
    • 25 verbs (Mitchell et al., 2008; Pereira et al., 2013)
      • Verbs: see, hear, listen, taste, smell, eat, touch, rub, lift, manipulate, run, push, fill, move, ride, say, fear, open, approach, near, enter, drive, wear, break, and clean.
      • These verbs generally correspond to basic sensory and motor activities, actions performed on objects, and actions involving changes to spatial relationships.
      • For each (verb, stimulus word w), feature value = normalized co-occurrence count of w with any of three forms of the verb (e.g., taste, tastes, or tasted) over the text corpus.
    • 985 common English words (such as above, worry, and mother) in (Huth et al., 2016).
  • Topic models (Pereira et al., 2013)
    • Retrieve the relevant Wikipedia page for each concept (e.g., the page for “airplane” is “Fixed-Wing Aircraft”) and other linked pages (e.g., “Aircraft cabin”).
    • LDA topic modelling on 3500 pages with #topics from 10 to 100, in increments of 5, setting the α parameter to 25/#topics.
    • LSA topic modelling (Wang et al., 2017)
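
The co-occurrence feature described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the original pipeline: the toy corpus, the ±5-token window used to define co-occurrence, the unit-length normalization, and the helper name `cooccurrence_features` are all hypothetical choices made here.

```python
import math
from collections import Counter

# Hypothetical mini-corpus; the original work relied on a very large text corpus.
CORPUS = ("we eat the apple . she tastes the apple . he tasted an apple . "
          "they ride the horse").split()

# Three surface forms per verb, as on the slide (only a subset of the 25 verbs).
VERB_FORMS = {
    "eat": ("eat", "eats", "ate"),
    "taste": ("taste", "tastes", "tasted"),
    "ride": ("ride", "rides", "rode"),
}

def cooccurrence_features(word, tokens, verb_forms, window=5):
    """Return unit-length co-occurrence counts of `word` with each verb.

    Co-occurrence is defined here (an assumption) as any form of the verb
    appearing within `window` tokens of an occurrence of `word`.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != word:
            continue
        context = tokens[max(0, i - window): i + window + 1]
        for verb, forms in verb_forms.items():
            counts[verb] += sum(context.count(form) for form in forms)
    # Normalize the feature vector to unit length (guarding against all zeros).
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {verb: counts[verb] / norm for verb in verb_forms}

features = cooccurrence_features("apple", CORPUS, VERB_FORMS)
```

Each stimulus word then gets one feature per verb, so the full representation in the 25-verb setting would be a 25-dimensional vector.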

46 of 210

Basic NLP Representations for Word Stimuli

  • Word length
  • Binary indicators of which of the 28 unique parts of speech and 17 unique dependency relationships apply to the word
  • Position of word in the sentence
  • Roles
    • Main verb
    • Agent or experiencer
    • Patient or recipient
    • Predicate of a sentence (The window was dusty)
    • Modifier (The angry activist broke the chair)
    • Complement in adjunct and prepositional phrase, including direction, location, and time (The restaurant was loud at night).

47 of 210

Discourse features (for Harry Potter dataset)

  • Characters: Resolve all pronouns to the character to whom they refer, and create binary features signalling which of the 10 characters are mentioned.
  • Motions: Identify a set of motions that occur frequently in the chapter (e.g., fly, manipulate, collide physically).
  • Speech: Mark the parts of the story that correspond to direct speech between characters, and use the presence of dialog as a feature.
  • Emotions: Identify a set of emotions felt by the characters in the chapter (e.g., annoyance, nervousness, pride).
  • Verbs: Identify a set of actions, distinct from motion, that occur frequently in the chapter (e.g., hear, know, see).

48 of 210

DL Representations: Using embeddings for word stimuli

  • GloVe 300D vectors (Pereira et al., 2016; Wang et al., 2017; Pereira et al., 2018; Anderson et al., 2019)
  • 1000D Non-negative sparse embeddings (Wehbe et al., 2014).
  • 300D embeddings by training a skip-gram model using negative sampling (SGNS) on Italian and English Wikipedia dumps using Gensim. (Anderson et al., 2017a)
  • FastText (Berezutskaya et al., 2020)
  • Comparison across multiple embedding methods
    • GloVe, word2vec, WordNet2Vec, FastText, ELMo (Hollenstein et al., 2019)
    • word2vec, fastText, GloVe, dependency-based word2vec, RWSGwn, ConceptNet, ELMo, and averaged and concatenated combinations (Wang et al., 2020)

49 of 210

DL Representations: Using longer context for word stimuli

  • Multi-task LSTMs
    • Predict next word and POS of next word.
  • ELMo embeddings: LSTM based pretrained language model

50 of 210

DL Representations: Using sentence embeddings

  • Unstructured Models: Ignore sentence structure
    • Simple Pooling Methods
      • Average/max/concat(max, avg) pooling over word embeddings.
    • Advanced Pooling Methods
      • FastSent (Hill, Cho, and Korhonen 2016) sums word embeddings in a sentence as its representation to predict the surrounding sentences.
      • SIF (Arora, Liang, and Ma 2016) adapts the naïve averaging of word embeddings to weighted averaging.
  • Structured Models
    • Unsupervised Methods: Skip-thought, QuickThought.
    • Supervised Methods: InferSent, GenSen (Subramanian et al. 2018), Universal Sentence Encoder
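
The simple pooling methods listed above (average, max, and concat(max, avg)) can be sketched with NumPy. The toy embedding values and the helper name `pool_sentence` are made up for illustration; real word vectors would come from an embedding model such as GloVe.

```python
import numpy as np

def pool_sentence(word_vectors, mode="avg"):
    """Pool a (num_words, dim) array of word embeddings into one sentence vector."""
    word_vectors = np.asarray(word_vectors, dtype=float)
    avg = word_vectors.mean(axis=0)   # average pooling over words
    mx = word_vectors.max(axis=0)     # element-wise max pooling over words
    if mode == "avg":
        return avg
    if mode == "max":
        return mx
    if mode == "concat":
        return np.concatenate([mx, avg])  # concat(max, avg): doubles the dimension
    raise ValueError(f"unknown pooling mode: {mode}")

# Toy 3-word sentence with 4-dimensional embeddings (made-up values).
sentence = np.array([[1.0, 0.0, 2.0, -1.0],
                     [3.0, 1.0, 0.0, -2.0],
                     [2.0, 2.0, 1.0, 0.0]])
avg_vec = pool_sentence(sentence, "avg")
max_vec = pool_sentence(sentence, "max")
cat_vec = pool_sentence(sentence, "concat")
```

All three variants ignore word order, which is why the slide groups them under unstructured models.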

51 of 210

DL Representations: Transformer-based methods for text stimuli (Layer #, context length, architecture)

Transformer-XL is the only model that continues to increase performance as the context length is increased. In all networks, the middle layers perform the best for contexts longer than 15 words. The deepest layers across all networks show a sharp increase in performance at short-range context (fewer than 10 words), followed by a decrease in performance. [Toneva and Wehbe, 2019]

52 of 210

DL Representations: Transformer-based methods for text stimuli (NLP task finetuning and scrambled LM)

  • Scrambled LM
    • Randomly shuffle the words in the corpus samples to remove all first-order cues to syntactic structure.
    • LM-scrambled: words are shuffled within sentences
    • LM-scrambled-para: words are shuffled within their containing paragraphs in the corpus.
  • LM_pos: predict only the part of speech of a masked word, rather than the word itself.
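  • Finetuning on the scrambled LM objectives gives the largest improvement in brain decoding performance among the finetuning tasks compared (Gauthier and Levy, 2019).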
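
The two scrambling schemes can be illustrated as follows. This is a minimal sketch, assuming a paragraph is given as a list of tokenized sentences; keeping the original sentence lengths when shuffling within a paragraph is an assumption made here, and both function names are hypothetical.

```python
import random

def scramble_within_sentences(paragraph, rng):
    """LM-scrambled style: shuffle word order inside each sentence."""
    scrambled = []
    for sentence in paragraph:
        words = list(sentence)
        rng.shuffle(words)
        scrambled.append(words)
    return scrambled

def scramble_within_paragraph(paragraph, rng):
    """LM-scrambled-para style: shuffle all words of the paragraph together.

    Sentence lengths are preserved (an assumption) so the output keeps the
    same list-of-sentences shape as the input.
    """
    words = [w for sentence in paragraph for w in sentence]
    rng.shuffle(words)
    scrambled, start = [], 0
    for sentence in paragraph:
        scrambled.append(words[start:start + len(sentence)])
        start += len(sentence)
    return scrambled

paragraph = [["the", "dog", "chased", "the", "cat"], ["it", "ran", "away"]]
rng = random.Random(0)  # fixed seed for reproducibility
by_sentence = scramble_within_sentences(paragraph, rng)
by_paragraph = scramble_within_paragraph(paragraph, rng)
```

Sentence-level scrambling keeps each sentence's bag of words intact, while paragraph-level scrambling also lets words move across sentence boundaries.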

53 of 210

DL Representations: Transformer-based methods for text stimuli (NLP task finetuning)

Tasks

Paraphrase, Summarization, Question Answering, Sentiment Analysis, NER, Word Sense Disambiguation, Natural Language Inference, Semantic Role Labeling, Coreference Resolution, Shallow Syntax Parsing

Pereira dataset: Coreference Resolution (CR), NER, and Shallow Syntax Parsing (SS) perform the best.

Dendrogram constructed using similarity on representations from task-specific Transformer encoder models with stimuli from the dataset passed as input.

54 of 210

DL Representations: Transformer-based methods for text stimuli (Multi-task setup)

  • Settings
    • Finetune BERT vs not
    • Finetune BERT using one representative subject and train dense layer for each subject, vs finetune BERT for each subject.
    • Finetune BERT on MEG for all subjects, then finetune BERT on fMRI.
    • Multi-task finetune BERT for fMRI+MEG prediction task
  • Results
    • Fine-tuned models predict fMRI data better than vanilla BERT
    • Relationships between text and brain activity generalize across experiment participants.
    • Using MEG data can improve fMRI predictions.
    • A single model can be used to predict fMRI activity across multiple experiment participants.

55 of 210

DL Representations: Comparing Transformers and extracting syntax vs semantics

  • Representations:
    • Lexical: representation that is context-invariant. E.g., word embeddings.
    • Compositional: “contextualized” representation generated by a system combining multiple words. E.g., parse trees.
    • Syntax: representation associated with the structure of sentences independently of their meaning.
    • Semantics: representations of a language system that are not syntactic.

56 of 210

Experiential attributes model for text stimuli

  • Represents words in terms of human (Amazon Mechanical Turk) ratings of their degree of association with different attributes of experience
    • “On a scale of 0 to 6, to what degree do you think of a banana as having a characteristic or defining color?”
    • Anderson et al., 2019: 65 attributes spanning sensory, motor, affective, spatial, temporal, causal, social, and abstract cognitive experiences.
  • Value-add on top of text models: a lot of experiential information goes unstated in natural verbal communication.
    • E.g., it is rarely useful to communicate the color of bananas because it is obvious to all those with experience of bananas.
    • E.g., it would be unusual to specify that dropping things involves movement.
  • Nishida et al., 2020 use a subset of 20 attributes.

57 of 210

Binary attribute representations

  • Each stimulus is represented by a binary vector capturing its membership in one of the eight semantic categories.

  • 42 neurally plausible semantic features (NPSFs)
    • The NPSFs cover perceptual and affective characteristics of an entity (10 NPSFs, e.g., man-made, size, color, temperature, positive affective valence, high affective arousal), animate beings (person, human-group, animal), and time and space properties (e.g., unenclosed setting, change of location).

59 of 210

Visual Stimuli

  • Visual field filter banks (Thirion et al., 2006; Nishimoto et al., 2011).
  • Gabor wavelet pyramid (Kay et al., 2008).
  • HMAX model (Horikawa et al., 2017).
  • Convolutional neural networks (Yamins et al., 2014; Anderson et al., 2017a; Beliy et al., 2019; Du et al., 2020; Nishida et al., 2020).

60 of 210

Visual Stimuli: Gabor wavelet pyramid

a, Spatial frequency and position. Wavelets occur at six spatial frequencies. This panel depicts one wavelet at each of the first five spatial frequencies. At each spatial frequency f cycles/field-of-view (FOV), wavelets are positioned on an f × f grid, as indicated by the translucent lines.

b, Orientation and phase. At each grid position, wavelets occur at eight orientations and two phases. This panel depicts a complete set of wavelets for a single grid position. Dashed lines indicate the bounds of the mask associated with each wavelet.

Gabor wavelet pyramid model. Each image is projected onto the individual Gabor wavelets comprising the Gabor wavelet pyramid. Gabor wavelets differ in size, position, orientation, spatial frequency, and phase. The projections for each quadrature pair of wavelets are squared, summed, and square-rooted, yielding a measure of contrast energy. The contrast energies for different quadrature wavelet pairs are weighted and then summed. Finally, a DC offset is added. The weights are determined by gradient descent with early stopping.
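
The contrast-energy step for a single quadrature pair can be sketched with NumPy. This is an illustrative sketch only: the grid size, the isotropic Gaussian envelope width `sigma`, the test gratings, and the helper names are assumptions made here, and the weighting, DC offset, and gradient-descent fitting stages are omitted.

```python
import numpy as np

def gabor_pair(size, freq, theta, sigma):
    """Return a quadrature pair (0 and 90 degree phase) of Gabor wavelets.

    The field of view spans [-0.5, 0.5] in both axes, so `freq` is in
    cycles per field of view.
    """
    coords = np.linspace(-0.5, 0.5, size)
    x, y = np.meshgrid(coords, coords)
    xr = x * np.cos(theta) + y * np.sin(theta)           # coordinate along the grating
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    even = envelope * np.cos(2 * np.pi * freq * xr)      # phase 0
    odd = envelope * np.sin(2 * np.pi * freq * xr)       # phase 90 degrees
    return even, odd

def contrast_energy(image, even, odd):
    """Project the image onto both wavelets, then square, sum, and square-root."""
    return np.sqrt(np.sum(image * even) ** 2 + np.sum(image * odd) ** 2)

size, freq = 64, 4.0
even, odd = gabor_pair(size, freq, theta=0.0, sigma=0.2)

# Two gratings at the wavelet's frequency that differ only in phase.
coords = np.linspace(-0.5, 0.5, size)
x, _ = np.meshgrid(coords, coords)
grating_a = np.cos(2 * np.pi * freq * x)
grating_b = np.cos(2 * np.pi * freq * x + np.pi / 2)

energy_a = contrast_energy(grating_a, even, odd)
energy_b = contrast_energy(grating_b, even, odd)
blank_energy = contrast_energy(np.zeros((size, size)), even, odd)
```

Combining the two phases this way makes the response nearly insensitive to the phase of the stimulus, so the two gratings yield approximately the same energy.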

61 of 210

Visual Stimuli: HMAX model

  • Simple Cells S1
    • Input images are densely sampled by arrays of two-dimensional filters.
    • Output: -1 to 1
  • Complex Cells C1: max pooling
  • Simple Cells S2
    • Gaussian with mean 1 and standard deviation 1.
  • Complex Cells C2: max pooling
  • View Tuned Units (VTUs)
    • C2 units provide input to VTUs
    • C2 → VTU connections are the only stage of the HMAX model where learning occurs.

62 of 210

Visual Stimuli: Convolutional Neural Networks (CNNs)

  • For word stimuli, gather 20 most relevant images using Google search, then get CNN representation (Anderson et al., 2017).
  • AlexNet, VGG-16 (Nishida et al., 2020; Berezutskaya et al., 2020), Inception, ResNet, DenseNet.

63 of 210

Visual Stimuli: Object Recognition with Word embeddings

  • Step 1: Pass film frames through concept recognition module to get up to 20 concept labels per frame.
    • Used Clarifai.
  • Step 2: Get fastText embeddings for each concept label. Frame embedding is average of word embeddings.
  • Step 3: PCA for dimensionality reduction.
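
Steps 2 and 3 can be sketched as below. The embedding table is a random stand-in (a real pipeline would look up fastText vectors), the concept labels per frame are invented, and PCA is done directly via SVD; none of these details come from the original work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embedding table; a real pipeline would look up fastText vectors here.
VOCAB = ["person", "car", "street", "tree", "dog", "sky"]
EMBEDDINGS = {word: rng.normal(size=50) for word in VOCAB}

def frame_embedding(labels):
    """Step 2: average the word embeddings of a frame's concept labels."""
    return np.mean([EMBEDDINGS[label] for label in labels], axis=0)

def pca_reduce(features, n_components):
    """Step 3: project mean-centred rows onto the top principal components."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T

# Hypothetical concept labels for four film frames (output of Step 1).
frames = [["person", "car", "street"], ["tree", "sky"],
          ["dog", "person"], ["car", "sky", "tree"]]
frame_matrix = np.stack([frame_embedding(labels) for labels in frames])
reduced = pca_reduce(frame_matrix, n_components=2)
```

Each row of `reduced` is then a low-dimensional feature vector for one frame.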

64 of 210

Visual Stimuli: Semi-supervised CNNs

  • Problem: Scarce labeled data.

Training phases & Architecture. (a) The first training phase: Supervised training of the Encoder with {Image, fMRI} pairs. (b) Second phase: Training the Decoder simultaneously with 3 types of data: {Image, fMRI} pairs (supervised examples), unlabeled natural images (self-supervision), and unlabeled test-fMRI (self-supervision). Note that the test-images are never used for training. The pretrained Encoder from the first training phase is kept fixed in the second phase. (c) Encoder and Decoder architectures. BN, US, and ReLU stand for batch normalization, up-sampling, and rectified linear unit, respectively.

65 of 210

Visual Stimuli: Convolutional LSTM Autoencoder

StepEncog, a convolutional LSTM autoencoder model trained on fMRI voxels.

66 of 210

Latent Diffusion Models

68 of 210

Audio Stimuli

  • Word rate, Phoneme rate, Presence of phonemes (Huth et al., 2016).
  • SoundNet (Aytar, Vondrick, and Torralba 2016) features (Nishida et al., 2020)

70 of 210

Multimodal Stimulus Representations

  • Processing videos requires audio+image representations
    • E.g., VGG+SoundNet (Nishida et al., 2020)
  • Image+text combination models (Wang et al., 2020)
    • GloVe+VGG, and ELMo+VGG
    • Averaging or concatenation
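
The two combination strategies can be sketched as follows. L2-normalizing each modality before fusing and requiring equal dimensionality for averaging are assumptions made here (for example, the vectors might first be projected to a common size); the function names and toy values are hypothetical.

```python
import numpy as np

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    vec = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def fuse(text_vec, image_vec, mode="concat"):
    """Combine a text embedding and an image embedding into one vector."""
    t, i = l2_normalize(text_vec), l2_normalize(image_vec)
    if mode == "concat":
        return np.concatenate([t, i])        # dimensions add up
    if mode == "avg":
        if t.shape != i.shape:
            raise ValueError("averaging needs vectors of equal dimensionality")
        return (t + i) / 2.0                 # dimension is unchanged
    raise ValueError(f"unknown fusion mode: {mode}")

text_vec = [3.0, 0.0, 4.0]          # toy stand-in for a text embedding
image_vec = [0.0, 2.0, 0.0, 0.0]    # toy stand-in for a CNN feature vector
concat_vec = fuse(text_vec, image_vec, "concat")
avg_vec = fuse([1.0, 0.0], [0.0, 1.0], "avg")
```

Concatenation keeps both modalities fully separate for the downstream regression, whereas averaging forces them into a shared space.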

71 of 210

Multimodal Stimuli: Visio-linguistic representations

  • Pretrained CNNs: VGGNet19, ResNet50, InceptionV2ResNet and EfficientNetB5
  • Pretrained text Transformers: RoBERTa
  • Image Transformers: Vision Transformer (ViT), Data-efficient Image Transformer (DeiT), and Bidirectional Encoder representation from Image Transformers (BEiT).
  • Late-fusion models: VGGNet19+RoBERTa, ResNet50+RoBERTa, InceptionV2ResNet+RoBERTa and EfficientNetB5+RoBERTa.
  • Multi-modal Transformers: Contrastive Language-Image Pre-training (CLIP), Learning Cross-Modality Encoder Representations from Transformers (LXMERT), and VisualBERT.
    • VisualBERT performs the best for brain encoding!

72 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
  • Stimulus Representations [1 hour]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 30 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]

73 of 210

References

[1] Nicolas Affolter, Beni Egressy, Damian Pascual, and Roger Wattenhofer. Brain2word: Decoding brain activity for language generation. arXiv preprint arXiv:2009.04765, 2020.

[2] Andrew J Anderson, Douwe Kiela, Stephen Clark, and Massimo Poesio. Visually grounded and textual semantic models differentially decode brain activity associated with concrete and abstract nouns. Transactions of the Association for Computational Linguistics, 5:17–30, 2017.

[3] Andrew James Anderson, Jeffrey R Binder, Leonardo Fernandino, Colin J Humphries, Lisa L Conant, Mario Aguilar, Xixi Wang, Donias Doko, and Rajeev DS Raizada. Predicting neural activity patterns associated with sentences using a neurobiologically motivated model of semantic representation. Cerebral Cortex, 27(9):4379–4395, 2017.

[4] Andrew James Anderson, Jeffrey R Binder, Leonardo Fernandino, Colin J Humphries, Lisa L Conant, Rajeev DS Raizada, Feng Lin, and Edmund C Lalor. An integrated neural decoder of linguistic and experiential meaning. Journal of Neuroscience, 39(45):8969–8987, 2019.

[5] Andrew James Anderson, Kelsey McDermott, Brian Rooks, Kathi L Heffner, David Dodell-Feder, and Feng V Lin. Decoding individual identity from brain activity elicited in imagining common experiences. Nature communications, 11(1):1–14, 2020.

[6] Richard Antonello, Javier Turek, Vy Vo, and Alexander Huth. Low-dimensional structure in the space of language representations is reflected in brain responses. arXiv preprint arXiv:2106.05426, 2021.

[7] Roman Beliy, Guy Gaziv, Assaf Hoogi, Francesca Strappini, Tal Golan, and Michal Irani. From voxels to pixels and back: Self-supervision in natural-image reconstruction from fMRI. arXiv preprint arXiv:1907.02431, 2019.

[8] Julia Berezutskaya, Zachary V Freudenburg, Luca Ambrogioni, Umut Güçlü, Marcel AJ van Gerven, and Nick F Ramsey. Cortical network responses map onto data-driven features that capture visual semantics of movie fragments. Scientific reports, 10(1):1–21, 2020.

[9] Charlotte Caucheteux, Alexandre Gramfort, and Jean-Remi King. Disentangling syntax and semantics in the brain with deep networks. In International Conference on Machine Learning, pages 1336–1348. PMLR, 2021.

[10] Charlotte Caucheteux and Jean-Rémi King. Language processing in brains and deep neural networks: computational convergence and its limits. BioRxiv, 2020.

[11] Joshua S Cetron, Andrew C Connolly, Solomon G Diamond, Vicki V May, James V Haxby, and David JM Kraemer. Decoding individual differences in stem learning from functional mri data. Nature communications, 10(1):1–10, 2019.

74 of 210

References

[12] Nadine Chang, John A Pyles, Austin Marcus, Abhinav Gupta, Michael J Tarr, and Elissa M Aminoff. Bold5000, a public fmri dataset while viewing 5000 visual images. Scientific data, 6(1):1–18, 2019.

[13] Radoslaw Martin Cichy, Kshitij Dwivedi, Benjamin Lahner, Alex Lascelles, Polina Iamshchinina, M Graumann, A Andonian, NAR Murty, K Kay, Gemma Roig, et al. The algonauts project 2021 challenge: How the human brain makes sense of a world in motion. arXiv preprint arXiv:2104.13714, 2021.

[14] Radoslaw Martin Cichy, Gemma Roig, Alex Andonian, Kshitij Dwivedi, Benjamin Lahner, Alex Lascelles, Yalda Mohsenzadeh, Kandan Ramakrishnan, and Aude Oliva. The algonauts project: A platform for communication between the sciences of biological and artificial intelligence. arXiv e-prints, pages arXiv–1905, 2019.

[15] Changde Du, Changying Du, Lijie Huang, and Huiguang He. Conditional generative neural decoding with structured cnn feature prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 2629–2636, 2020.

[16] Michael Eickenberg, Alexandre Gramfort, Gaël Varoquaux, and Bertrand Thirion. Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage, 152:184–194, 2017.

[17] Jack Gallant. Human brain mapping and brain decoding, 2017.

[18] Jon Gauthier and Roger Levy. Linking artificial and human neural representations of language. arXiv preprint arXiv:1910.01244, 2019.

[19] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.

[20] Giacomo Handjaras, Emiliano Ricciardi, Andrea Leo, Alessandro Lenci, Luca Cecchetti, Mirco Cosottini, Giovanna Marotta, and Pietro Pietrini. How concepts are encoded in the human brain: a modality independent, category-based cortical organization of semantic knowledge. Neuroimage, 135:232–242, 2016.

[21] Nora Hollenstein, Antonio de la Torre, Nicolas Langer, and Ce Zhang. Cognival: A framework for cognitive word embedding evaluation. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 538–549, 2019.

[22] Nora Hollenstein, Jonathan Rotsztejn, Marius Troendle, Andreas Pedroni, Ce Zhang, and Nicolas Langer. Zuco, a simultaneous eeg and eye-tracking resource for natural sentence reading. Scientific data, 5(1):1–13, 2018.

75 of 210

References

[23] Alexander G Huth, Wendy A De Heer, Thomas L Griffiths, Frédéric E Theunissen, and Jack L Gallant. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600):453–458, 2016.

[24] Shailee Jain and Alexander G Huth. Incorporating context into language encoding models for fmri. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 6629–6638, 2018.

[25] S Jat, H Tang, P Talukdar, and T Mitchell. Relating simple sentence representations in deep neural networks and the brain. In ACL 2019-57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, pages 5137–5154. Association for Computational Linguistics (ACL), 2020.

[26] Marcel Adam Just, Vladimir L Cherkassky, Sandesh Aryal, and Tom M Mitchell. A neurosemantic theory of concrete noun representation based on the underlying brain codes. PloS one, 5(1):e8622, 2010.

[27] Kendrick N Kay, Thomas Naselaris, Ryan J Prenger, and Jack L Gallant. Identifying natural images from human brain activity. Nature, 452(7185):352–355, 2008.

[28] Jonas Kubilius, Martin Schrimpf, Kohitij Kar, Rishi Rajalingham, Ha Hong, Najib Majaj, Elias Issa, Pouya Bashivan, Jonathan Prescott-Roy, Kailyn Schmidt, et al. Brain-like object recognition with high-performing shallow recurrent anns. Advances in Neural Information Processing Systems, 32:12805–12816, 2019.

[29] Tom Mitchell. Neural representations of language meaning, 2014.

[30] Tom M Mitchell, Svetlana V Shinkareva, Andrew Carlson, Kai-Min Chang, Vicente L Malave, Robert A Mason, and Marcel Adam Just. Predicting human brain activity associated with the meanings of nouns. science, 320(5880):1191–1195, 2008.

[31] Thomas Naselaris, Ryan J Prenger, Kendrick N Kay, Michael Oliver, and Jack L Gallant. Bayesian reconstruction of natural images from human brain activity. Neuron, 63(6):902–915, 2009.

[32] Samuel A Nastase, Yun-Fei Liu, Hanna Hillman, Asieh Zadbood, Liat Hasenfratz, Neggin Keshavarzian, Janice Chen, Christopher J Honey, Yaara Yeshurun, Mor Regev, et al. Narratives: fmri data for evaluating models of naturalistic language comprehension. bioRxiv, pages 2020–12, 2021.

[33] Satoshi Nishida, Yusuke Nakano, Antoine Blanc, Naoya Maeda, Masataka Kado, and Shinji Nishimoto. Brain-mediated transfer learning of convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 5281–5288, 2020.

76 of 210

References

[34] Shinji Nishimoto, An T Vu, Thomas Naselaris, Yuval Benjamini, Bin Yu, and Jack L Gallant. Reconstructing visual experiences from brain activity evoked by natural movies. Current biology, 21(19):1641–1646, 2011.

[35] Subba Reddy Oota, Vijay Rowtula, Manish Gupta, and Raju S Bapi. Stepencog: A convolutional lstm autoencoder for near-perfect fmri encoding. In 2019 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2019.

[36] Francisco Pereira, Matthew Botvinick, and Greg Detre. Using wikipedia to learn semantic feature representations of concrete concepts in neuroimaging experiments. Artificial intelligence, 194:240–252, 2013.

[37] Francisco Pereira, Bin Lou, Brianna Pritchett, Nancy Kanwisher, Matthew Botvinick, and Evelina Fedorenko. Decoding of generic mental representations from functional mri data using word embeddings. bioRxiv, page 057216, 2016.

[38] Francisco Pereira, Bin Lou, Brianna Pritchett, Samuel Ritter, Samuel J Gershman, Nancy Kanwisher, Matthew Botvinick, and Evelina Fedorenko. Toward a universal decoder of linguistic meaning from brain activation. Nature communications, 9(1):1–13, 2018.

[39] Martin Schrimpf, Idan Blank, Greta Tuckute, Carina Kauf, Eghbal A Hosseini, Nancy Kanwisher, Joshua Tenenbaum, and Evelina Fedorenko. The neural architecture of language: Integrative reverse-engineering converges on a model for predictive processing. PNAS, Vol:To appear, 2021.

[40] Martin Schrimpf, Jonas Kubilius, Ha Hong, Najib J Majaj, Rishi Rajalingham, Elias B Issa, Kohitij Kar, Pouya Bashivan, Jonathan Prescott-Roy, Franziska Geiger, et al. Brain-score: Which artificial neural network for object recognition is most brain-like? BioRxiv, page 407007, 2020.

[41] Dan Schwartz, Mariya Toneva, and Leila Wehbe. Inducing brain-relevant bias in natural language processing models. Advances in Neural Information Processing Systems, 32:14123–14133, 2019.

[42] K Seeliger, RP Sommers, Umut Güçlü, Sander E Bosch, and MAJ Van Gerven. A large single-participant fmri dataset for probing brain responses to naturalistic stimuli in space and time. bioRxiv, page 687681, 2019.

[43] Vishwajeet Singh, Krishna P. Miyapuram, and Raju S. Bapi. Detection of cognitive states from fmri data using machine learning techniques. In Manuela M. Veloso, editor, IJCAI, pages 587–592, 2007.

[44] Jonathan Smallwood and Jonathan W Schooler. The science of mind wandering: empirically navigating the stream of consciousness. Annual review of psychology, 66:487–518, 2015.

77 of 210

References

[45] Jingyuan Sun, Shaonan Wang, Jiajun Zhang, and Chengqing Zong. Towards sentence-level brain decoding with distributed representations. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 7047–7054, 2019.

[46] Jingyuan Sun, Shaonan Wang, Jiajun Zhang, and Chengqing Zong. Neural encoding and decoding with distributed sentence representations. IEEE Transactions on Neural Networks and Learning Systems, 32(2):589–603, 2020.

[47] Bertrand Thirion. Statistical inference in high dimension and application to brain imaging, 2019.

[48] Bertrand Thirion, Edouard Duchesnay, Edward Hubbard, Jessica Dubois, Jean-Baptiste Poline, Denis Lebihan, and Stanislas Dehaene. Inverse retinotopy: inferring the visual content of images from brain activation patterns. Neuroimage, 33(4):1104–1116, 2006.

[49] Mariya Toneva, Otilia Stretcu, Barnabás Póczos, Leila Wehbe, and Tom M Mitchell. Modeling task effects on meaning representation in the brain via zero-shot meg prediction. Advances in Neural Information Processing Systems, 33, 2020.

[50] Mariya Toneva and Leila Wehbe. Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain). arXiv preprint arXiv:1905.11833, 2019.

[51] Aria Wang, Michael Tarr, and Leila Wehbe. Neural taskonomy: Inferring the similarity of task-derived representations from brain activity. Advances in Neural Information Processing Systems, 32:15501–15511, 2019.

[52] Jing Wang, Vladimir L Cherkassky, and Marcel Adam Just. Predicting the brain activation pattern associated with the propositional content of a sentence: Modeling neural representations of events and states. Human brain mapping, 38(10):4865–4881, 2017.

[53] Shaonan Wang, Jiajun Zhang, Haiyan Wang, Nan Lin, and Chengqing Zong. Fine-grained neural decoding with distributed word representations. Information Sciences, 507:256–272, 2020.

[54] Leila Wehbe, Brian Murphy, Partha Talukdar, Alona Fyshe, Aaditya Ramdas, and Tom Mitchell. Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses. in press, 2014.

[55] Daniel LK Yamins, Ha Hong, Charles F Cadieu, Ethan A Solomon, Darren Seibert, and James J DiCarlo. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the national academy of sciences, 111(23):8619–8624, 2014.

[56] Julie A Boyle, Basile Pinsard, A Boukhdhir, S Belleville, S Brambatti, J Chen, J Cohen-Adad, et al. The Courtois project on neuronal modelling: 2020 data release. Presented at the 26th annual meeting of the Organization for Human Brain Mapping, 2020.

78 of 210

Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding

Subba Reddy Oota1, Manish Gupta2,3, Raju S. Bapi2, Mariya Toneva4

1Inria Bordeaux, France; 2IIIT Hyderabad, India; 3Microsoft, India; 4MPI for Software Systems, Germany

subba-reddy.oota@inria.fr, gmanish@microsoft.com, raju.bapi@iiit.ac.in, mtoneva@mpi-sws.org

79 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
  • Stimulus Representations [1 hour]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 30 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]

IJCAI 2023: DL for Brain Encoding and Decoding

79

81 of 210

Outline

  • Introduction to Brain Decoding
  • Decoding models
    • Linear Models
    • Non-Linear Models (including DNNs)
  • Language
    • Pereira et al. 2018, Gauthier et al. 2019, Huth et al. 2023, Oota et al. 2022

IJCAI 2023: DL for Brain Encoding and Decoding

81

82 of 210

Encoding vs. Decoding

IJCAI 2023: DL for Brain Encoding and Decoding

82

Encoding: Stimulus → Representation → fMRI

Decoding: fMRI → Representation → Stimulus

Haiguang Wen et al., 2017

83 of 210

What is Brain Decoding?

  • Can we reconstruct the stimulus, given the brain response?
  • Can you read the mind with fMRI?
  • Or at least tell what the person saw?

IJCAI 2023: DL for Brain Encoding and Decoding

83

Visual Task

Language Task

Smith et al., 2011, Wang et al. 2019

84 of 210

Linguistic Decoding

IJCAI 2023: DL for Brain Encoding and Decoding

84

input

output

Zou et al., 2022

85 of 210

Outline

  • Introduction to Brain Decoding
  • Decoding models
    • Linear Models
    • Non-Linear Models (including DNNs)
    • Evaluation Metrics
  • Language
    • Pereira et al. 2018, Gauthier et al. 2019, Huth et al. 2023, Oota et al. 2022

IJCAI 2023: DL for Brain Encoding and Decoding

85

86 of 210

Linear Decoder Models

IJCAI 2023: DL for Brain Encoding and Decoding

86

Ridge / Logistic Regression

Stimulus Representation

Stimulus Classification

Horikawa et al. 2018

87 of 210

Non-Linear Decoder

IJCAI 2023: DL for Brain Encoding and Decoding

87

Vu et al. 2018

Deep CNNs

88 of 210

Evaluating Decoding Models: Pairwise Accuracy

IJCAI 2023: DL for Brain Encoding and Decoding

88

ith Concept Word

jth Concept Word

 

Pereira et al. 2018
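The pairwise accuracy named on this slide can be sketched as follows. This is a minimal illustration under one common definition (an assumption, since the slide shows only a figure): for every pair of concepts (i, j), the decoded vectors count as correct when the two matched correlations sum to more than the two mismatched ones. The function name `pairwise_accuracy` and the synthetic data are hypothetical.

```python
import itertools
import numpy as np

def pairwise_accuracy(pred, true):
    """Fraction of concept pairs where matched correlations beat mismatched ones."""
    n = pred.shape[0]
    # corr[i, j] = Pearson correlation between decoded vector i and true vector j
    corr = np.corrcoef(pred, true)[:n, n:]
    pairs = list(itertools.combinations(range(n), 2))
    wins = sum(corr[i, i] + corr[j, j] > corr[i, j] + corr[j, i] for i, j in pairs)
    return wins / len(pairs)

rng = np.random.default_rng(0)
true = rng.standard_normal((20, 50))                    # 20 concepts, 50-dim semantic vectors
decoded = true + 0.1 * rng.standard_normal(true.shape)  # stand-in for a good decoder's output
print(pairwise_accuracy(decoded, true))
```

Chance level for this metric is 0.5, which is why it is convenient for comparing decoders across datasets with different numbers of concepts.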

89 of 210

Evaluating Decoding Models: Rank Accuracy

IJCAI 2023: DL for Brain Encoding and Decoding

89

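For the ith concept word, compute the correlation between its decoded vector and each of the candidate vectors Y1, Y2, …, Yn

rank = rsort(corr_scores).index(correlation)

rsort(corr_scores): all the correlation scores in descending order

Pereira et al. 2018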
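The ranking step above can be made concrete with a short sketch. It assumes the normalization 1 - (rank - 1) / (n - 1), so that a perfect decoder scores 1 and chance is about 0.5; that normalization, the function name `rank_accuracy`, and the synthetic data are assumptions for illustration, not taken from the slide.

```python
import numpy as np

def rank_accuracy(pred, candidates):
    """Mean normalized rank of the true concept among all candidate vectors."""
    n = pred.shape[0]
    # corr[i, j] = correlation between decoded vector i and candidate vector j
    corr = np.corrcoef(pred, candidates)[:n, n:]
    scores = []
    for i in range(n):
        order = np.argsort(-corr[i])                  # candidates sorted by descending correlation
        rank = int(np.where(order == i)[0][0]) + 1    # 1-based rank of the true concept
        scores.append(1.0 - (rank - 1) / (n - 1))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
candidates = rng.standard_normal((100, 50))
decoded = candidates + 0.1 * rng.standard_normal(candidates.shape)
print(rank_accuracy(decoded, candidates))
```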

90 of 210

Representational Similarity Matrix (RSM)

IJCAI 2023: DL for Brain Encoding and Decoding

90

corr(Scene1, Scene2)

Moussa et al. 2012

91 of 210

Representational Dissimilarity Matrix (RDM)

IJCAI 2023: DL for Brain Encoding and Decoding

91

Hamed et al. 2014

92 of 210

Representation Similarity Analysis

IJCAI 2023: DL for Brain Encoding and Decoding

92

Kriegeskorte et al. 2018

DSM = RDM
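The RSM, RDM, and RSA slides can be tied together in a few lines. This is a minimal sketch, assuming the RDM is 1 minus the Pearson correlation between condition patterns and that two RDMs are compared through their upper triangles. RSA is often run with a rank correlation instead; Pearson is used here only to keep the example NumPy-only. All function names and the synthetic "brain" and "model" data are hypothetical.

```python
import numpy as np

def rsm(features):
    """Representational similarity matrix: correlation between condition patterns (rows)."""
    return np.corrcoef(features)

def rdm(features):
    """Representational dissimilarity matrix: 1 - RSM."""
    return 1.0 - rsm(features)

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries; the matrix is symmetric, so these suffice."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def rsa_score(features_a, features_b):
    """Correlation between the RDMs of two systems over the same conditions."""
    a = upper_triangle(rdm(features_a))
    b = upper_triangle(rdm(features_b))
    return float(np.corrcoef(a, b)[0, 1])

rng = np.random.default_rng(0)
model = rng.standard_normal((12, 40))                       # 12 conditions x 40 model units
brain = model @ rng.standard_normal((40, 200))              # 200 voxels driven by the model features
brain = brain + 0.5 * rng.standard_normal(brain.shape)      # measurement noise
print(rsa_score(model, brain))
```

Because RSA compares condition-by-condition geometry, the two systems do not need the same number of units or voxels.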

93 of 210

Outline

  • Introduction to Brain Decoding
  • Decoding models
    • Linear Models
    • Non-Linear Models (including DNNs)
  • Language
    • Pereira et al. 2018, Gauthier et al. 2019, Huth et al. 2023, Oota et al. 2022

IJCAI 2023: DL for Brain Encoding and Decoding

93

94 of 210

Linguistic Brain Decoding

  • Toward Word-level Universal Brain Decoder
  • Linking artificial and human neural representations of language
  • Multi-view and Cross-view Decoding

IJCAI 2023: DL for Brain Encoding and Decoding

94

Pereira et al. 2018, Gauthier et al. 2019, Huth et al. 2023, Oota et al. 2022

95 of 210

Classical Decoders

  • Classical decoding solutions extracting linguistic meaning from imaging data have been largely limited to
    • concrete nouns,
    • using similar stimuli for training and testing,
    • a small number of semantic categories.

IJCAI 2023: DL for Brain Encoding and Decoding

95

Mitchell et al. 2008

96 of 210

Toward a universal decoder

  • Pereira et al. present a new approach for building a brain decoding system:
    • Words and sentences are represented as vectors in a semantic space constructed from massive text corpora.
    • Stimuli cover a wide variety of both concrete and abstract topics from two separate datasets.
    • The subject reads naturalistic linguistic stimuli on potentially any topic, including abstract ideas (e.g., pleasure, justice, love).

IJCAI 2023: DL for Brain Encoding and Decoding

96

Pereira et al. 2018

GloVe

Pennington et al. 2014

97 of 210

Dataset Details (Experiment-1)

IJCAI 2023: DL for Brain Encoding and Decoding

97

Concept + Sentence View

Concept Word

Concept + Picture View

Concept + Wordcloud View

Pereira et al. 2018

98 of 210

Dataset Details (Experiment-1)

  • 180 Concepts
    • 128 nouns
    • 22 verbs
    • 29 adjectives
    • 1 function word
  • 16 subjects
  • AAL atlas (180 regions)
  • Gordon atlas (333 regions)

IJCAI 2023: DL for Brain Encoding and Decoding

98

Pereira et al. 2018

99 of 210

Dataset Details (Experiments 2 and 3)

IJCAI 2023: DL for Brain Encoding and Decoding

99

Topic

Concept

Topic

Pereira et al. 2018

100 of 210

Informative Voxel Selection

IJCAI 2023: DL for Brain Encoding and Decoding

100

  • Input (X): 3D image; each voxel together with its 26 neighbors in 3D
  • Output (Y): GloVe vector of the presented stimulus (e.g., "Apartment")
  • Ridge regression learns the mapping W from X to Y
  • Pearson correlation (R) = Corr(Y, W(X)), computed across feature dimensions
  • Each voxel gets a score: V1 – R1, V2 – R2, …, Vn – Rn
  • Select 5000 voxels based on top-5000 correlation scores
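The voxel-selection procedure on this slide can be sketched on a toy volume. This is a rough illustration, not the authors' implementation: it assumes a closed-form ridge fit without intercept, contiguous cross-validation folds, and a voxel score equal to the mean correlation across semantic dimensions (the slide only says the correlation is taken across feature dimensions). All names, the 4x4x4 volume, and the planted signal are hypothetical.

```python
import numpy as np

def ridge_fit_predict(x_train, y_train, x_test, alpha=1.0):
    """Closed-form ridge regression (no intercept); returns predictions for x_test."""
    d = x_train.shape[1]
    w = np.linalg.solve(x_train.T @ x_train + alpha * np.eye(d), x_train.T @ y_train)
    return x_test @ w

def neighborhood(volumes, x, y, z):
    """Voxel (x, y, z) plus its up to 26 neighbors, flattened; clipped at the volume edges."""
    patch = volumes[:, max(x - 1, 0):x + 2, max(y - 1, 0):y + 2, max(z - 1, 0):z + 2]
    return patch.reshape(volumes.shape[0], -1)

def voxel_scores(volumes, targets, n_folds=5, alpha=1.0):
    """Cross-validated decoding score for every voxel neighborhood."""
    n = volumes.shape[0]
    folds = np.array_split(np.arange(n), n_folds)
    scores = np.zeros(volumes.shape[1:])
    for idx in np.ndindex(*volumes.shape[1:]):
        feats = neighborhood(volumes, *idx)
        pred = np.zeros_like(targets)
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            pred[test] = ridge_fit_predict(feats[train], targets[train], feats[test], alpha)
        # Correlate predicted and true values per semantic dimension, then average
        dim_corr = [np.corrcoef(pred[:, k], targets[:, k])[0, 1] for k in range(targets.shape[1])]
        scores[idx] = np.mean(dim_corr)
    return scores

rng = np.random.default_rng(0)
n_stimuli, n_dims = 80, 3
targets = rng.standard_normal((n_stimuli, n_dims))       # stand-in for GloVe vectors
volumes = rng.standard_normal((n_stimuli, 4, 4, 4))      # toy 3D images, one per stimulus
# Plant signal: three adjacent voxels each carry one semantic dimension
volumes[:, 1, 1, 1] += 2.0 * targets[:, 0]
volumes[:, 1, 1, 2] += 2.0 * targets[:, 1]
volumes[:, 1, 2, 1] += 2.0 * targets[:, 2]

scores = voxel_scores(volumes, targets)
top_k = 5  # the slide selects 5000 voxels on whole-brain data
top_flat = np.argsort(scores.ravel())[::-1][:top_k]
top_voxels = [tuple(int(c) for c in np.unravel_index(i, scores.shape)) for i in top_flat]
print(top_voxels)
```

The highest-scoring voxels should be those whose neighborhoods cover all three planted voxels, which mirrors how informative voxels cluster around signal-bearing regions.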

101 of 210

Pairwise and Rankwise Results

IJCAI 2023: DL for Brain Encoding and Decoding

101

Pereira et al. 2018

Decoder built from Expt 1 could distinguish sentences at all levels of granularity

Universal Decoder!

102 of 210

Distribution of Informative Voxels

IJCAI 2023: DL for Brain Encoding and Decoding

102

Pereira et al. 2018

Brain activation patterns are consistent across the 16 subjects

The 5000 informative voxels are roughly evenly distributed among the four networks

Overall, the language network (LN) contains a relatively higher proportion of informative voxels compared to its size!

103 of 210

Insights

  • Presented a viable approach for building a universal decoder, capable of extracting a representation of mental content from linguistic materials.
  • The semantic resolution of brain-based decoding of mental content will continue to improve rapidly
    • given the progress in the development of distributed semantic representations

IJCAI 2023: DL for Brain Encoding and Decoding

103

Pereira et al. 2018

104 of 210

Linguistic Brain Decoding

  • Toward Word-level Universal Brain Decoder
  • Linking artificial and human neural representations of language
  • Multi-view and Cross-view Decoding

IJCAI 2023: DL for Brain Encoding and Decoding

104

Pereira et al. 2018, Gauthier et al. 2019, Huth et al. 2023, Oota et al. 2022

105 of 210

Linking artificial and human neural representations of language

IJCAI 2023: DL for Brain Encoding and Decoding

105

Ridge Regression

Gauthier et al. 2019

  • Evaluate the link between human brain activity and neural network models as the models are optimized for different tasks.
  • Investigate why these mappings are successful.
  • Uncover the parallel representational contents shared between human brains and neural networks.

106 of 210

IJCAI 2023: DL for Brain Encoding and Decoding

106

Devlin et al. 2019

Pretrained vs. Task-specific language models

107 of 210

IJCAI 2023: DL for Brain Encoding and Decoding

107

Natural Language Understanding Tasks

  • Paraphrase
  • Question Answering
  • Sentiment Analysis
  • Natural Language Inference

Devlin et al. 2019, Bowon et al. 2020

Pretrained vs. Task-specific language models

Squad-2.0: Question Answering

108 of 210

Custom Tasks

  • Scrambled language modeling:
    • LM-scrambled: uses sentence inputs where words are shuffled within each sentence.
    • LM-scrambled-para: uses inputs where words are shuffled within their containing paragraphs in the corpus.

IJCAI 2023: DL for Brain Encoding and Decoding

108

Fingers are used for grasping, writing, grooming and other activities.

grasping are used for Fingers, grooming, writing and other activities.

This is Los Angeles. And it's the height of summer. In a small bungalow off of La Cienega, Clara serves homemade chili and chips in red plastic bowls -- wine in blue plastic.

This is Los Angeles. And the height it's of summer. In a bungalow off small of La Cienega, Clara serves homemade chili and chips in red plastic bowls -- wine in blue plastic.

Gauthier et al. 2019
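The two scrambling schemes described above can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization (punctuation stays attached to words) and uniform shuffling; the function names are hypothetical and the exact preprocessing in Gauthier et al. 2019 may differ.

```python
import random

def lm_scrambled(sentences, rng):
    """Shuffle words within each sentence; sentence boundaries are preserved."""
    scrambled = []
    for sentence in sentences:
        words = sentence.split()
        rng.shuffle(words)
        scrambled.append(" ".join(words))
    return scrambled

def lm_scrambled_para(sentences, rng):
    """Shuffle words across the whole paragraph; sentence boundaries are lost."""
    words = [word for sentence in sentences for word in sentence.split()]
    rng.shuffle(words)
    return " ".join(words)

paragraph = [
    "This is Los Angeles.",
    "And it's the height of summer.",
]
rng = random.Random(0)  # fixed seed so the scrambling is reproducible
print(lm_scrambled(paragraph, rng))
print(lm_scrambled_para(paragraph, rng))
```

Both variants keep the bag of words intact and destroy word order, which is what lets them probe how much of the brain-model alignment depends on syntax.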

109 of 210

Brain decoding performance

IJCAI 2023: DL for Brain Encoding and Decoding

109

Models fine-tuned on scrambled language modeling show better brain decoding performance!

Gauthier et al. 2019

110 of 210

Brain decoding performance trajectories over fine-tuning time

IJCAI 2023: DL for Brain Encoding and Decoding

110

Gauthier et al. 2019

111 of 210

Summary

  • Among the models tested, the scrambled language modeling tasks best match the structure of brain activations.
    • The models optimized for LM-scrambled and LM-scrambled-para are the ones that improve in brain decoding performance.

IJCAI 2023: DL for Brain Encoding and Decoding

111

Gauthier et al. 2019

112 of 210

Linguistic Brain Decoding

  • Toward Word-level Universal Brain Decoder
  • Linking artificial and human neural representations of language (contd)
  • Multi-view and Cross-view Decoding

IJCAI 2023: DL for Brain Encoding and Decoding

112

Pereira et al. 2018, Gauthier et al. 2019, Huth et al. 2023, Oota et al. 2022

113 of 210

IJCAI 2023: DL for Brain Encoding and Decoding

113

Continuous Language Decoder

Tang, LeBel, Jain & Huth (2023)

114 of 210

IJCAI 2023: DL for Brain Encoding and Decoding

114

Continuous Language Decoder

Tang, LeBel, Jain & Huth (2023)

115 of 210

IJCAI 2023: DL for Brain Encoding and Decoding

115

Continuous Language Decoder

Tang, LeBel, Jain & Huth (2023)

116 of 210

Summary

  • Continuous language representations of semantic meaning can be decoded (reconstructed) from non-invasive brain recordings (fMRI).
  • Given novel brain recordings, decoder generates intelligible word sequences that recover the meaning of perceived speech, imagined speech, and even silent videos, demonstrating that a single language decoder can be applied to a range of semantic tasks.
  • Exciting possibility enabling future multipurpose brain-computer interfaces!

IJCAI 2023: DL for Brain Encoding and Decoding

116

Tang, LeBel, Jain & Huth (2023)

117 of 210

Linguistic Brain Decoding

  • Toward Word-level Universal Brain Decoder
  • Linking artificial and human neural representations of language
  • Multi-view and Cross-view Decoding

IJCAI 2023: DL for Brain Encoding and Decoding

117

Pereira et al. 2018, Gauthier et al. 2019, Huth et al. 2023, Oota et al. 2022

118 of 210

Multi-view and Cross-view Brain Decoding

  • Human brains have the unique capability of language acquisition:
    • the process of learning the language
    • understand the meaning of concepts from multiple modalities such as images, text, speech, and videos.
  • Prior works focus on single-view brain decoding using traditional feature engineering.
  • However, how the brain captures the meaning of linguistic stimuli across multiple views is still a critical open question in neuroscience.
  • Consider three different views of the concept bird:
    • (1) sentence using the target word,
    • (2) picture presented with the target word label, and
    • (3) word cloud containing the target word along with other semantically related words.
  • Earlier works have explored which of these three different views provides richer information to understand the concept.

IJCAI 2023: DL for Brain Encoding and Decoding

118

Oota et al. 2022

119 of 210

Multi-view decoding

IJCAI 2023: DL for Brain Encoding and Decoding

119

[Figure: multi-view decoding; a decoder is trained on each of the Sentence, Picture, and Wordcloud views]

Oota et al. 2022

120 of 210

Multi-view decoding results

IJCAI 2023: DL for Brain Encoding and Decoding

120

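[Figure: multi-view decoding results for decoders trained on the Picture, Sentence, and WordCloud views and evaluated at test time. Panel labels: "BERT Representations", "Shuffled the Target Concepts". Annotations: "Pictures Best Accuracy", "Sentences Best Accuracy"]

Oota et al. 2022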

121 of 210

Distribution of Informative Voxels

IJCAI 2023: DL for Brain Encoding and Decoding

121

Oota et al. 2022

122 of 210

Cross-view Decoding

IJCAI 2023: DL for Brain Encoding and Decoding

122

  • Train: Picture View → Test: Caption
  • Train: Picture View → Test: Visual words
  • Train: Wordcloud View → Test: Sentence
  • Train: Sentence View → Test: Keywords

Oota et al. 2022

123 of 210

Cross-view Decoding results

IJCAI 2023: DL for Brain Encoding and Decoding

123

BERT Representations

Shuffled the Target Concepts

Oota et al. 2022

124 of 210

Summary

  • Cross-view and Multi-view decoding tasks establish that the information contained in the brain response is rich and capable of driving multiple downstream tasks.

IJCAI 2023: DL for Brain Encoding and Decoding

124

Oota et al. 2022

125 of 210

Linguistic Brain Decoding

  • Toward Word-level Universal Brain Decoder
  • Linking artificial and human neural representations of language
  • Multi-view and Cross-view Decoding

IJCAI 2023: DL for Brain Encoding and Decoding

125

Pereira et al. 2018, Gauthier et al. 2019, Huth et al. 2023, Oota et al. 2022

126 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
  • Stimulus Representations [1 hour]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 30 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]

IJCAI 2023: DL for Brain Encoding and Decoding

126

128 of 210

References

  • Nishimoto, Shinji, et al. "Reconstructing visual experiences from brain activity evoked by natural movies." Current biology 21.19 (2011): 1641-1646.
  • Anumanchipalli, Gopala K., Josh Chartier, and Edward F. Chang. "Speech synthesis from neural decoding of spoken sentences." Nature 568.7753 (2019): 493-498.
  • Schrimpf, Martin, et al. "The neural architecture of language: Integrative modeling converges on predictive processing." Proceedings of the National Academy of Sciences 118.45 (2021): e2105646118.
  • Wehbe, Leila, et al. "Simultaneously uncovering the patterns of brain regions involved in different story reading subprocesses." PloS one 9.11 (2014): e112575.

IJCAI 2023: DL for Brain Encoding and Decoding

128

129 of 210

Deep Learning for Brain Encoding and Decoding

Subba Reddy Oota1, Manish Gupta2,3, Raju S. Bapi2, Mariya Toneva4

1Inria Bordeaux, France; 2IIIT Hyderabad, India; 3Microsoft, India; 4MPI for Software Systems, Germany

subba-reddy.oota@inria.fr, gmanish@microsoft.com, raju.bapi@iiit.ac.in, mtoneva@mpi-sws.org

129

131 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
  • Stimulus Representations [1 hour 30 min]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 15 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
    • Classic findings & common approaches
    • More recent findings utilizing deep learning
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]

131

132 of 210

Mechanistic understanding of information processing in the brain: 4 big questions

132

How

Where

When

What

IJCAI 2023: DL for Brain Encoding and Decoding

133 of 210

Encoding models have a causal interpretation

133

Reveal which brain areas are affected by stimulus properties [Weichwald et al. 2015]

Stimulus: “The problem is when the capsule moves from an elliptical orbit to a parabolic orbit.”

Stimulus representation: <0, 1, … 0> (e.g., Part of Speech: Noun)

Stimulus representation = hypothesis for the latent brain-relevant stimulus properties

Train: learn a function from the stimulus representation to y_train

Evaluate: corr(ŷ_test, y_test)

IJCAI 2023: DL for Brain Encoding and Decoding

134 of 210

Classic findings using encoding models

  • Using representations of stimuli not from deep learning
  • Language:
    • Mitchell et al. 2008, Science
  • Vision:
    • Kay et al. 2008, Nature
  • Audio:
    • Santoro et al. 2014, PLoS Comp Bio

134

IJCAI 2023: DL for Brain Encoding and Decoding

135 of 210

Classic encoding model finding: Language

  • Stimuli: concrete nouns + line drawings
  • Stimulus representation: corpus co-occurrence counts with 25 sensory-motor verbs (e.g. see, hear, taste, smell)

135

[Barsalou, 1999; Barsalou, 2008; Pecher et al., 2005]

figure from Kemmerer, 2014; adapted from Thompson-Schill et al. 2006

Empirical evidence for distributed organization for attributes related to:

  • audition [Kiefer et al., 2008]
  • color [Simmons et al., 2007]
  • shape [Chao et al., 1999]
  • motion [Damasio et al., 1996]
  • olfaction and taste [Goldberg, Perfetti, et al., 2006a; Goldberg, Perfetti, et al., 2006b]

bear

IJCAI 2023: DL for Brain Encoding and Decoding

136 of 210

Classic encoding model finding: Language

  • Stimuli: concrete nouns + line drawings
  • Stimulus representation: corpus co-occurrence counts with 25 sensory-motor verbs (e.g. see, hear, taste, smell)
  • Brain recording: fMRI

136

bear

Accurately predicts fMRI recordings for a novel word

Correspondences

between a semantic property (“push”) and the function of the cortical regions where the fMRI recordings are well predicted

IJCAI 2023: DL for Brain Encoding and Decoding

137 of 210

Classic encoding model finding: Vision

137

  • Stimuli: natural images
  • Stimulus representation: mixtures of Gabor wavelets
  • Brain recording & modality: fMRI, viewing

Encoding models estimated quantitative receptive fields for V1-V3 voxels

Identified which of a set of candidate natural images was viewed by a participant

IJCAI 2023: DL for Brain Encoding and Decoding

138 of 210

Classic encoding model finding: Audio

138

  • Stimuli: natural sounds (speech, music, nature, tools)
  • Stimulus representation: spectro-temporal filters that are selective for modulations along space and/or time
  • Brain recording & modality: fMRI, listening

spatial

temporal

posterior/dorsal auditory: coarse spectral info & high temporal precision

anterior/ventral auditory: fine-grained spectral & low temporal precision

IJCAI 2023: DL for Brain Encoding and Decoding

139 of 210

Deep learning models enable data-driven encoding models for naturalistic stimuli

139

more stimulus properties that affect brain activity

more naturalistic stimuli

<0,1,...0>

simple stim. representations explain less variance in brain activity

DeepMind’s New AI Taught Itself to Be the World’s Greatest Go Player

Singularity Hub

Meet GPT-3. It Has Learned to Code (and Blog and Argue)

The New York Times

IJCAI 2023: DL for Brain Encoding and Decoding

140 of 210

Data-driven encoding models evaluate the relationships between brains and deep learning models

140

fMRI

A priori locations in DL system and brain

Deep learning system

how are they related?

Multimodal naturalistic stimulus

Data-driven encoding model

IJCAI 2023: DL for Brain Encoding and Decoding

141 of 210

Encoding: training and evaluation

141

function often modeled as linear

[Mitchell et al. 2008; Nishimoto et al., 2011; Sudre et al., 2012; Wehbe et al., 2014]

Considerations for linear vs. non-linear

IJCAI 2023: DL for Brain Encoding and Decoding

142 of 210

Encoding: training and evaluation

142

function often modeled as linear

[Mitchell et al. 2008; Nishimoto et al., 2011; Sudre et al., 2012; Wehbe et al., 2014]

Training: cross-validation (CV), with the regularization parameter chosen via nested CV

Evaluation:

1) make predictions for held-out data

2) compare predictions with true brain data

3) perform stringent statistical testing

IJCAI 2023: DL for Brain Encoding and Decoding

143 of 210

Encoding: training setup

  • Method:
    • Split dataset into train, validation, and test
    • Employ cross-validation to select model parameters based on validation dataset
    • Reduce overfitting by using regularization
      • Ridge regularization

143

Test how well the learned function predicts unseen brain recordings

Learn function

  • Goal: find a mapping from stimulus representation to brain data that generalizes to new brain data
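The training setup on this slide can be sketched end to end on synthetic data. This is a minimal illustration, assuming a linear encoding model fitted with closed-form ridge regression, contiguous (unshuffled) validation folds, and one penalty shared by all voxels; in practice the penalty is often chosen per voxel. The function names, the candidate penalties, and the data sizes are hypothetical.

```python
import numpy as np

def ridge_weights(x, y, alpha):
    """Closed-form ridge solution mapping stimulus features to all voxels at once."""
    d = x.shape[1]
    return np.linalg.solve(x.T @ x + alpha * np.eye(d), x.T @ y)

def select_alpha(x, y, alphas, n_folds=5):
    """Pick the ridge penalty with the lowest mean squared error on validation folds."""
    n = x.shape[0]
    folds = np.array_split(np.arange(n), n_folds)
    errors = []
    for alpha in alphas:
        fold_err = []
        for val in folds:
            train = np.setdiff1d(np.arange(n), val)
            w = ridge_weights(x[train], y[train], alpha)
            fold_err.append(np.mean((x[val] @ w - y[val]) ** 2))
        errors.append(np.mean(fold_err))
    return alphas[int(np.argmin(errors))]

rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 50, 20, 30
w_true = rng.standard_normal((n_feat, n_vox))
x = rng.standard_normal((n_train + n_test, n_feat))              # stimulus representations
y = x @ w_true + rng.standard_normal((n_train + n_test, n_vox))  # synthetic brain recordings
x_train, y_train = x[:n_train], y[:n_train]
x_test, y_test = x[n_train:], y[n_train:]

alphas = [0.1, 1.0, 10.0, 100.0]
best_alpha = select_alpha(x_train, y_train, alphas)   # validation happens inside the training set only
w = ridge_weights(x_train, y_train, best_alpha)
y_pred = x_test @ w                                   # predictions for unseen brain recordings
voxel_corr = [np.corrcoef(y_pred[:, v], y_test[:, v])[0, 1] for v in range(n_vox)]
print(best_alpha, float(np.mean(voxel_corr)))
```

Keeping the test split out of the penalty search is what makes the final correlation an honest estimate of generalization.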

IJCAI 2023: DL for Brain Encoding and Decoding

144 of 210

Encoding: training independent models

  • Independent model per participant
  • Independent model per voxel / sensor-timepoint

144

P1

P2

PN

P1, v1

P1, v2

P1, vm

IJCAI 2023: DL for Brain Encoding and Decoding

145 of 210

Encoding: fMRI specifics

145

IJCAI 2023: DL for Brain Encoding and Decoding

146 of 210

Encoding: evaluation setup

  • Predict data held out from training by applying the learned function to the corresponding stimulus representations

  • Compare predictions of brain data to true brain data:
    • Evaluation metrics

146

Test how well the learned function predicts unseen brain recordings

Learn function

IJCAI 2023: DL for Brain Encoding and Decoding

147 of 210

Encoding: evaluation metrics

147

Pearson correlation

2v2 accuracy
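Both metrics named on this slide fit in a short sketch. It assumes Pearson correlation is computed per voxel across test samples, and that 2v2 accuracy uses cosine distance between predicted and true response patterns: a pair of test samples counts as correct when the matched assignment has the smaller total distance. The distance choice, the function names, and the synthetic data are assumptions for illustration.

```python
import itertools
import numpy as np

def voxelwise_pearson(pred, true):
    """Pearson correlation between predicted and true responses, one value per voxel (column)."""
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    num = (pred_c * true_c).sum(axis=0)
    den = np.sqrt((pred_c ** 2).sum(axis=0) * (true_c ** 2).sum(axis=0))
    return num / den

def cosine_distance(a, b):
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def two_vs_two_accuracy(pred, true):
    """Fraction of sample pairs where matched predictions are closer than swapped ones."""
    pairs = list(itertools.combinations(range(pred.shape[0]), 2))
    correct = 0
    for i, j in pairs:
        matched = cosine_distance(pred[i], true[i]) + cosine_distance(pred[j], true[j])
        swapped = cosine_distance(pred[i], true[j]) + cosine_distance(pred[j], true[i])
        correct += int(matched < swapped)
    return correct / len(pairs)

rng = np.random.default_rng(0)
true = rng.standard_normal((30, 100))                   # 30 test samples x 100 voxels
pred = true + 1.0 * rng.standard_normal(true.shape)     # noisy predictions
print(float(voxelwise_pearson(pred, true).mean()), two_vs_two_accuracy(pred, true))
```

Pearson correlation localizes performance to individual voxels, while 2v2 accuracy summarizes whole-pattern discriminability with a chance level of 0.5.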

IJCAI 2023: DL for Brain Encoding and Decoding

148 of 210

Encoding: statistical significance

  • Goal: determine whether the estimated similarity between the DL representations and the brain recordings is significant
  • Simple method that makes no assumptions about underlying data:
    • Permutation test
      • Break input-to-output correspondence by permuting output labels
      • Estimate similarity
      • Repeat thousands of times to estimate the null distribution
      • P-value = proportion of times the similarity metric from permuted labels >= similarity metric from original labels
  • Specifically for fMRI:
    • Permute labels in blocks to preserve the autoregressive structure
  • Correct for multiple comparisons
    • FDR, FWER, etc.
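The permutation test and the multiple-comparison correction listed above can be sketched as follows. This is a minimal illustration for a single voxel, assuming Pearson correlation as the similarity metric, a fixed block length of 10 time points, and Benjamini-Hochberg as the FDR procedure; the block length, function names, and synthetic signal are hypothetical choices, not values from the slides.

```python
import numpy as np

def block_permutation_pvalue(pred, true, block_len=10, n_perm=1000, seed=0):
    """P-value of corr(pred, true) under a null built by shuffling blocks of `true`."""
    rng = np.random.default_rng(seed)
    n_blocks = len(true) // block_len
    usable = n_blocks * block_len                    # drop the incomplete trailing block
    pred, true = pred[:usable], true[:usable]
    observed = np.corrcoef(pred, true)[0, 1]
    blocks = true.reshape(n_blocks, block_len)       # time points within a block stay in order
    count = 0
    for _ in range(n_perm):
        shuffled = blocks[rng.permutation(n_blocks)].ravel()
        count += int(np.corrcoef(pred, shuffled)[0, 1] >= observed)
    return count / n_perm

def benjamini_hochberg(pvalues, q=0.05):
    """Boolean mask of tests that are significant at false discovery rate q."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        cutoff = np.max(np.nonzero(passed)[0])       # largest rank k with p_(k) <= q * k / m
        significant[order[:cutoff + 1]] = True
    return significant

rng = np.random.default_rng(0)
true = np.convolve(rng.standard_normal(200), np.ones(5) / 5, mode="same")  # autocorrelated signal
pred = true + 0.2 * rng.standard_normal(200)                               # a well-aligned prediction
p_value = block_permutation_pvalue(pred, true)
print(p_value, benjamini_hochberg([0.039, 0.001, 0.5, 0.008]))
```

In a whole-brain analysis the permutation test would run once per voxel, and the resulting p-values would be passed together to the correction step.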

148

IJCAI 2023: DL for Brain Encoding and Decoding

149 of 210

Encoding: performance visualization

149

fMRI

MEG/EEG

IJCAI 2023: DL for Brain Encoding and Decoding

150 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
  • Stimulus Representations [1 hour 30 min]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 15 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
    • Classic findings & common approaches
    • More recent findings utilizing deep learning
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]

150

151 of 210

More recent work utilizing progress in DL for encoding

  • Using representations of stimuli from deep learning systems
  • Language:
    • Wehbe et al. 2014; Jain and Huth, 2018; Toneva and Wehbe, 2019; Caucheteux and King, 2020/2022; Schrimpf et al. 2020/2021; Goldstein et al. 2021/2022
  • Vision:
    • Yamins et al. 2014; Cichy et al. 2016; Konkle and Alvarez, 2020/2022; Zhuang et al. 2022
  • Audio:
    • Kell et al. 2018; Vaidya, Jain, and Huth 2022; Millet et al. 2022

151

IJCAI 2023: DL for Brain Encoding and Decoding

152 of 210

Language: work utilizing DL progress

152

  • Stimuli: one chapter of Harry Potter
  • Stimulus representation: derived from an NLP system (RNN) trained on Harry Potter fan fiction
  • Brain recording: MEG, reading

significant word-by-word alignment between MEG & representations of words and context from recurrent NLP systems

IJCAI 2023: DL for Brain Encoding and Decoding

153 of 210

Audio: work utilizing DL progress

153

  • Stimuli: Moth Radio Hour
  • Stimulus representation: derived from self-supervised text language model trained to predict upcoming word in other radio stories
  • Brain recording & modality: fMRI, listening

alignment between fMRI & recurrent NLP representations w/ varying context;

best alignment with middle layer

IJCAI 2023: DL for Brain Encoding and Decoding

154 of 210

Language: work utilizing DL progress

154

  • Stimuli: one chapter of Harry Potter
  • Stimulus representation: derived from pretrained NLP systems
  • Brain recording & modality: fMRI, reading

across several types of large NLP systems, best alignment with fMRI in middle layers

IJCAI 2023: DL for Brain Encoding and Decoding

155 of 210

Language: work utilizing DL progress

155

  • Stimuli: sentences
  • Stimulus representation: derived from pretrained NLP systems
  • Brain recording & modality: MEG & fMRI, reading

best alignment with fMRI & MEG in middle layers

better performance at predicting next word -> better prediction of fMRI & MEG

IJCAI 2023: DL for Brain Encoding and Decoding

156 of 210

Language: work utilizing DL progress

156

  • Stimuli: sentences, passages, short story
  • Stimulus representation: derived from pretrained NLP systems
  • Brain recording & modality: fMRI & ECoG, reading & listening

some NLP systems can predict fMRI and ECoG up to 100% of estimated noise ceiling

IJCAI 2023: DL for Brain Encoding and Decoding

157 of 210

Language: work utilizing DL progress

157

  • Stimuli: story
  • Stimulus representation: derived from pretrained NLP systems
  • Brain recording & modality: ECoG, listening

NLP word representations predict ECoG recordings for upcoming words

IJCAI 2023: DL for Brain Encoding and Decoding

158 of 210

Recent work utilizing progress in DL for encoding

  • Using representations of stimuli from deep learning systems
    • Data-driven
  • Language:
    • Wehbe et al. 2014; Jain and Huth, 2018; Toneva and Wehbe, 2019; Caucheteux and King, 2020/2022; Schrimpf et al. 2020/2021; Goldstein et al. 2021/2022
  • Vision:
    • Yamins et al. 2014; Cichy et al. 2016; Konkle and Alvarez, 2020/2022; Zhuang et al. 2022
  • Audio:
    • Kell et al. 2018; Vaidya, Jain, and Huth 2022; Millet et al. 2022

158

IJCAI 2023: DL for Brain Encoding and Decoding

159 of 210

Vision: work utilizing DL progress

159

  • Stimuli: images of natural objects
  • Stimulus representation: layers in pretrained CNNs
  • Brain recording & modality: multiarray recordings in rhesus macaques, vision

Highest layer in CNN model most predictive of IT; intermediate layers most predictive of V4

IJCAI 2023: DL for Brain Encoding and Decoding

160 of 210

Vision: work utilizing DL progress

160

  • Stimuli: images of natural objects
  • Stimulus representation: layers of CNN tuned for object classification
  • Brain recording: fMRI & MEG, vision

A CNN tuned for object classification captures stages of human visual processing in both space and time

IJCAI 2023: DL for Brain Encoding and Decoding

161 of 210

Vision: work utilizing DL progress

161

  • Stimuli: images of objects
  • Stimulus representation: layers in self-supervised deep model
  • Brain recording: fMRI, vision

Self-supervised deep models achieve parity with category-supervised models in predicting fMRI responses along visual hierarchy

IJCAI 2023: DL for Brain Encoding and Decoding

162 of 210

Vision: work utilizing DL progress

162

  • Stimuli: images of objects
  • Stimulus representation: layers in self-supervised deep model
  • Brain recording: multiarray recordings in rhesus macaques, vision

Self-supervised deep models produce brain-like representations even when trained solely with noisy data from child head-mounted cameras

IJCAI 2023: DL for Brain Encoding and Decoding

163 of 210

Recent work utilizing progress in DL for encoding

  • Using representations of stimuli from deep learning systems
    • Data-driven
  • Language:
    • Wehbe et al. 2014; Jain and Huth, 2018; Toneva and Wehbe, 2019; Caucheteux and King, 2020/2022; Schrimpf et al. 2020/2021; Goldstein et al. 2021/2022
  • Vision:
    • Yamins et al. 2014; Cichy et al. 2016; Konkle and Alvarez, 2020/2022; Zhuang et al. 2022
  • Audio:
    • Kell et al. 2018; Vaidya, Jain, and Huth 2022; Millet et al. 2022

163

IJCAI 2023: DL for Brain Encoding and Decoding

164 of 210

Audio: work utilizing DL progress

164

  • Stimuli: natural sounds
  • Stimulus representation: deep model optimized for speech and music recognition
  • Brain recording & modality: fMRI, listening

Primary auditory responses predicted best by intermediate layers of task-optimized model;

non-primary responses predicted best by late layers

IJCAI 2023: DL for Brain Encoding and Decoding

165 of 210

Audio: work utilizing DL progress

165

  • Stimuli: Moth Radio Hour
  • Stimulus representation: derived from pretrained self-supervised speech models
  • Brain recording & modality: fMRI, listening

Middle layers of self-supervised speech models predict auditory cortex the best

IJCAI 2023: DL for Brain Encoding and Decoding

166 of 210

Audio: work utilizing DL progress

166

  • Stimuli: audio books
  • Stimulus representation: derived from pretrained self-supervised speech model
  • Brain recording & modality: fMRI, listening in 3 languages (Eng, Fr, Mandarin)

Self-supervised speech models reveal specialization for native sounds in the STS and MTG;

IFG and AG show more general specialization for speech rather than native-language

IJCAI 2023: DL for Brain Encoding and Decoding

167 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
  • Stimulus Representations [1 hour 30 min]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 15 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
    • Classic findings & common approaches
    • More recent findings utilizing deep learning
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]

167

168 of 210

Deep Learning for Brain Encoding and Decoding

Subba Reddy Oota1, Manish Gupta2,3, Raju S. Bapi2, Mariya Toneva4

1Inria Bordeaux, France; 2IIIT Hyderabad, India; 3Microsoft, India; 4MPI for Software Systems, Germany

subba-reddy.oota@inria.fr, gmanish@microsoft.com, raju.bapi@iiit.ac.in, mtoneva@mpi-sws.org

168

169 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
  • Stimulus Representations [1 hour 30 min]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 15 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]

169

170 of 210

Challenges in using DL for cognitive modeling

  • Not designed to specifically model brain processing

170

NLP systems: Designed to predict upcoming words

Harry never thought ???

Harry never thought he ???

Harry never thought he would ???

...

IJCAI 2023: DL for Brain Encoding and Decoding

171 of 210

Challenges in using DL for cognitive modeling

  • Not designed to specifically model brain processing
    • Training DL models using brain recordings
    • Task-based modeling

171

IJCAI 2023: DL for Brain Encoding and Decoding

172 of 210

Challenges in using DL for cognitive science

  • Not designed to specifically model brain processing
    • Training DL models using brain recordings
    • Task-based modeling
  • Can be difficult to interpret due to multiple sources of information

172

part-of-speech

semantic role

dependence on other words

...

+

+

+

?

IJCAI 2023: DL for Brain Encoding and Decoding

173 of 210

Challenges in using DL for cognitive science

  • Not designed to specifically model brain processing
    • Training DL models using brain recordings
    • Task-based modeling
  • Can be difficult to interpret due to multiple sources of information
    • Disentangling contributions of different info sources to brain predictions

174 of 210

Challenges in using DL for cognitive science

  • Not designed to specifically model brain processing
    • Training DL models using brain recordings
    • Task-based modeling
  • Can be difficult to interpret due to multiple sources of information
    • Disentangling contributions of different info sources to brain predictions

175 of 210

Training DL models using brain recordings

Brain-optimized NLP model predicts unseen fMRI recordings better, especially in canonical language regions

[Figure: chapter of a book → NLP system; alignment between a priori locations in the NLP system and the brain (fMRI); error propagation]

  • Stimuli: one chapter of Harry Potter
  • Stimulus representation: brain-optimized NLP model
  • Brain recording & modality: fMRI & MEG, reading
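
As a rough illustration of the alignment and error-propagation loop named on this slide, the sketch below trains a tiny two-layer network so that its output matches recorded responses, with the prediction error propagated back into the hidden layer. Everything here is an assumption made for illustration: the data are synthetic stand-ins for stimulus features and fMRI, the network is not an NLP model, and the names (`X`, `Y`, `W1`, `W2`, `loss_and_grads`) are hypothetical. It is not the method of the work summarized above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_in, n_hidden, n_voxels = 100, 8, 16, 4

# Synthetic stand-ins (assumption): X plays the role of stimulus features,
# Y the role of recorded brain responses.
X = rng.standard_normal((n_samples, n_in))
teacher_in = rng.standard_normal((n_in, n_hidden))
teacher_out = rng.standard_normal((n_hidden, n_voxels))
Y = np.tanh(0.5 * (X @ teacher_in)) @ teacher_out

# Trainable weights of the toy model.
W1 = 0.1 * rng.standard_normal((n_in, n_hidden))
W2 = 0.1 * rng.standard_normal((n_hidden, n_voxels))

def loss_and_grads(W1, W2):
    """Squared-error loss against the recordings and its gradients."""
    H = np.tanh(X @ W1)                 # hidden activations
    err = H @ W2 - Y                    # prediction error on the recordings
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
    d_out = err / n_samples             # dLoss / dPrediction
    grad_W2 = H.T @ d_out
    d_hidden = (d_out @ W2.T) * (1.0 - H ** 2)  # error propagated through tanh
    grad_W1 = X.T @ d_hidden
    return loss, grad_W1, grad_W2

initial_loss, _, _ = loss_and_grads(W1, W2)
learning_rate = 0.01
for _ in range(500):
    _, g1, g2 = loss_and_grads(W1, W2)
    W1 -= learning_rate * g1
    W2 -= learning_rate * g2
final_loss, _, _ = loss_and_grads(W1, W2)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

In a real setting the evaluation would use held-out recordings rather than the training loss shown here.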

176 of 210

Training DL models using brain recordings

  • Stimuli: movie and TV show clips
  • Stimulus representation: brain-optimized CNN
  • Brain recording & modality: fMRI, vision

A brain-optimized vision model trained entirely on fMRI recordings performs comparably to task-optimized networks at predicting brain recordings in early and high-level ROIs

177 of 210

Training DL models using brain recordings

  • Stimuli: images of natural scenes
  • Stimulus representation: brain-optimized CNN
  • Brain recording & modality: fMRI, vision

Brain-optimized vision model can predict brain signals corresponding to a category of stimuli that it was never trained on

178 of 210

Training DL models using brain recordings

  • Stimuli: images of natural scenes
  • Stimulus representation: brain-optimized CNN
  • Brain recording & modality: fMRI, vision

Brain-optimized vision model can learn representations that do not follow a strict hierarchy

179 of 210

Challenges in using DL for cognitive modeling

  • Not designed to specifically model brain processing
    • Training DL models using brain recordings
    • Task-based modeling
  • Can be difficult to interpret due to multiple sources of information
    • Disentangling contributions of different info sources to brain predictions

180 of 210

Tasks affect processing

  • Stimuli: natural movies
  • Task: visual search for vehicles or humans
  • Stimulus representation: object and action labels from WordNet
  • Brain recording & modality: fMRI, vision

Category-based attention during natural vision alters representation of both attended and unattended categories

181 of 210

Tasks affect processing

[Figure: the same word ("bear") shown under two question tasks ("veg?" and "tool?"); for each, 800 ms of recordings from 306 sensors]

Systematic difference due to different question tasks

Attention emphasizes task-relevant information

Mechanism?

Can we model as a function of the task AND stimulus?

182 of 210

Tasks affect processing

[Figure: question task effect vs. word effect; significant prediction performance]

The end of semantic processing of a word is task-dependent

  • Stimuli: concrete nouns + line drawings
  • Task: answer Yes/No questions about noun
  • Stimulus representation: human judgments
  • Brain recording & modality: MEG, reading

183 of 210

Tasks affect processing

  • Stimuli: sentences
  • Task: searching for specific relations
  • Stimulus representation: word embeddings
  • Brain recording & modality: EEG, reading

Possible to predict whether a person is passively reading or performing a task with the text based on EEG recordings

184 of 210

Tasks affect processing

  • Stimuli: images of natural scenes
  • Stimulus representation: task-optimized CNNs for a range of tasks
  • Brain recording & modality: fMRI, vision

[Figure legend: Semantic, Low-dim., Geometric, 2D, 3D]

Vision tasks with higher transferability make similar predictions for brain responses from different regions

185 of 210

Tasks affect processing

  • Stimuli: passages and narratives
  • Stimulus representation: task-optimized NLP models for a range of tasks
  • Brain recording & modality: fMRI, reading & listening of different stimuli

Reading fMRI best explained by coreference resolution, NER, shallow syntax parsing

Listening fMRI best explained by paraphrasing, summarization, NLI

186 of 210

Tasks affect processing

  • Stimuli: one chapter of Harry Potter
  • Stimulus representation: summarization-optimized language models
  • Brain recording & modality: fMRI, reading

[Figure: a book chapter is the input to a model trained with language modeling and to a model trained to summarize narratives; the activations of each model are compared by brain alignment (Pearson correlation)]

Training language models to summarize narratives improves brain alignment, especially during important narrative elements (characters, emotions, etc.)
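
Brain alignment measured as a Pearson correlation can be sketched as follows: fit a regularized linear map from model activations to brain responses on training data, then correlate predicted and held-out responses per voxel. This is a minimal sketch under stated assumptions, not the pipeline of the study above: the data are synthetic, and the helper names (`fit_ridge`, `pearson_per_voxel`) and the ridge penalty `alpha` are illustrative choices.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def pearson_per_voxel(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-ins (assumption): X for model activations, Y for fMRI voxels.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 50, 16, 5
X = rng.standard_normal((n_train + n_test, n_feat))
W_true = rng.standard_normal((n_feat, n_vox))
Y = X @ W_true + 0.5 * rng.standard_normal((n_train + n_test, n_vox))

# Fit on the training portion, evaluate on the held-out portion.
W = fit_ridge(X[:n_train], Y[:n_train], alpha=10.0)
alignment = pearson_per_voxel(Y[n_train:], X[n_train:] @ W)
print(alignment.round(2))
```

Comparing two models then amounts to computing `alignment` for the activations of each and contrasting the per-voxel values.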

187 of 210

Challenges in using DL for cognitive modeling

  • Not designed to specifically model brain processing
    • Training DL models using brain recordings
    • Task-based modeling
  • Can be difficult to interpret due to multiple sources of information
    • Disentangling contributions of different info sources to brain predictions

188 of 210

Disentangling contributions of different info sources to brain predictions

“Mary finished the apple”

supra-word meaning may contain concept of:

  • eating
  • apple core

supra-word

meaning

Isolating supra-word meaning is a type of intervention

189 of 210

Disentangling contributions of different info sources to brain predictions

full context

supra-word

Bilateral PTL and ATL process supra-word meaning

Word-level information important for prediction of most language regions

  • Stimuli: one chapter of Harry Potter
  • Stimulus representation: disentangled embeddings from pretrained NLP models
  • Brain recording & modality: fMRI & MEG, reading

190 of 210

Disentangling contributions of different info sources to brain predictions

Figures provided by Shailee Jain

  • Stimuli: story
  • Stimulus representation: multi-timescale NLP model
  • Brain recording & modality: fMRI, listening

Using an NLP model that explicitly represents different timescales of information allows voxel-wise estimation of preferred timescales

191 of 210

Disentangling contributions of different info sources to brain predictions

Syntactic structure-based features explain additional variance in language regions over complexity metrics

Regions predicted by syntactic features and by semantic features are difficult to distinguish

  • Stimuli: one chapter of Harry Potter
  • Stimulus representation: syntactic tree representations & pretrained NLP model
  • Brain recording & modality: fMRI, reading

192 of 210

Disentangling contributions of different info sources to brain predictions

  • Stimuli: story
  • Stimulus representation: pretrained NLP models
  • Brain recording & modality: fMRI, listening

Compositional representations recruit a wider cortical network than word-level representations

Syntax and semantics not associated with separate modules

193 of 210

Disentangling contributions of different info sources to brain predictions

  • Stimuli: story
  • Stimulus representation: pretrained NLP model
  • Brain recording & modality: fMRI, listening

Decomposing NLP embeddings into attention heads reveals correlations between syntactic computations and prediction of fMRI recordings

194 of 210

Disentangling contributions of different info sources to brain predictions

  • Stimuli: story
  • Stimulus representation: pretrained NLP model
  • Brain recording & modality: fMRI, listening

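[Figure: a naturalistic stimulus ("This is Los Angeles. And it's the …") is given to a language model and paired with fMRI. Original brain alignment is compared with residual brain alignment, obtained from the residual after accounting for a linguistic property. Significant difference ⇒ the linguistic property affects alignment.]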

Syntactic properties contribute the most to the brain alignment trend across layers of language models
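
One way to picture the original-versus-residual comparison is to regress a linguistic property out of the model representation and re-measure alignment with what is left. The sketch below is only an illustration under assumptions: the data are synthetic, the property is removed by ordinary least squares, alignment is a ridge encoding model scored by held-out Pearson correlation, and the names `remove_property` and `brain_alignment` are hypothetical. For brevity the property is regressed out over all samples, and no significance test is run.

```python
import numpy as np

def remove_property(features, prop):
    """Return the part of `features` not linearly predictable from `prop`."""
    design = np.column_stack([prop, np.ones(len(prop))])  # property plus intercept
    coef, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - design @ coef

def brain_alignment(features, brain, n_train, alpha=1.0):
    """Mean held-out Pearson correlation of a ridge encoding model."""
    Xtr, Xte = features[:n_train], features[n_train:]
    Ytr, Yte = brain[:n_train], brain[n_train:]
    W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(Xtr.shape[1]), Xtr.T @ Ytr)
    pred = Xte @ W
    pc = pred - pred.mean(axis=0)
    yc = Yte - Yte.mean(axis=0)
    r = (pc * yc).sum(axis=0) / (np.linalg.norm(pc, axis=0) * np.linalg.norm(yc, axis=0))
    return r.mean()

# Synthetic stand-ins (assumption): the "brain" depends only on the property,
# and the model features contain the property plus unrelated dimensions.
rng = np.random.default_rng(0)
n, n_train = 300, 200
prop = rng.standard_normal((n, 2))
mix = np.array([[1.0, 0.0, 1.0, -1.0], [0.0, 1.0, 1.0, 1.0]])
features = np.hstack([prop @ mix, rng.standard_normal((n, 6))])
brain = prop @ np.array([[1.0, 0.5, -1.0], [0.5, -1.0, 1.0]])
brain = brain + 0.5 * rng.standard_normal((n, 3))

residual = remove_property(features, prop)
original_alignment = brain_alignment(features, brain, n_train)
residual_alignment = brain_alignment(residual, brain, n_train)
print(f"original: {original_alignment:.2f}, residual: {residual_alignment:.2f}")
```

A large drop from original to residual alignment suggests the removed property carried much of the predictive information. Whether the drop is significant would need a proper statistical test, which this sketch omits.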

195 of 210

Complex stimulus representations make it difficult to infer the effect of a stimulus on multiple brain areas

“The problem is when the capsule moves from an elliptical orbit to a parabolic orbit.”

Variance in Brain area 1

Variance in Brain area 2

Variance in the stimulus

Variance in the stimulus representation

196 of 210

Framework to determine whether a complex stimulus affects two brain areas in a similar way

197 of 210

Framework reveals differences in processing across language network areas

Example of each type of effect in movie fMRI data

  • Stimuli: movie
  • Stimulus representation: pretrained NLP model
  • Brain recording & modality: fMRI, view & listen

Encoding model performance is significant in all language areas

198 of 210

Challenges in using DL for cognitive modeling

  • Not designed to specifically model brain processing
    • Training DL models using brain recordings
    • Task-based modeling
  • Can be difficult to interpret due to multiple sources of information
    • Disentangling contributions of different info sources to brain predictions

199 of 210

Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding

Subba Reddy Oota1, Manish Gupta2,3, Raju S. Bapi2, Mariya Toneva4

1Inria Bordeaux, France; 2IIIT Hyderabad, India; 3Microsoft, India; 4MPI for Software Systems, Germany

subba-reddy.oota@inria.fr, gmanish@microsoft.com, raju.bapi@iiit.ac.in, mtoneva@mpi-sws.org

200 of 210

Agenda

  • Introduction to Brain encoding and decoding [30 min]
  • Stimulus Representations [1 hour]
  • Coffee break [30 min]
  • Deep Learning for Brain Decoding [1 hour 30 min]
  • Lunch break [1 hour 30 min]
  • Deep Learning for Brain Encoding [1 hour 30 min]
  • Coffee break [30 min]
  • Advanced Methods [1 hour 15 min]
  • Summary and Future Trends [15 min]

201 of 210

Outline

  1. Summary
  2. Future trends

202 of 210

Summary

  • Exciting times: publicly accessible neuroimaging data for various tasks is starting to become available now!
    • Opportunities:
      • Data ahead of theory, so it’s an open field for theoretical and methodological innovation!
      • Encoding models can be interpreted as process models constraining brain-computational theories (Kriegeskorte and Douglas, 2019).
      • Decoding models serve as a test for the presence of information in neural responses (Karamolegkou et al., 2023)
      • Decoding is relevant for cognitive neuroscientists interested in how semantic information is represented in the brain.
      • Computational linguists are interested in the cognitive plausibility of distributional models (Minnema & Herbelot, ACL 2019).
      • DL is helpful in uncovering patterns in brain responses and may lead to theories of information organization in the brain.

    • Challenges:
      • Hypothesis-driven data collection might be more helpful
      • Individual variability is the norm in neuroimaging data!
      • Neuroimaging data is more complex and noisier than the classical datasets used by DL researchers

203 of 210

Summary

  • This Tutorial:
  • Stimulus representation schemes
    • Vision: CNN-based
    • Language: Transformer-based
  • Datasets available (Reading/Listening/Viewing tasks in EEG, MEG, fMRI)
  • Decoding
    • Word-level Universal Brain Decoder; Continuous Lang Decoding; Multi-view and Cross-view Decoding
  • Encoding
    • Classical findings; More recent DL-based models
  • Advanced methods
    • Tuning/Training DL models using brain recordings
    • Task-based modeling

204 of 210

Outline

  1. Summary
  2. Future trends: DNNs & The Brain

205 of 210

DNNs & The Brain: Multi-modal, Multi-task

  • Brain response to a stimulus is multi-modal, multi-task related
    • Cross-view and multi-view decoding (Oota et al 2022a)
    • Visio-linguistic encoding (fusion of vision and language information) (Oota et al 2022b)
    • Task-based representations give better brain alignment (Neural Taskonomy: Oota et al 2022c)
    • Multimodal foundation model (Fei et al 2022)

Fei, Lu, Gao et al (2022). Towards artificial general intelligence via a multimodal foundation model. Nature Communications 13:3094

doi.org/10.1038/s41467-022-30761-2

206 of 210

DNNs & Brain Damage

  • DL models of encoding and decoding have not yet been tested in brain-damage experiments, e.g., semantic dementia

Snowden, Harris, Thompson, Kobylecki, Jones, Richardson, Neary (2018). Semantic dementia and the left and right temporal lobes. Cortex, 107, 188-203.

https://doi.org/10.1016/j.cortex.2017.08.024

Right anterior temporal lobe damage (Patient 8)

Animal habitat task.

The patient is asked:

Where would you find this?

Do DL Models exhibit such degradation with damage to units?
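
One simple way to start probing this question is a lesion-style ablation: silence a growing fraction of a network's hidden units and track how its output degrades. The sketch below is a toy illustration under assumptions, not an experiment from the cited study: the network is a random-feature model with a least-squares readout, the task is synthetic, and the names (`W_in`, `w_out`, `mse_after_lesion`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_in, n_hidden = 300, 10, 50

# Synthetic task (assumption): predict a smooth function of two input dimensions.
X = rng.standard_normal((n_samples, n_in))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

# Fixed random hidden layer with a least-squares readout fitted on these data.
W_in = rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in)
H = np.tanh(X @ W_in)
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)

def mse_after_lesion(fraction):
    """Silence a random `fraction` of hidden units and return the mean squared error.

    The error is measured on the same data used to fit the readout.
    """
    lesion_rng = np.random.default_rng(1)  # fixed seed so lesions are reproducible
    n_lesioned = int(round(fraction * n_hidden))
    mask = np.ones(n_hidden)
    lesioned_units = lesion_rng.choice(n_hidden, size=n_lesioned, replace=False)
    mask[lesioned_units] = 0.0
    return np.mean(((H * mask) @ w_out - y) ** 2)

for fraction in (0.0, 0.1, 0.25, 0.5):
    print(f"lesioned fraction {fraction:.2f}: MSE = {mse_after_lesion(fraction):.3f}")
```

For a trained DL model, the analogous step would be to lesion units in chosen layers and compare the pattern of errors with patient data. That comparison is beyond this sketch.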

207 of 210

Multilinguality

  • How do multilingual participants represent information?
    • Different language families and typologies (verb-framed vs. satellite-framed)
    • Multiple scripts
  • How do brain activations align with modern LLMs that translate among multiple languages apparently seamlessly?
  • Bi/multilingual advantage: what does it mean for DL models?
    • Studies have shown superior executive function (inhibitory control) and memory in multilingual participants
    • Potential representational differences between simultaneous and sequential multilinguals
  • Link between language and cognition
  • What can DL models contribute to the bi/multilingual literature?

208 of 210

DNNs & Brain: Multi-modal, Multi-task

  • Brain response to a stimulus is multi-modal, multi-task related
    • Cross-view and multi-view decoding (Oota et al 2022)
    • Visio-linguistic encoding (fusion of vision and language information) (Oota et al 2022)
    • Multimodal foundation model (Fei et al 2022)

Fei, Lu, Gao et al (2022). Towards artificial general intelligence via a multimodal foundation model. Nature Communications 13:3094 doi.org/10.1038/s41467-022-30761-2

209 of 210

A big thank you!

Tutorial, Code and Material:

Material from the IJCAI 2023 tutorial will be uploaded soon!

(Past): Deep Learning for Brain Encoding and Decoding, CogSci 2022

https://tinyurl.com/DL4Brain

(Past): Language and the Brain: Deep Learning for Brain Encoding and Decoding, IJCNN 2023

https://tinyurl.com/DLBrainIJCNN2023

210 of 210

Thanks!
