1 of 6

Speech Taskonomy: Which Speech Tasks are the most Predictive of fMRI Brain Activity?

Subba Reddy Oota (1,2), Veeral Agarwal (2), Mounika Marreddy (2), Manish Gupta (2,3), Bapi Raju Surampudi (2)

(1) Inria Bordeaux, France; (2) IIIT-Hyderabad, India; (3) Microsoft, India

2 of 6

What is brain encoding?

Speech Taskonomy: Which Speech Tasks are the most Predictive of fMRI Brain Activity? Subba Reddy Oota, Veeral Agarwal, Mounika Marreddy, Manish Gupta, Raju Bapi. InterSpeech 2023

[Figure: brain encoding pipeline. Labels: Speech, Encode, Deep Learning Models, Predict.]

3 of 6

Speech tasks from the SUPERB benchmark

  • Phoneme Recognition (PR): utterances to phonemes.
  • Automatic Speech Recognition (ASR): utterances to words.
  • Keyword Spotting (KS): detect preregistered keywords in utterances.
  • Intent Classification (IC): utterances to intent classes.
  • Speaker Diarization (SD): determine who is speaking when in an utterance.
  • Speaker Verification (SV): verify whether two utterances come from the same speaker.
  • Speaker Identification (SID): utterance to speaker identity.
  • Emotion Recognition (ER): utterance to emotion class.

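  • Cognitive speech perception skills covered by SUPERB: recognition (PR and ASR), detection (KS), semantics (IC, plus slot filling (SF) and speech translation (ST), which are not among the eight tasks listed above), speaker-related (SV, SD, and SID), and paralinguistics (ER).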


4 of 6

Experiments and Results

  • Dataset: fMRI recordings of 82 subjects listening to the Pieman story (957 words) from the Narratives collection.
  • Speech stimuli were encoded using Wav2Vec2.0 models finetuned on each of the 8 tasks.
  • Brain responses were predicted from these representations using banded ridge regression.
  • Evaluation: Pearson correlation between predicted and actual brain activations.
  • Models finetuned on ASR, ER, SID and IC are better aligned with brain activity than pretrained Wav2Vec2.0.
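
The encoding and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it uses plain ridge regression with a single penalty (the slides mention banded ridge regression, which assigns separate penalties to feature groups), and the arrays are synthetic stand-ins for Wav2Vec2.0 features and fMRI responses. The function names `fit_ridge` and `pearson_per_voxel`, the shapes, and `alpha` are all assumptions made for the example.

```python
import numpy as np

def fit_ridge(X_train, Y_train, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X_train.shape[1]
    gram = X_train.T @ X_train + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X_train.T @ Y_train)

def pearson_per_voxel(Y_true, Y_pred):
    """Pearson correlation between matching columns (voxels) of two arrays."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-ins (assumption): rows are time points, columns of X are
# model features, columns of Y are voxels.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 16, 5
W_true = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ W_true + 0.1 * rng.normal(size=(n_test, n_voxels))

# Fit on training time points, then score held-out predictions per voxel.
W = fit_ridge(X_train, Y_train, alpha=1.0)
scores = pearson_per_voxel(Y_test, X_test @ W)
print(scores.shape, float(scores.mean()))
```

In practice the penalty would typically be chosen by cross-validation, and the per-voxel scores would be averaged across voxels and subjects to compare tasks.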


ASR best encodes speech stimuli for brain response prediction.

ASR task has the best brain alignment in the middle layers.

5 of 6

Region and Sub-region level alignments

  • ROI results
    • Alignment is best in the early auditory cortex (EAC).
    • ER, SID and IC are the best tasks for EAC.
    • ASR is best for the auditory association cortex (AAC) and language regions.

  • Sub-ROI results
    • EAC: A1 has a higher Pearson correlation than the other sub-ROIs.
    • ASR is best for the language network associated with narrative comprehension (language ROIs 44 and 45; STSda and STSdp in AAC).
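
One common way to obtain ROI-level numbers like those above is to average voxel-wise Pearson correlations within each region. The sketch below assumes that approach; the slides do not state how the ROI scores were aggregated, and the helper `mean_correlation_by_roi`, the labels, and the scores are made-up illustrations rather than results from the study.

```python
import numpy as np

def mean_correlation_by_roi(voxel_scores, roi_labels):
    """Average voxel-wise scores within each ROI label (hypothetical helper)."""
    voxel_scores = np.asarray(voxel_scores, dtype=float)
    roi_labels = np.asarray(roi_labels)
    return {
        str(roi): float(voxel_scores[roi_labels == roi].mean())
        for roi in np.unique(roi_labels)
    }

# Made-up per-voxel correlations and ROI assignments (not real data).
voxel_scores = [0.30, 0.50, 0.10, 0.20, 0.40]
roi_labels = ["EAC", "EAC", "AAC", "AAC", "EAC"]

roi_means = mean_correlation_by_roi(voxel_scores, roi_labels)
print(roi_means)
```

The same grouping works at the sub-ROI level by swapping in finer-grained labels such as A1 or STSda.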

6 of 6

Conclusion

  • Evaluated the brain encoding performance of finetuned representations for 8 speech tasks.
  • Finetuned ASR, PR, ER, SID and IC models align better with brain activity than pretrained Wav2Vec2.0.
  • Layer-wise correlations and analyses at the ROI and sub-ROI levels show:
    • ASR is best aligned overall.
    • ER, SID and IC lead to the best alignment for the early auditory cortex.

  • Paper: https://hal.science/hal-04131475
