1 of 12

AI Model for Speech Annotation

Beijie Liu, Rodrigo Eguiluz Ortiz Duran

Mentor: Prof. Emily Mower Provost

2 of 12

How can an AI model help?

Example utterance with the revision marked: “So you can (revision) you can change them if you want”

  • Speech and language changes are common in health conditions such as Parkinson’s, Huntington’s, and Alzheimer’s diseases. Some audio features, such as pauses and word usage, can serve as indicators of these conditions.
  • Clinicians need to transcribe audio for better medical records and better patient care
  • This process is manual and time-consuming
  • The inability to scale this process efficiently limits the scope of investigations

3 of 12

Workflow of the project

  1. Collaborate with clinicians
  2. Identify key features
  3. Implement the feature
  4. Get feedback

4 of 12

Some audio features

Example utterance with the revision marked: “So you can (revision) you can change them if you want”

Audio features are measurable properties of sound that capture various aspects of an audio signal.

5 of 12

Table 1: Tetzloff, K. A., Utianski, R. L., Duffy, J. R., Clark, H. M., Strand, E. A., Josephs, K. A., & Whitwell, J. L. (2018). Quantitative analysis of agrammatism in agrammatic primary progressive aphasia and dominant apraxia of speech. Journal of Speech, Language, and Hearing Research, 61(9), 2337-2346.

Table 2: Catricalà, E., Boschi, V., Cuoco, S., Galiano, F., Picillo, M., Gobbi, E., ... & Cappa, S. F. (2019). The language profile of progressive supranuclear palsy. Cortex, 115, 294-308.

6 of 12

Web Application

  • Clear Upload Window
  • Clinician-approved
  • Search engine for features
  • Exportable Excel Output

7 of 12

Three AI Models

  • Automatic Speech Recognition (ASR)
    • Transcription model that converts speech to text.
  • Text to Features
    • Model for understanding and annotating the context of the text.
  • Audio to Features
    • Model for processing raw audio data for improved transcription accuracy.

Audio Analysis

  1. Romana, A., Koishida, K., & Provost, E. M. (2023). Automatic Disfluency Detection from Untranscribed Speech. arXiv preprint arXiv:2311.00867.

Speech to Text

Text Analysis

8 of 12

Technical Side

Speech to Text

Text Analysis

Transcript: “So you can you can change them if you want”

Detected revision (RV): “you can”

Some Sentence-Based Features:

  1. Word Count: 10 words
  2. Speech Rate: 197.37 words/min
  3. Sentence segmentation: divide the transcript into sentences
  4. Clause detection: “if you want”
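The first two features above can be checked with a few lines of Python. This is a minimal sketch, not the project's implementation: the function names are hypothetical, and the utterance duration of 3.04 seconds is an assumption chosen to be consistent with the 10 words and 197.37 words/min shown on the slide (the slides do not state the duration).

```python
def word_count(utterance: str) -> int:
    """Count whitespace-separated tokens in the utterance."""
    return len(utterance.split())


def speech_rate_wpm(n_words: int, duration_seconds: float) -> float:
    """Return the speech rate in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return n_words / duration_seconds * 60.0


utterance = "So you can you can change them if you want"
duration_seconds = 3.04  # assumed duration; not stated in the slides

n_words = word_count(utterance)
rate = speech_rate_wpm(n_words, duration_seconds)
print(f"Word Count: {n_words} words")
print(f"Speech Rate: {rate:.2f} words/min")
```

In practice the duration would come from the audio itself or from timestamps produced by the speech-to-text step, rather than being hard-coded.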

Visualization of one feature: Revision (RV)
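To make the RV example concrete, here is a rough rule-based sketch that finds an immediately repeated word sequence and inserts a marker after its first occurrence. It is an illustrative heuristic only and is not the model used in the project: it catches exact repeats such as "you can you can" and would miss revisions where the speaker changes the wording. The function names and the "(revision)" tag format are assumptions.

```python
from typing import List, Optional, Tuple


def find_repeated_span(tokens: List[str]) -> Optional[Tuple[int, int]]:
    """Return (start, length) of the longest word sequence that is
    immediately followed by an identical copy, or None if there is none."""
    lowered = [t.lower() for t in tokens]
    n = len(lowered)
    for length in range(n // 2, 0, -1):  # try longer repeats first
        for start in range(0, n - 2 * length + 1):
            first = lowered[start:start + length]
            second = lowered[start + length:start + 2 * length]
            if first == second:
                return start, length
    return None


def annotate_revision(utterance: str, tag: str = "(revision)") -> str:
    """Insert `tag` between a repeated span and its repetition."""
    tokens = utterance.split()
    span = find_repeated_span(tokens)
    if span is None:
        return utterance
    start, length = span
    cut = start + length
    return " ".join(tokens[:cut] + [tag] + tokens[cut:])


example = "So you can you can change them if you want"
span = find_repeated_span(example.split())
annotated = annotate_revision(example)
print(span)
print(annotated)
```

A learned model such as the disfluency detector cited in the references can handle cases this heuristic cannot, which is one reason a rule like this is only useful as a baseline or a sanity check.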

9 of 12

Future Work

  • Work with clinicians to create new features
  • Make the AI tool more usable
  • Beautify the front-end and make it accessible
  • Goal: apply to clinical data

10 of 12

References

  • Tetzloff, K. A., Utianski, R. L., Duffy, J. R., Clark, H. M., Strand, E. A., Josephs, K. A., & Whitwell, J. L. (2018). Quantitative analysis of agrammatism in agrammatic primary progressive aphasia and dominant apraxia of speech. Journal of Speech, Language, and Hearing Research, 61(9), 2337-2346.
  • Catricalà, E., Boschi, V., Cuoco, S., Galiano, F., Picillo, M., Gobbi, E., ... & Cappa, S. F. (2019). The language profile of progressive supranuclear palsy. Cortex, 115, 294-308.
  • Romana, A., Koishida, K., & Provost, E. M. (2023). Automatic Disfluency Detection from Untranscribed Speech. arXiv preprint arXiv:2311.00867.

11 of 12

Acknowledgements

Thanks to Dr. Emily Mower Provost for her invaluable guidance and support.

Thanks to everyone in the CHAI Lab for their collaboration and encouragement throughout this project.

12 of 12

Technical Side

  • Flexibility to process features

Whisper

NLTK

spaCy

webrtcvad: voice activity detection, used to segment the audio into sentences
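The segmentation step can be sketched as grouping per-frame speech decisions into time spans. The sketch below assumes the frame-level flags have already been produced by a voice activity detector (for example, one boolean per 30 ms frame from webrtcvad's `Vad.is_speech`); the function name, the 30 ms frame size, and the 300 ms silence threshold are illustrative assumptions, not the project's settings.

```python
from typing import List, Optional, Tuple


def frames_to_segments(
    speech_flags: List[bool],
    frame_ms: int = 30,
    min_silence_ms: int = 300,
) -> List[Tuple[float, float]]:
    """Group per-frame speech flags into (start_s, end_s) segments.

    A segment is closed once at least `min_silence_ms` of consecutive
    non-speech frames follow it; shorter pauses stay inside the segment.
    """
    min_silence_frames = max(1, min_silence_ms // frame_ms)
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # first frame of the open segment
    last_speech = 0              # last speech frame seen in the open segment
    silence_run = 0              # consecutive non-speech frames so far

    def close_segment() -> None:
        segments.append(
            (start * frame_ms / 1000, (last_speech + 1) * frame_ms / 1000)
        )

    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            if start is None:
                start = i
            last_speech = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                close_segment()
                start = None
                silence_run = 0

    if start is not None:  # audio ended while a segment was still open
        close_segment()
    return segments


# Synthetic flags: speech, a short pause, speech, a long pause, speech.
flags = [True] * 10 + [False] * 3 + [True] * 5 + [False] * 12 + [True] * 4
segments = frames_to_segments(flags)
print(segments)
```

Pause-based segments like these approximate sentence boundaries only loosely, so a text-based sentence splitter applied to the transcript may still be needed afterwards.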

Bird, S., Loper, E., & Klein, E. (2009). Natural Language Processing with Python. O'Reilly Media Inc. URL: https://www.nltk.org/

Van Rossum, A. (2023). Natural Language Processing With spaCy in Python. Real Python. Retrieved from https://realpython.com/natural-language-processing-spacy-python/