Rethinking Device Interaction: A Silent Speech Approach
Tanmay Srivastava†, Prerna Khanna†, Shijia Pan★, VP Nguyen‡, Shubham Jain†
†
★
‡
1
Speech interactions are UBIQUITOUS…
Oh no, the mutant chicken is here!
Unreliable in noisy environments
Does not allow discreet communication
Not designed for next-gen wearables
Speech interactions are NOT ALWAYS PRACTICAL…
2
Contactless USI Systems
Contact USI Systems
Other Input Modalities
(MobiCom’20)
(IUI’18)
(CHI’22)
(IMWUT’20)
(MobiCom’14)
(CHI’19)
(MobiSys’18)
The Search for Silent Alternatives
3
Using Silent Speech as a surrogate for speech
Contact SSI
Acceptable Form Factor
Unobtrusive
Hands-free
Jaw Motion
4
Analogy: Reconstruct the song by watching the guitarist's hand
Jaw Motion
Speech
Guess the song!
Blinding Lights
5
Is it even possible to infer silent speech from JAW?
Accelerometer
Temporomandibular Joint
6
Are the signals detectable?
7
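Speaker-note sketch: one simple way to check whether jaw motion shows up in an accelerometer trace is a short-time energy threshold. This is not the detection method from the talk. The function name `detect_activity`, the window length, and the threshold are all hypothetical, and the input is assumed to be a single axis with gravity already removed.

```python
from statistics import mean


def detect_activity(samples, window=5, threshold=0.02):
    """Return (start, end) sample spans whose short-time energy exceeds threshold.

    Assumptions (illustrative only): `samples` is one accelerometer axis with
    gravity removed; `window` and `threshold` are hand-picked, not from the talk.
    The end index is exclusive.
    """
    # Short-time energy: mean of squared samples in each sliding window.
    energy = [
        mean(x * x for x in samples[i:i + window])
        for i in range(len(samples) - window + 1)
    ]
    segments = []
    start = None
    for i, e in enumerate(energy):
        if e > threshold:
            if start is None:
                start = i  # first window that contains activity
        elif start is not None:
            # The previous window (i - 1) was the last active one.
            segments.append((start, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments


# A still jaw, a short burst of motion, then stillness again.
trace = [0.0] * 20 + [0.5, -0.5] * 5 + [0.0] * 20
print(detect_activity(trace))
```

The reported span is wider than the burst itself by up to `window - 1` samples on each side, because any window that overlaps the burst counts as active.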
Are the signals recognizable?
JawSense (HotMobile’21)
INITIAL BREAKTHROUGH
8
Let’s recognize words…
9
/stɑːrt/
Isolated /ɑː/
/ɑː/ (Nucleus) -> Jaw Downwards
/s/ (Onset) -> Jaw Upwards
/t/ (Coda) Plosive
Breaking words into phonemes
10
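Speaker-note sketch: the onset / nucleus / coda breakdown of /stɑːrt/ can be written as a small data structure. The vowel set below is an illustrative subset, `split_syllable` is a hypothetical helper, and the jaw cues are limited to the three named on the slide. Cues for other phonemes are deliberately left unknown.

```python
# Illustrative subset of vowel phonemes (assumption, not an exhaustive inventory).
VOWELS = {"ɑː", "iː", "uː", "æ", "ɪ", "ʊ", "e", "ə"}

# Only the cues stated on the slide; everything else is unknown here.
JAW_CUE = {
    "ɑː": "jaw downwards",
    "s": "jaw upwards",
    "t": "plosive",
}


def split_syllable(phonemes):
    """Split a single-syllable phoneme list into onset, nucleus, and coda.

    The nucleus is taken to be the first vowel; consonants before it form the
    onset and everything after it forms the coda.
    """
    idx = next(i for i, p in enumerate(phonemes) if p in VOWELS)
    return {
        "onset": phonemes[:idx],
        "nucleus": phonemes[idx:idx + 1],
        "coda": phonemes[idx + 1:],
    }


parts = split_syllable(["s", "t", "ɑː", "r", "t"])  # /stɑːrt/
for role, phones in parts.items():
    for p in phones:
        print(role, p, JAW_CUE.get(p, "cue not given on the slide"))
```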
Accuracy > 0.9 across words of different syllable lengths
How well can we recognize isolated words?
11
MuteIt (UbiComp’22)
WE MADE IT PRACTICAL!!
12
Moving to natural silent conversation…
~Typing Speed
~Normal Speech Rate
13
Jaw Motion
6-axis IMU
Unvoiced
Spectrogram
Spectrogram to Text
Using Silent Speech as a surrogate for speech
14
SenSys’24
How do we get the spectrogram from IMU?
15
How do we learn this information?
Set alarm for 6 AM
Set alarm for 6 AM
High MSE
[a1, 0, a2, 0, a3, …, an]
[a1, 0, 0, 0, a2, 0, 0, 0, …, an]
Designing Loss Function
Prosody Loss
MSE
Prosody
16
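Speaker-note sketch: the two sequences on this slide carry the same values at different speaking rates, which is why a plain MSE between them comes out high. The code below reproduces that effect on toy numbers and contrasts it with dynamic time warping (DTW), a standard alignment-aware distance used here only as a stand-in. It is not the prosody loss from the talk, whose definition is not given on the slide. The helper `mse_padded` and its zero-padding of the shorter sequence are assumptions.

```python
def mse_padded(a, b):
    """Mean squared error after zero-padding the shorter sequence (assumption)."""
    n = max(len(a), len(b))
    a = list(a) + [0.0] * (n - len(a))
    b = list(b) + [0.0] * (n - len(b))
    return sum((x - y) ** 2 for x, y in zip(a, b)) / n


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Same content, spoken at two rates: [a1, 0, a2, 0, a3] vs. the stretched version.
fast = [1.0, 0.0, 2.0, 0.0, 3.0]
slow = [1.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 3.0]

print("MSE:", mse_padded(fast, slow))    # large, despite identical content
print("DTW:", dtw_distance(fast, slow))  # alignment absorbs the rate change
```

Any term that tolerates timing differences in this way could be combined with MSE as a weighted sum; the weighting and the exact form of the second term are left open here.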
We achieve > 94% accuracy
How well can we perform interaction tasks?
17
Add
Apples
Banana
To
Shopping
List
Add apples and bananas to shopping list.
Delete apples and bananas from shopping list.
1 syllable, low vowel
1 syllable, high vowel
&
Putting the last 3.5 years together
JawSense
Unvoiced
MuteIt
THE FINAL SOLUTION…
18
Work done at MSR
What are the next steps?
1. Silent Speech on Commercial Wearables
Headphones
VR headset
AR headset
Earphones
19
What are the next steps?
2. Working with the Accessible Population
x10
Afternoon
20
Closing remarks
22
SenSys’24
Quality of spectrogram generation