FACTORIAL HIDDEN RESTRICTED BOLTZMANN MACHINES FOR NOISE ROBUST SPEECH RECOGNITION
Steven J. Rennie, Petr Fousek, and Pierre L. Dognin
October 24, 2012
IBM T. J. Watson Research Center
© 2011 IBM Corporation
Factorial Hidden RBMs for Noise Robust Speech Recognition
© 2012 IBM Corporation
Motivation
Some Applications
mobile computing
surveillance
signal re-composition/editing
acoustic forensics
robust audio search
artificial perception
enhanced hearing
Why is Robust ASR hard?
Factorial Models of Noisy Speech
[Figure: mixture of sources at relative levels x dB and n dB.]
Example utterances: BIN WHITE BY Z 8 AGAIN; BIN GREEN WITH A 2 SOON; SET GREEN IN F 2 NOW; LAY RED WITH C 1 PLEASE
Interfering sources: Traffic Noise, Engine Noise, Speech Babble, Airport Noise, Car Noise, Music, Speech
Combinatoric Considerations
Source Models: $p(s_n)$, $p(x_n \mid s_n)$, for $n = 1, \ldots, N$
- features $x_n$
- states $s_n$
- number of states $|s_n|$
Interaction model: $p(y \mid x_1, \cdots, x_N)$
Inference: $O\left(\prod_n |s_n|\right)$
[Figure: graphical model in which each source $n$ has a state $s_n$ with prior $p(s_n)$ and features $x_n$ with likelihood $p(x_n \mid s_n)$; the observation $y$ depends on $x_1, \ldots, x_N$. Factor nodes denote functions of connected variables.]
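To make the combinatorics concrete, here is a minimal sketch of how the joint state space, and with it the cost of exact inference, grows as the product of the per-source state counts $|s_n|$. The state counts are hypothetical and are not taken from the slides.

```python
from math import prod

def joint_state_count(states_per_source):
    """Number of joint state combinations: the product of |s_n| over all sources."""
    return prod(states_per_source)

# Hypothetical sizes: every source model has 256 states.
for num_sources in (1, 2, 4):
    print(num_sources, joint_state_count([256] * num_sources))
```

With these sizes, going from two to four sources multiplies the number of joint states by another factor of 256 squared, which is why exhaustive enumeration quickly becomes impractical.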
4 sources…
[Figure: four simultaneous talkers, mixed at 0 dB, -7 dB, -7 dB, and -7 dB.]
PLACE GREEN WITH B 8 SOON
LAY BLUE AT P ZERO NOW
PLACE RED IN H 3 NOW
PLACE WHITE AT D ZERO SOON
Rennie, S., Hershey, J., Olsen, P., “Single Channel Multi-talker Speech Recognition: Graphical Modeling Approaches”. IEEE Signal Processing Magazine, Special Issue on Graphical Models, Vol. 27:6, November 2010.
Motivation
Review: Restricted Boltzmann Machines
$$\log p(v, h) = -\sum_{i=1}^{V} \frac{(v_i - b_i)^2}{2\sigma_i^2} + \sum_{j=1}^{H} a_j h_j + \sum_{i=1}^{V} \sum_{j=1}^{H} \omega_{ij} v_i h_j - Z$$

$$p(h_j = 1 \mid v) = \frac{\exp\left(a_j + \sum_{i=1}^{V} \omega_{ij} v_i\right)}{1 + \exp\left(a_j + \sum_{i=1}^{V} \omega_{ij} v_i\right)} = \mathrm{sig}\left(a_j + \sum_{i=1}^{V} \omega_{ij} v_i\right)$$
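The hidden-unit posterior can be checked numerically against the joint log-probability. The following is a minimal NumPy sketch, assuming $Z$ denotes the log normalization term (so it cancels in the conditional); the sizes, the random parameters, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sig(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def log_p_unnormalized(v, h, a, b, sigma2, W):
    """log p(v, h) without the normalization term.

    v, b, sigma2: shape (V,); h, a: shape (H,); W[i, j] = omega_ij, shape (V, H).
    """
    return -np.sum((v - b) ** 2 / (2.0 * sigma2)) + a @ h + v @ W @ h

def hidden_posterior(v, a, W):
    """p(h_j = 1 | v) = sig(a_j + sum_i omega_ij v_i), for all j at once."""
    return sigmoid(a + v @ W)

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
V, H = 4, 3
a, b = rng.normal(size=H), rng.normal(size=V)
sigma2 = np.ones(V)
W = 0.1 * rng.normal(size=(V, H))
v = rng.normal(size=V)
print(hidden_posterior(v, a, W))
```

Because the hidden units interact only through $v$, flipping a single $h_j$ while holding the others fixed reproduces the same sigmoid, which is what makes the posterior factorize.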
Review: Restricted Boltzmann Machines (cont’d)
$$p(v_i \mid h) = \frac{\exp\left(-\frac{(v_i - b_i)^2}{2\sigma_i^2} + \sum_{j=1}^{H} \omega_{ij} v_i h_j\right)}{\int_{v_i} \exp\left(-\frac{(v_i - b_i)^2}{2\sigma_i^2} + \sum_{j=1}^{H} \omega_{ij} v_i h_j\right)} = \mathcal{N}\left(v_i;\; b_i + \sigma_i^2 \sum_{j=1}^{H} \omega_{ij} h_j,\; \sigma_i^2\right)$$

$$p(h \mid v) = \prod_{j} p(h_j \mid v)$$
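As an illustration of how the two conditionals are used together, here is a minimal NumPy sketch of the Gaussian visible conditional and one block Gibbs step. Block Gibbs sampling is not described on the slide; it is included only as a familiar use of these formulas, and all names and values are hypothetical.

```python
import numpy as np

def visible_conditional(h, b, sigma2, W):
    """Mean and variance of p(v_i | h) = N(v_i; b_i + sigma2_i * sum_j omega_ij h_j, sigma2_i).

    h: shape (H,); b, sigma2: shape (V,); W[i, j] = omega_ij, shape (V, H).
    """
    mean = b + sigma2 * (W @ h)
    return mean, sigma2

def gibbs_step(v, a, b, sigma2, W, rng):
    """Sample h ~ p(h | v), then v ~ p(v | h)."""
    p_h = 1.0 / (1.0 + np.exp(-(a + v @ W)))          # p(h_j = 1 | v)
    h = (rng.random(p_h.shape) < p_h).astype(float)   # Bernoulli draw per hidden unit
    mean, var = visible_conditional(h, b, sigma2, W)
    v_new = mean + np.sqrt(var) * rng.standard_normal(mean.shape)
    return v_new, h
```

Note that the hidden units shift the mean of each visible unit by $\sigma_i^2 \sum_j \omega_{ij} h_j$ but leave its variance unchanged.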
Factorial Hidden Restricted Boltzmann Machines
$p(y \mid v^x, v^n)$

$$\begin{aligned}
\log p(y) &= \log \sum_{h, v} p(h^x, v^x)\, p(h^n, v^n)\, p(y \mid v^x, v^n) \\
&\geq \sum_{h, v} q(h, v) \log \frac{p(h^x, v^x)\, p(h^n, v^n)\, p(y \mid v)}{q(h, v)} \\
&= E_{q(v^x, v^n)}\left[\log p(y \mid v)\right] + \sum_{i = x, n} E_{q(h^i, v^i)}\left[\log \frac{p(h^i, v^i)}{q(h^i, v^i)}\right] \equiv \mathcal{L}
\end{aligned}$$
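The step from the log-likelihood to the bound is an application of Jensen's inequality. A sketch of the omitted step, using the same symbols and writing the gap as a KL divergence (a standard identity, not stated on the slide):

```latex
\begin{aligned}
\log p(y)
  &= \log \sum_{h,v} q(h,v)\,
     \frac{p(h^x,v^x)\,p(h^n,v^n)\,p(y \mid v^x,v^n)}{q(h,v)} \\
  &\geq \sum_{h,v} q(h,v)\,
     \log \frac{p(h^x,v^x)\,p(h^n,v^n)\,p(y \mid v^x,v^n)}{q(h,v)}
   = \mathcal{L}, \\
\log p(y) - \mathcal{L}
  &= \mathrm{KL}\big(q(h,v)\,\|\,p(h,v \mid y)\big) \;\geq\; 0 .
\end{aligned}
```

Maximizing $\mathcal{L}$ over $q$ therefore tightens the bound by moving $q$ toward the true posterior over the hidden variables.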
Rennie, S., Fousek, P., Dognin, P. “Factorial Hidden Restricted Boltzmann Machines for Robust Speech Recognition”, ICASSP 2012.
FHRBM Model: Factor Graph
[Figure: factor graph of the FHRBM.]
Speech Model: layers $v^x$, $h^x$, $l^x$ with factors $\Psi(v^x, h^x)$ and $\Psi(h^x, l^x)$
Noise Model: layers $v^n$, $h^n$, $l^n$ with factors $\Psi(v^n, h^n)$ and $\Psi(h^n, l^n)$
Interaction Model: $p(y \mid v^x, v^n)$
Noisy Data: $y$
FHRBMs for Robust ASR
$p(v^x, h^x)$, $p(v^n, h^n)$

$$p(y \mid v^x, v^n) = \prod_f \mathcal{N}\left(y_f;\; g(v_f),\; \psi_f^2\right), \qquad g(v_f) = \log\left(\exp(v^x_f) + \exp(v^n_f)\right), \qquad v_f = [v^x_f \;\; v^n_f]^T$$

[ this choice ignores phase interactions ]

$$\begin{aligned}
q(h^x, v^x, h^n, v^n) &= \prod_f q(v^x_f, v^n_f) \prod_{j=1}^{H^x} q(h^x_j) \prod_{k=1}^{H^n} q(h^n_k) \\
&= \prod_f \mathcal{N}\left(v_f;\; \mu_f,\; \Phi_f\right) \prod_{s = x, n} \prod_{j=1}^{H^s} \left(\gamma_{h^s_j}\right)^{h^s_j} \left(1 - \gamma_{h^s_j}\right)^{1 - h^s_j}
\end{aligned}$$
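The log-sum interaction model is straightforward to evaluate numerically. Below is a minimal NumPy sketch, assuming log-spectral features with one entry per frequency band $f$; the function names are hypothetical, and `np.logaddexp` is used so that large feature values do not overflow.

```python
import numpy as np

def log_sum(vx, vn):
    """g(v_f) = log(exp(vx_f) + exp(vn_f)), computed in a numerically stable way."""
    return np.logaddexp(vx, vn)

def interaction_log_likelihood(y, vx, vn, psi2):
    """log p(y | vx, vn) = sum_f log N(y_f; g(v_f), psi2_f)."""
    resid = y - log_sum(vx, vn)
    return np.sum(-0.5 * np.log(2.0 * np.pi * psi2) - 0.5 * resid ** 2 / psi2)

# When one source dominates a band, g is close to that source's feature value.
print(log_sum(np.array([10.0, 0.0]), np.array([0.0, 10.0])))
```

When the two sources have equal energy in a band, $g$ exceeds either input by $\log 2$; when one dominates, $g$ is close to the larger input.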
FHRBMs for Robust ASR
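$$p(y \mid v^x, v^n) \approx \prod_f \mathcal{N}\left(y_f;\; g(\mu_f) + (v_f - \mu_f)^T d_f,\; \psi_f^2\right)$$

$$d_f = [d_{v^x_f} \;\; d_{v^n_f}]^T = \left.\frac{\partial g}{\partial v_f}\right|_{v_f = \mu_f}, \qquad d_{v^x_f} = \mathrm{sig}\left(\mu_{v^x_f} - \mu_{v^n_f}\right), \qquad d_{v^n_f} = 1 - d_{v^x_f}$$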
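The first-order (VTS) expansion of the log-sum function can be sketched in a few lines of NumPy. The derivative with respect to $v^x$ is computed directly as $\exp(v^x) / (\exp(v^x) + \exp(v^n)) = \mathrm{sig}(v^x - v^n)$; the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def vts_linearize(mu_x, mu_n):
    """Value and gradient of g(v) = log(exp(v^x) + exp(v^n)) at the point (mu_x, mu_n)."""
    g0 = np.logaddexp(mu_x, mu_n)
    d_x = sigmoid(mu_x - mu_n)   # dg/dv^x evaluated at the expansion point
    d_n = 1.0 - d_x              # dg/dv^n; the two partials sum to one
    return g0, d_x, d_n

def g_linear(vx, vn, mu_x, mu_n):
    """First-order approximation g(mu) + (v - mu)^T d."""
    g0, d_x, d_n = vts_linearize(mu_x, mu_n)
    return g0 + d_x * (vx - mu_x) + d_n * (vn - mu_n)
```

The gradient entries act as soft masks: in a band where the speech mean is well above the noise mean, $d_{v^x_f}$ is close to one and the observation mostly informs the speech estimate.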
FHRBMs for Robust ASR
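$$\gamma_{h^s_j} = \mathrm{sig}\left(a^s_j + \sum_{f=1}^{V^s} \omega^s_{fj}\, \mu_{v^s_f}\right)$$

$$\phi^2_{v^s_f} = \left(\sigma^{-2}_{v^s_f} + d^2_{v^s_f} (\psi'_f)^{-2}\right)^{-1}$$

$$\mu_{v^s_f} = \phi^2_{v^s_f} \left(\sigma^{-2}_{v^s_f} \left(b_{v^s_f} + \sigma^2_{v^s_f} \sum_{j=1}^{H^s} \omega^s_{fj}\, \gamma_{h^s_j}\right) + d_{v^s_f} (\psi'_f)^{-2}\, y'_f\right)$$

First term of $\mu_{v^s_f}$: Influence of source’s network
Second term of $\mu_{v^s_f}$: Influence of other source’s network and data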
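The updates for one frame can be sketched as a fixed-point iteration. This is a sketch under stated assumptions, not the authors' implementation: the slides do not define $y'_f$ or $\psi'_f$, so the code assumes $\psi'_f = \psi_f$ and $y'_f = y_f - g(\mu_f) + d_{v^s_f} \mu_{v^s_f}$, with $g$ re-linearized at the current means. All function and parameter names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def update_source(y_prime, d, psi_prime2, gamma, a, b, sigma2, W):
    """One pass of the updates for a single source s and one frame.

    y_prime, d, psi_prime2, b, sigma2: shape (F,); gamma, a: shape (H,);
    W[f, j] = omega^s_fj, shape (F, H).
    """
    phi2 = 1.0 / (1.0 / sigma2 + d ** 2 / psi_prime2)
    network_mean = b + sigma2 * (W @ gamma)   # influence of the source's own network
    # The second term carries the data and the other source's current estimate.
    mu = phi2 * (network_mean / sigma2 + d * y_prime / psi_prime2)
    gamma_new = sigmoid(a + mu @ W)
    return mu, phi2, gamma_new

def infer_frame(y, psi2, params, n_iter=20):
    """Alternate the updates over sources 'x' (speech) and 'n' (noise).

    ASSUMPTIONS (not from the slides): psi'_f = psi_f, and
    y'_f = y_f - g(mu_f) + d_s mu_s, re-linearizing g at the current means.
    params maps each source name to a dict with keys a, b, sigma2, W.
    """
    mu = {s: p["b"].copy() for s, p in params.items()}
    gamma = {s: sigmoid(p["a"] + mu[s] @ p["W"]) for s, p in params.items()}
    for _ in range(n_iter):
        for s, p in params.items():
            g0 = np.logaddexp(mu["x"], mu["n"])
            d_x = sigmoid(mu["x"] - mu["n"])   # dg/dv^x at the current means
            d = d_x if s == "x" else 1.0 - d_x
            y_prime = y - g0 + d * mu[s]       # assumed definition of y'
            mu[s], _, gamma[s] = update_source(y_prime, d, psi2, gamma[s], **p)
    return mu, gamma
```

Two limiting cases help read the formulas: where $d_{v^s_f} = 0$ the estimate falls back to the network's prediction with variance $\sigma^2_{v^s_f}$, and as $\psi'_f \to 0$ with $d_{v^s_f} = 1$ the estimate follows the data term $y'_f$.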
Deep FHRBMs for Robust ASR
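$$l^s = \{l^s_1, \ldots, l^s_k, \ldots, l^s_{L^s}\}, \qquad q(l^s) = \prod_k q(l^s_k) = \prod_k \gamma_{l^s_k}$$

$$\gamma_{h^s_j} = \mathrm{sig}\left(a^s_j + \sum_{i=1}^{V^s} \omega^s_{ij}\, \mu_{v^s_i} + \alpha^s_j + \sum_{k=1}^{L^s} \varpi^s_{jk}\, \gamma_{l^s_k}\right)$$

Terms $a^s_j + \sum_i \omega^s_{ij}\, \mu_{v^s_i}$: Influence of layer below
Terms $\alpha^s_j + \sum_k \varpi^s_{jk}\, \gamma_{l^s_k}$: Influence of layer above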
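The middle-layer update combines a bottom-up and a top-down input inside one sigmoid. A minimal NumPy sketch follows; the layer sizes are illustrative only, and the names `U` (for the weights to the layer above) and `middle_layer_posterior` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def middle_layer_posterior(mu_v, gamma_l, a, W, alpha, U):
    """Posterior activation of the middle layer of one source's network.

    mu_v: (V,) feature means; gamma_l: (L,) top-layer activations;
    a, alpha: (H,) biases; W: (V, H) weights to the layer below;
    U: (H, L) weights to the layer above.
    """
    bottom_up = a + mu_v @ W         # influence of layer below
    top_down = alpha + U @ gamma_l   # influence of layer above
    return sigmoid(bottom_up + top_down)

# Illustrative sizes: 24 features, 32 middle units, 8 top units.
rng = np.random.default_rng(0)
V, H, L = 24, 32, 8
gamma_h = middle_layer_posterior(
    rng.normal(size=V), rng.random(L),
    np.zeros(H), 0.1 * rng.normal(size=(V, H)),
    np.zeros(H), 0.1 * rng.normal(size=(H, L)),
)
print(gamma_h.shape)
```

Setting `alpha` and `U` to zero recovers the single-layer update, so the deep model can be read as the shallow one with an extra top-down bias on each hidden unit.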
Speech RBM
Middle Layer (8 Units)
Middle Layer (32 Units)
Top Layer (8 Units)
Features (24 dim)
Top Layer (3 Units)
Features (24 dim)
Noise RBM
Model-based Noise Compensation Algorithms
Noisy Speech Model
Speech Model (SM)
Noise Model (NM)
Interaction Model (IM)
Algorithm | SM | NM | IM |
DNA (CD) | GMM | Gaussian Process | Log-sum (VTS approx) |
FHRBM | RBM | RBM | Log-sum (VTS approx) |
SNM | GMM | Fixed Gaussian (first 10 frames) | Log-sum (VTS approx) |
GMM-GMM | GMM | GMM | Log-sum (VTS approx) |
Preliminary Results
$|\theta^{\mathrm{RBM}}_x| = |\theta^{\mathrm{GMM}}_x|$
[Figure: results panels]
Preliminary Results – WER vs. (biased) SNR
fMLLR off, SS off
Preliminary Results – WER vs. (biased) SNR
fMLLR on, SS on
Discussion