TY - GEN
T1 - Temporal filtering of visual speech for audio-visual speech recognition in acoustically and visually challenging environments
AU - Lee, Jong Seok
AU - Park, Cheol Hoon
PY - 2007
Y1 - 2007
N2 - The use of visual speech information has been shown to be effective in compensating for the performance degradation of acoustic speech recognition in noisy environments. However, visual noise is ignored in most audio-visual speech recognition systems, even though it can be introduced into visual speech signals during their acquisition or transmission. In this paper, we present a new temporal filtering technique for extracting noise-robust visual features. In the proposed method, a carefully designed band-pass filter is applied to the temporal pixel value sequences of lip-region images to remove unwanted temporal variations due to visual noise, illumination conditions, or speakers' appearances. We demonstrate that the method improves not only visual speech recognition performance for clean and noisy images but also audio-visual speech recognition performance under both acoustically and visually noisy conditions.
AB - The use of visual speech information has been shown to be effective in compensating for the performance degradation of acoustic speech recognition in noisy environments. However, visual noise is ignored in most audio-visual speech recognition systems, even though it can be introduced into visual speech signals during their acquisition or transmission. In this paper, we present a new temporal filtering technique for extracting noise-robust visual features. In the proposed method, a carefully designed band-pass filter is applied to the temporal pixel value sequences of lip-region images to remove unwanted temporal variations due to visual noise, illumination conditions, or speakers' appearances. We demonstrate that the method improves not only visual speech recognition performance for clean and noisy images but also audio-visual speech recognition performance under both acoustically and visually noisy conditions.
UR - http://www.scopus.com/inward/record.url?scp=57649210317&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=57649210317&partnerID=8YFLogxK
U2 - 10.1145/1322192.1322231
DO - 10.1145/1322192.1322231
M3 - Conference contribution
AN - SCOPUS:57649210317
SN - 9781595938176
T3 - Proceedings of the 9th International Conference on Multimodal Interfaces, ICMI'07
SP - 220
EP - 227
BT - Proceedings of the 9th International Conference on Multimodal Interfaces, ICMI'07
T2 - 9th International Conference on Multimodal Interfaces, ICMI 2007
Y2 - 12 November 2007 through 15 November 2007
ER -
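
Note: the abstract describes applying a band-pass filter along the temporal pixel value sequences of lip-region images. The following is a minimal illustrative sketch of that idea, not the authors' implementation; the frame rate, passband edges (1-8 Hz), and filter order are assumed values chosen for illustration only, as the record does not specify them.

# Minimal sketch of temporal band-pass filtering of lip-region pixel sequences.
# Assumptions (not from the source): 30 fps video, 1-8 Hz passband, 4th-order
# Butterworth filter. The intent is to attenuate slow variations (illumination,
# speaker appearance) and fast variations (visual noise) in each pixel's
# value sequence over time.
import numpy as np
from scipy.signal import butter, filtfilt

def temporal_bandpass(frames, fps=30.0, low_hz=1.0, high_hz=8.0, order=4):
    """Band-pass filter each pixel's value sequence along the time axis.

    frames : ndarray of shape (T, H, W), grayscale lip-region images over time.
    Returns an array of the same shape containing the filtered sequences.
    """
    nyquist = 0.5 * fps
    b, a = butter(order, [low_hz / nyquist, high_hz / nyquist], btype="band")
    # filtfilt applies the filter forward and backward (zero phase) along
    # axis 0, i.e. along time, independently for every pixel position.
    return filtfilt(b, a, frames.astype(np.float64), axis=0)

if __name__ == "__main__":
    # Usage example on a synthetic sequence of 100 frames of 32x32 images.
    rng = np.random.default_rng(0)
    frames = rng.random((100, 32, 32))
    filtered = temporal_bandpass(frames)
    print(filtered.shape)  # (100, 32, 32)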