Two-level bimodal association for audio-visual speech recognition

Jong Seok Lee, Touradj Ebrahimi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)


This paper proposes a new method for bimodal information fusion in audio-visual speech recognition, where cross-modal association is considered at two levels. First, the acoustic and visual data streams are combined at the feature level by canonical correlation analysis, which addresses audio-visual synchronization and exploits the cross-modal correlation. Second, the information streams are integrated at the decision level, so that the fusion adapts to the noise condition of the given speech datum. Experimental results demonstrate that the proposed method yields noise-robust recognition performance without a priori knowledge of the noise conditions of the speech data.

Original language: English
Title of host publication: Advanced Concepts for Intelligent Vision Systems - 11th International Conference, ACIVS 2009, Proceedings
Number of pages: 12
Publication status: Published - 2009
Event: 11th International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS 2009 - Bordeaux, France
Duration: 2009 Sept 28 - 2009 Oct 2

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5807 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 11th International Conference on Advanced Concepts for Intelligent Vision Systems, ACIVS 2009

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

