TY - GEN
T1 - Factored adaptation of speaker and environment using orthogonal subspace transforms
AU - Seo, Hyunson
AU - Kang, Hong Goo
AU - Seltzer, Michael L.
PY - 2014
Y1 - 2014
N2 - This paper presents a subspace-based acoustic factorization framework for transform-based adaptation in speech recognition. In the proposed method, adaptation transforms are projected onto factor-dependent low-rank subspaces in a way that decouples the combined extrinsic factors affecting the speech signals. Usually, mismatch between the observed speech and the acoustic models is caused by multiple acoustic factors simultaneously, such as the speaker and environment. Data-driven adaptation methods, such as constrained MLLR, compensate for all sources of mismatch jointly. In many scenarios, however, it is highly desirable to separate the sources of mismatch in order to adapt to speaker and environment variability independently. This adds flexibility to the model adaptation framework. For example, a speaker transform obtained in one environment can be reused when the same speaker is in different environments. Or, an environment transform obtained during training, independently of speaker identities, can be applied to a speaker in deployment. One way to achieve this factorization is to construct each set of transforms such that they are orthogonal to each other, so that any change in one acoustic factor keeps other factors intact. The proposed subspace approach provides a straightforward factor analysis framework while allowing us to explicitly formulate the independence among the estimated factor transforms. A series of experiments performed on the Aurora 4 corpus validates our approach.
AB - This paper presents a subspace-based acoustic factorization framework for transform-based adaptation in speech recognition. In the proposed method, adaptation transforms are projected onto factor-dependent low-rank subspaces in a way that decouples the combined extrinsic factors affecting the speech signals. Usually, mismatch between the observed speech and the acoustic models is caused by multiple acoustic factors simultaneously, such as the speaker and environment. Data-driven adaptation methods, such as constrained MLLR, compensate for all sources of mismatch jointly. In many scenarios, however, it is highly desirable to separate the sources of mismatch in order to adapt to speaker and environment variability independently. This adds flexibility to the model adaptation framework. For example, a speaker transform obtained in one environment can be reused when the same speaker is in different environments. Or, an environment transform obtained during training, independently of speaker identities, can be applied to a speaker in deployment. One way to achieve this factorization is to construct each set of transforms such that they are orthogonal to each other, so that any change in one acoustic factor keeps other factors intact. The proposed subspace approach provides a straightforward factor analysis framework while allowing us to explicitly formulate the independence among the estimated factor transforms. A series of experiments performed on the Aurora 4 corpus validates our approach.
UR - http://www.scopus.com/inward/record.url?scp=84905280462&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84905280462&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2014.6854201
DO - 10.1109/ICASSP.2014.6854201
M3 - Conference contribution
AN - SCOPUS:84905280462
SN - 9781479928927
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 3251
EP - 3255
BT - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Y2 - 4 May 2014 through 9 May 2014
ER -