TY - GEN
T1 - A pitch-synchronous speech analysis and synthesis method for DNN-SPSS system
AU - Kim, Jin Seob
AU - Joo, Young Sun
AU - Kang, Hong Goo
AU - Jang, Inseon
AU - Ahn, Chunghyun
AU - Seo, Jeongil
PY - 2016/7/2
Y1 - 2016/7/2
N2 - This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Pitch-synchronous frames, defined by the locations of glottal closure instants (GCIs), are used to extract speech parameters, which significantly reduces coupling effects between vocal tract and excitation signals. As a result, the distribution of spectral parameters within the same context of phonetic classes becomes more uniform, which improves model trainability, especially for a small-scale DNN framework. Although the effectiveness of the pitch-synchronous approach has been proven in other applications, it is not trivial to integrate the method into typical DNN-based SPSS systems that have regularized structures, i.e., a fixed frame rate and a fixed feature dimension. In this paper, we design a new DNN-based SPSS system that pitch-synchronously trains and generates speech parameters. Objective and subjective test results verify the superiority of the proposed system over the conventional approach.
AB - This paper proposes a pitch-synchronous deep neural network (DNN)-based statistical parametric speech synthesis (SPSS) system. Pitch-synchronous frames, defined by the locations of glottal closure instants (GCIs), are used to extract speech parameters, which significantly reduces coupling effects between vocal tract and excitation signals. As a result, the distribution of spectral parameters within the same context of phonetic classes becomes more uniform, which improves model trainability, especially for a small-scale DNN framework. Although the effectiveness of the pitch-synchronous approach has been proven in other applications, it is not trivial to integrate the method into typical DNN-based SPSS systems that have regularized structures, i.e., a fixed frame rate and a fixed feature dimension. In this paper, we design a new DNN-based SPSS system that pitch-synchronously trains and generates speech parameters. Objective and subjective test results verify the superiority of the proposed system over the conventional approach.
UR - http://www.scopus.com/inward/record.url?scp=85016224967&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85016224967&partnerID=8YFLogxK
U2 - 10.1109/ICDSP.2016.7868589
DO - 10.1109/ICDSP.2016.7868589
M3 - Conference contribution
T3 - International Conference on Digital Signal Processing, DSP
SP - 408
EP - 411
BT - Proceedings - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE International Conference on Digital Signal Processing, DSP 2016
Y2 - 16 October 2016 through 18 October 2016
ER -