Abstract
In this paper, we propose ExcitGlow, a vocoder that incorporates the source-filter model of voice production theory into a flow-based deep generative model. By targeting the distribution of the excitation signal instead of the speech waveform itself, we significantly reduce the size of the flow-based generative model. To further reduce the number of parameters, we apply a parameter sharing technique in which a single affine coupling layer is used for several flow layers. To avoid quality degradation, we also introduce a closed-loop training framework to optimize the flow model for both the speech and excitation signal generation processes. Specifically, we choose negative log-likelihood (NLL) loss for the excitation signal and multi-resolution spectral distance for the speech signal. As a result, we are able to reduce the model size from 87.73M to 15.60M parameters while maintaining the perceptual quality of synthesized speech.
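The parameter sharing idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the ExcitGlow architecture, which the abstract does not specify in detail: the class `SharedAffineCoupling`, the helper functions, the toy one-hidden-layer conditioning network, the channel reversal used as a fixed permutation between steps, and the standard normal prior are all assumptions made for illustration. The sketch only shows that reusing one coupling layer for several flow steps keeps the transform invertible while the parameter count stays independent of the number of steps.

```python
import numpy as np


class SharedAffineCoupling:
    """A single affine coupling layer whose weights are reused by every flow step."""

    def __init__(self, half_dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Toy conditioning network with one hidden layer (an assumption).
        self.w1 = rng.normal(scale=0.1, size=(half_dim, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, 2 * half_dim))

    def _scale_shift(self, xa):
        # Predict log-scale and shift for the second half from the first half.
        h = np.tanh(xa @ self.w1) @ self.w2
        log_s, t = np.split(h, 2, axis=-1)
        return log_s, t

    def forward(self, x):
        xa, xb = np.split(x, 2, axis=-1)
        log_s, t = self._scale_shift(xa)
        yb = xb * np.exp(log_s) + t
        # The log-determinant of the Jacobian is the sum of the log-scales.
        return np.concatenate([xa, yb], axis=-1), log_s.sum(axis=-1)

    def inverse(self, y):
        ya, yb = np.split(y, 2, axis=-1)
        log_s, t = self._scale_shift(ya)
        xb = (yb - t) * np.exp(-log_s)
        return np.concatenate([ya, xb], axis=-1)


def flow_forward(layer, x, n_steps):
    """Apply the same coupling layer n_steps times, reversing channels in between."""
    log_det = np.zeros(x.shape[0])
    for _ in range(n_steps):
        x, step_log_det = layer.forward(x)
        log_det += step_log_det
        x = x[:, ::-1]  # fixed permutation so both halves get transformed
    return x, log_det


def flow_inverse(layer, z, n_steps):
    """Undo flow_forward by reversing the order of the operations."""
    for _ in range(n_steps):
        z = z[:, ::-1]
        z = layer.inverse(z)
    return z


def gaussian_nll(z, log_det):
    """Mean negative log-likelihood under a standard normal prior on z."""
    dim = z.shape[-1]
    log_pz = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * dim * np.log(2.0 * np.pi)
    return -(log_pz + log_det).mean()


# Usage: six flow steps share one set of weights.
layer = SharedAffineCoupling(half_dim=4)
x = np.random.default_rng(1).normal(size=(5, 8))
z, log_det = flow_forward(layer, x, n_steps=6)
x_rec = flow_inverse(layer, z, n_steps=6)
print("reconstruction ok:", np.allclose(x, x_rec))
print("shared parameters:", layer.w1.size + layer.w2.size)
print("NLL:", gaussian_nll(z, log_det))
```

In this toy setting, increasing `n_steps` deepens the flow without adding parameters, which is the trade-off the abstract describes. The closed-loop spectral loss on the speech signal is not covered by this sketch.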
Original language | English |
---|---|
Title of host publication | 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 831-836 |
Number of pages | 6 |
ISBN (Electronic) | 9789881476883 |
Publication status | Published - 2020 Dec 7 |
Event | 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Virtual, Auckland, New Zealand; Duration: 2020 Dec 7 → 2020 Dec 10 |
Publication series
Name | 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings |
---|
Conference
Conference | 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 |
---|---|
Country/Territory | New Zealand |
City | Virtual, Auckland |
Period | 2020 Dec 7 → 2020 Dec 10 |
Bibliographical note
Funding Information: The work was supported by Clova Voice, NAVER Corp., Seongnam, Korea.
Publisher Copyright: © 2020 APSIPA.
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Computer Networks and Communications
- Computer Vision and Pattern Recognition
- Hardware and Architecture
- Signal Processing
- Decision Sciences (miscellaneous)
- Instrumentation