ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis

Suhyeon Oh, Hyungseob Lim, Kyungguen Byun, Min Jae Hwang, Eunwoo Song, Hong Goo Kang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Citations (Scopus)

Abstract

In this paper, we propose ExcitGlow, a vocoder that incorporates the source-filter model of voice production theory into a flow-based deep generative model. By targeting the distribution of the excitation signal instead of the speech waveform itself, we significantly reduce the size of the flow-based generative model. To further reduce the number of parameters, we apply a parameter sharing technique in which a single affine coupling layer is used for several flow layers. To avoid quality degradation, we also introduce a closed-loop training framework to optimize the flow model for both the speech and excitation signal generation processes. Specifically, we choose negative log-likelihood (NLL) loss for the excitation signal and multi-resolution spectral distance for the speech signal. As a result, we are able to reduce the model size from 87. 73M to 15. 60M parameters while maintaining the perceptual quality of synthesized speech.

Original languageEnglish
Title of host publication2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages831-836
Number of pages6
ISBN (Electronic)9789881476883
Publication statusPublished - 2020 Dec 7
Event2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Virtual, Auckland, New Zealand
Duration: 2020 Dec 72020 Dec 10

Publication series

Name2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020 - Proceedings

Conference

Conference2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020
Country/TerritoryNew Zealand
CityVirtual, Auckland
Period20/12/720/12/10

Bibliographical note

Funding Information:
VII. ACKNOWLEDGEMENTS The work was supported by Clova Voice, NAVER Corp., Seongnam, Korea.

Publisher Copyright:
© 2020 APSIPA.

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Signal Processing
  • Decision Sciences (miscellaneous)
  • Instrumentation

Fingerprint

Dive into the research topics of 'ExcitGlow: Improving a WaveGlow-based Neural Vocoder with Linear Prediction Analysis'. Together they form a unique fingerprint.

Cite this