A study on speech disentanglement framework based on adversarial learning for speaker recognition

Yoohwan Kwon, Soo Whan Chung, Hong Goo Kang

Research output: Contribution to journal › Article › peer-review


In this paper, we propose a system that extracts effective speaker representations from a speech signal using deep learning. Because a speech signal also contains identity-unrelated information such as text content, emotion, and background noise, we train the system so that the extracted features represent only speaker-related information and exclude speaker-unrelated information. Specifically, we propose an auto-encoder-based disentanglement method that outputs both speaker-related and speaker-unrelated embeddings using effective loss functions. To further improve reconstruction performance in the decoding process, we also introduce a discriminator of the kind commonly used in Generative Adversarial Network (GAN) structures. Because better decoding capability helps preserve speaker information and supports disentanglement, it improves speaker verification performance. Experimental results demonstrate the effectiveness of the proposed method through an improved Equal Error Rate (EER) on the VoxCeleb1 benchmark dataset.
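
For readers unfamiliar with the evaluation metric, the sketch below illustrates one common way to approximate the Equal Error Rate from verification trial scores: sweep a decision threshold and report the point where the false acceptance rate and false rejection rate are closest. It is a minimal illustration, not the authors' evaluation code; the function name `equal_error_rate`, its arguments, and the toy scores are assumptions made for this example.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    target_scores: similarity scores for same-speaker trials (hypothetical input).
    nontarget_scores: similarity scores for different-speaker trials.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a same-speaker trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a different-speaker trial scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented for illustration only.
overlapping = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
separated = equal_error_rate([0.9, 0.8], [0.2, 0.1])
```

A lower EER indicates better verification performance. With finite trial lists the two error rates rarely cross exactly, so this sketch averages them at the closest point; interpolating along the ROC curve is a common alternative.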

Original language: English
Pages (from-to): 447-453
Number of pages: 7
Journal: Journal of the Acoustical Society of Korea
Issue number: 5
Publication status: Published - 2020

Bibliographical note

Publisher Copyright:
Copyright © 2020 The Acoustical Society of Korea.

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Instrumentation
  • Acoustics and Ultrasonics
  • Applied Mathematics
  • Speech and Hearing

