A supervised learning method for improving the generalization of speaker verification systems by learning metrics from a mean teacher

Ju Ho Kim, Hye Jin Shim, Jee Weon Jung, Ha Jin Yu

Research output: Contribution to journal / Article / peer-review

4 Scopus citations

Abstract

Most recent speaker verification tasks are studied under open-set evaluation scenarios that reflect real-world conditions. In such tasks, generalization to unseen speakers is a critical capability. This study therefore aims to improve the generalization of speaker verification systems and thereby their verification performance. To achieve this goal, we propose a novel supervised learning method for speaker verification based on the mean teacher framework. The mean teacher network is obtained by temporally averaging the parameters of a deep neural network; it can produce more accurate and stable representations than the fixed weights at the end of training, and it is conventionally used for semi-supervised learning. Building on the success of the mean teacher framework in many studies, the proposed supervised learning method uses the mean teacher network as an auxiliary model to better train the main model, the student network. By learning the reliable intermediate representations derived from the mean teacher network in addition to one-hot speaker labels, the student network is encouraged to explore more discriminative embedding spaces. The experimental results demonstrate that the proposed method reduces the equal error rate by a relative 11.61% compared to the baseline system.
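
The temporal averaging of parameters mentioned in the abstract is commonly realized as an exponential moving average (EMA) of the student weights. The sketch below illustrates that general idea only; it is not the authors' implementation. The function name `ema_update`, the flat list-of-floats parameter layout, and the decay value are assumptions made for illustration.

```python
from typing import List


def ema_update(teacher: List[float], student: List[float],
               decay: float = 0.999) -> List[float]:
    """Return new teacher weights as an EMA of the student weights.

    Hypothetical helper: `decay` close to 1.0 makes the teacher change
    slowly, so it averages the student over many training steps.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same size")
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    # Toy loop: the student weights jump around, the teacher follows smoothly.
    teacher = [0.0, 0.0]
    student_steps = [[1.0, 2.0], [0.8, 2.2], [1.2, 1.8]]  # illustrative values
    for student in student_steps:
        teacher = ema_update(teacher, student, decay=0.9)
    print(teacher)
```

In a training loop, an update of this kind would typically run after every optimizer step on the student, so that the teacher supplies slowly varying targets alongside the one-hot speaker labels.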

Original language: English
Article number: 76
Journal: Applied Sciences (Switzerland)
Volume: 12
Issue number: 1
State: Published - 1 Jan 2022

Keywords

  • Mean teacher
  • Metric learning
  • Speaker verification
  • Supervised learning
