Improving Noise Robustness in Self-supervised Pre-trained Model for Speaker Verification

  • Chan Yeong Lim
  • Hyun Seo Shin
  • Ju Ho Kim
  • Jungwoo Heo
  • Kyo Won Koo
  • Seung Bin Kim
  • Ha Jin Yu

Research output: Contribution to journal › Conference article › peer-review

3 Scopus citations

Abstract

Adopting self-supervised pre-trained models (PMs) for speaker verification (SV) has shown remarkable performance, but their noise robustness remains largely unexplored. In automatic speech recognition, additional training strategies applied before fine-tuning enhance model robustness and improve performance in noisy environments. However, directly applying these strategies to SV risks distorting speaker information. We propose noise adaptive warm-up training for speaker verification (NAW-SV). NAW-SV uses teacher-student learning to guide the PM to extract consistent representations under noisy conditions. To prevent distortion of speaker information in this approach, we introduce a novel loss function, the extended angular prototypical network loss, which helps preserve speaker information while exploring a robust speaker embedding space. We validated the proposed framework on a noise-synthesized VoxCeleb1 test set, demonstrating promising robustness.
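
The record does not include code, and the paper's exact "extended" loss is not specified here. As a rough illustration only, the sketch below shows the two standard ingredients the abstract names: the base angular prototypical loss (Chung et al.) on which the extended variant builds, and a clean/noisy teacher-student consistency term. All class and function names, the initialization values, and the way the terms would be combined are assumptions, not the authors' implementation.

```python
# Minimal sketch (not the NAW-SV implementation): base angular prototypical
# loss plus a teacher-student consistency term between clean and noisy audio.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AngularPrototypicalLoss(nn.Module):
    """Standard angular prototypical loss; the paper's extension is not reproduced."""
    def __init__(self, init_scale: float = 10.0, init_bias: float = -5.0):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(init_scale))  # learnable scale w
        self.bias = nn.Parameter(torch.tensor(init_bias))    # learnable bias b

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (num_speakers, num_utterances, dim), num_utterances >= 2
        query = embeddings[:, 0]                   # one utterance per speaker as query
        prototype = embeddings[:, 1:].mean(dim=1)  # centroid of the remaining utterances
        cos = F.cosine_similarity(query.unsqueeze(1), prototype.unsqueeze(0), dim=-1)
        logits = self.scale * cos + self.bias      # (num_speakers, num_speakers)
        labels = torch.arange(embeddings.size(0), device=embeddings.device)
        return F.cross_entropy(logits, labels)     # match each query to its own prototype


def consistency_loss(student_noisy: torch.Tensor, teacher_clean: torch.Tensor) -> torch.Tensor:
    """Cosine-distance consistency between student embeddings of noisy audio and
    teacher embeddings of the corresponding clean audio (teacher is not updated)."""
    return 1.0 - F.cosine_similarity(student_noisy, teacher_clean.detach(), dim=-1).mean()
```

In a warm-up stage of this kind, the consistency term would pull noisy-condition representations toward the clean teacher's, while the prototypical term keeps embeddings of the same speaker clustered; how the two are weighted in NAW-SV is not stated in this record.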

Original language: English
Pages (from-to): 2665-2669
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
State: Published - 2024
Event: 25th Interspeech Conference 2024 - Kos Island, Greece
Duration: 1 Sep 2024 - 5 Sep 2024

Keywords

  • noisy environments
  • self-supervised learning
  • speaker verification
  • teacher-student learning
