DIFF-SV: A UNIFIED HIERARCHICAL FRAMEWORK FOR NOISE-ROBUST SPEAKER VERIFICATION USING SCORE-BASED DIFFUSION PROBABILISTIC MODELS

  • Ju Ho Kim
  • Jungwoo Heo
  • Hyun Seo Shin
  • Chan Yeong Lim
  • Ha Jin Yu

Research output: Contribution to journal > Conference article > peer-review

7 Scopus citations

Abstract

Background noise considerably reduces the accuracy and reliability of speaker verification (SV) systems. This degradation can be mitigated by using a speech enhancement system as a front-end module. Recently, diffusion probabilistic models (DPMs) have exhibited remarkable noise-compensation capabilities in the speech enhancement domain. Building on this success, we propose Diff-SV, a noise-robust SV framework that leverages a DPM. Diff-SV unifies a DPM-based speech enhancement system with a speaker embedding extractor and yields a discriminative, noise-tolerant speaker representation through a hierarchical structure. The proposed model was evaluated under both in-domain and out-of-domain noisy conditions using the VoxCeleb1 test set, an external noise source, and the VOiCES corpus. The experimental results demonstrate that Diff-SV achieves state-of-the-art performance, outperforming recently proposed noise-robust SV systems.

Original language: English
Pages (from-to): 10341-10345
Number of pages: 5
Journal: Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
State: Published - 2024
Event: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 2024 - 19 Apr 2024

Keywords

  • diffusion probabilistic models
  • feature enhancement
  • noisy environment
  • speaker verification
