SEED: Speaker Embedding Enhancement Diffusion Model

  • Kihyun Nam
  • Jungwoo Heo
  • Jee Weon Jung
  • Gangin Park
  • Chaeyoung Jung
  • Ha Jin Yu
  • Joon Son Chung

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

A primary challenge when deploying speaker recognition systems in real-world applications is performance degradation caused by environmental mismatch. We propose a diffusion-based method that takes speaker embeddings extracted from a pre-trained speaker recognition model and generates refined embeddings. During training, our approach progressively adds Gaussian noise to both clean and noisy speaker embeddings, extracted from clean and noisy speech respectively, via the forward process of a diffusion model, and then reconstructs them into clean embeddings in the reverse process. During inference, all embeddings are regenerated via the diffusion process. Our method requires neither speaker labels nor any modification to the existing speaker recognition pipeline. Experiments on evaluation sets simulating environmental mismatch scenarios show that our method can improve recognition accuracy by up to 19.6% over baseline models while retaining performance in conventional scenarios. We publish our code here.
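The training and inference procedure described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example of the core idea, not the authors' released implementation: the embedding dimension (192), the MLP denoiser, the number of diffusion steps, and the linear noise schedule are all illustrative assumptions. A clean or noisy speaker embedding is noised via a DDPM-style forward process, and the model is trained to reconstruct the corresponding clean embedding.

```python
import torch
import torch.nn as nn

# Hypothetical denoiser: a small MLP conditioned on the diffusion timestep.
# Architecture, dimensions, and noise schedule are illustrative assumptions.
class EmbeddingDenoiser(nn.Module):
    def __init__(self, dim=192, steps=50):
        super().__init__()
        self.step_emb = nn.Embedding(steps, dim)
        self.net = nn.Sequential(
            nn.Linear(dim * 2, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, dim),
        )

    def forward(self, x_t, t):
        # Predict the clean embedding from the noised embedding and step t.
        return self.net(torch.cat([x_t, self.step_emb(t)], dim=-1))

steps = 50
betas = torch.linspace(1e-4, 0.02, steps)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def forward_noise(x0, t):
    # DDPM-style forward process: sample x_t ~ q(x_t | x_0).
    a = alpha_bar[t].unsqueeze(-1)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * torch.randn_like(x0)

model = EmbeddingDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Placeholder embeddings standing in for a pre-trained speaker model's output.
clean_emb = torch.randn(8, 192)   # embeddings from clean speech
noisy_emb = torch.randn(8, 192)   # embeddings from the same utterances, corrupted

# One training step: both clean and noisy embeddings are noised, and the
# denoiser regresses the clean embedding in every case (no speaker labels).
for x in (clean_emb, noisy_emb):
    t = torch.randint(0, steps, (x.size(0),))
    loss = nn.functional.mse_loss(model(forward_noise(x, t), t), clean_emb)
    opt.zero_grad(); loss.backward(); opt.step()
```

At test time, the same regeneration through the learned reverse process would be applied to every extracted embedding before scoring, so the existing speaker recognition pipeline itself is left unchanged.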

Original language: English
Pages (from-to): 3718-3722
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
DOIs
State: Published - 2025
Event: 26th Interspeech Conference 2025 - Rotterdam, Netherlands
Duration: 17 Aug 2025 - 21 Aug 2025

Keywords

  • diffusion probabilistic model
  • real-world environment
  • representation enhancement
  • speaker recognition
