One-Step Knowledge Distillation and Fine-Tuning in Using Large Pre-Trained Self-Supervised Learning Models for Speaker Verification

Jungwoo Heo, Chan Yeong Lim, Ju Ho Kim, Hyun Seo Shin, Ha Jin Yu

Research output: Contribution to journal › Conference article › peer-review


Abstract

The application of speech self-supervised learning (SSL) models has achieved remarkable performance in speaker verification (SV). However, their high computational cost makes development and deployment difficult. Several studies have simply compressed SSL models through knowledge distillation (KD) without considering the target task, and consequently these methods could not extract SV-tailored features. This paper proposes One-Step Knowledge Distillation and Fine-Tuning (OS-KDFT), which incorporates KD and fine-tuning (FT) in a single training stage. We optimize the student model for SV during KD training to avoid distilling information that is inappropriate for SV. OS-KDFT reduces the size of a Wav2Vec 2.0-based ECAPA-TDNN by approximately 76.2% and cuts the SSL model's inference time by 79%, while achieving an EER of 0.98%. The proposed OS-KDFT is validated on the VoxCeleb1 and VoxCeleb2 datasets and with W2V2 and HuBERT SSL models. Our experiment code is available on GitHub.
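The core idea, optimizing the distillation objective and the SV objective jointly in one training pass rather than distilling first and fine-tuning afterwards, can be sketched roughly as below. This is a minimal illustration assuming PyTorch modules for the teacher SSL encoder, the student encoder, and a speaker classifier head; the specific loss terms, weighting, and names are assumptions for illustration, not the authors' exact OS-KDFT implementation (refer to their GitHub for the actual code).

```python
import torch
import torch.nn.functional as F

def joint_kd_sv_step(teacher, student, speaker_head, wav, labels,
                     optimizer, alpha=0.5):
    """One-step joint objective: distillation loss + SV fine-tuning loss.

    `teacher`, `student` (SSL encoders) and `speaker_head` (e.g. an
    ECAPA-TDNN-style classifier) are hypothetical nn.Module placeholders;
    the MSE/cross-entropy choice and `alpha` weighting are assumptions.
    """
    teacher.eval()
    with torch.no_grad():
        t_feat = teacher(wav)               # frozen teacher representations

    s_feat = student(wav)                   # student representations
    logits = speaker_head(s_feat)           # speaker posteriors from student features

    kd_loss = F.mse_loss(s_feat, t_feat)            # match the teacher
    sv_loss = F.cross_entropy(logits, labels)       # speaker-classification objective

    loss = alpha * kd_loss + (1.0 - alpha) * sv_loss  # single-stage joint objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the SV loss shapes the student's representations while distillation is still in progress, the student is steered toward speaker-discriminative features instead of reproducing all of the teacher's task-agnostic information.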

Original language: English
Pages (from-to): 5271-5275
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
State: Published - 2023
Event: 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 - 24 Aug 2023

Keywords

  • Knowledge-distillation
  • Self-supervised learning model
  • Speaker verification
