Integrated receptive field diversification method for improving speaker verification performance for variable-length utterances

Hyun Seo Shin, Ju Ho Kim, Jungwoo Heo, Hye Jin Shim, Ha Jin Yu

Research output: Contribution to journal › Article › peer-review

Abstract

Variation in utterance length is a representative factor that can degrade the performance of speaker verification systems. To handle this issue, previous studies have attempted either to extract speaker features from various branches or to use convolution layers with different receptive fields. Combining the advantages of these two approaches for variable-length input, this paper proposes integrated receptive field diversification, which extracts speaker features through more diverse receptive fields. The proposed method processes the input features with convolutional layers that have different receptive fields at multiple time-axis branches, and extracts the speaker embedding by dynamically aggregating the processed features according to the length of the input utterance. The deep neural networks in this study were trained on the VoxCeleb2 dataset and tested on the VoxCeleb1 evaluation dataset, which was divided into 1 s, 2 s, 5 s, and full-length utterances. Experimental results demonstrated that the proposed method reduces the equal error rate by 19.7 % compared to the baseline.
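
For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate and the false rejection rate of a verification system are equal. The following is a minimal sketch in plain Python, not the authors' evaluation code: the function names `equal_error_rate` and `relative_reduction`, the threshold sweep, and all scores and EER values are illustrative assumptions and are not taken from the paper. The sketch also assumes that a reduction "compared to the baseline" is read as a relative reduction, which the abstract does not state explicitly.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    target_scores: similarity scores of same-speaker trials.
    nontarget_scores: similarity scores of different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, proposed_eer):
    """Relative EER reduction in percent (assumed reading of 'reduces by')."""
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer


# Toy scores, invented for illustration only.
toy_targets = [0.9, 0.8, 0.7, 0.4]
toy_nontargets = [0.6, 0.3, 0.2, 0.1]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)

# Hypothetical EER values (in percent), not results from the paper.
toy_reduction = relative_reduction(2.0, 1.5)
```

On real trial lists the EER is usually interpolated from the detection error trade-off curve; the exhaustive threshold sweep above is only the simplest approximation.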

Original language: English
Pages (from-to): 319-325
Number of pages: 7
Journal: Journal of the Acoustical Society of Korea
Volume: 41
Issue number: 3
State: Published - 2022

Keywords

  • Deep neural network
  • Receptive field
  • Speaker verification
  • Variable-length utterance
