Segment Aggregation for short utterances speaker verification using raw waveforms

Seung Bin Kim, Jee Weon Jung, Hye Jin Shim, Ju Ho Kim, Ha Jin Yu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Most studies on speaker verification systems focus on long-duration utterances, which contain sufficient phonetic information. However, the performance of these systems is known to degrade when the input is a short-duration utterance, because it carries less phonetic information than a long utterance. In this paper, we propose a method, referred to as “segment aggregation”, that compensates for the performance degradation of speaker verification on short utterances. The proposed method adopts an ensemble-based design to improve the stability and accuracy of speaker verification systems. It segments an input utterance into several short utterances and then aggregates the segment embeddings extracted from the segmented inputs to compose a speaker embedding. The segment embeddings and the aggregated speaker embedding are trained simultaneously. In addition, we modify the teacher-student learning method to suit the proposed method. Experimental results for different input durations on the VoxCeleb1 test set demonstrate that the proposed technique improves speaker verification performance by about 45.37% relative to the baseline system in the 1-second test utterance condition.
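The segment-then-aggregate step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the aggregation operator, segment length, overlap, or sampling rate, so mean pooling, 1-second segments with 50% overlap, and 16 kHz audio are all assumptions here. `toy_segment_embedding` is a hypothetical stand-in for the raw-waveform neural extractor, and the joint training of segment and speaker embeddings and the teacher-student learning are not covered.

```python
import numpy as np


def split_into_segments(waveform, segment_len, hop_len):
    """Cut a 1-D waveform into fixed-length, possibly overlapping segments."""
    if len(waveform) < segment_len:
        # Assumption for this sketch: repeat short inputs until they fill one segment.
        reps = int(np.ceil(segment_len / len(waveform)))
        waveform = np.tile(waveform, reps)[:segment_len]
    starts = range(0, len(waveform) - segment_len + 1, hop_len)
    return np.stack([waveform[s:s + segment_len] for s in starts])


def toy_segment_embedding(segment, dim=8):
    """Hypothetical placeholder for a neural extractor: RMS energy of `dim` chunks."""
    chunks = np.array_split(segment, dim)
    return np.array([np.sqrt(np.mean(c ** 2)) for c in chunks])


def aggregate(segment_embeddings):
    """Assumed aggregation: average the segment embeddings, then L2-normalise."""
    mean = segment_embeddings.mean(axis=0)
    return mean / (np.linalg.norm(mean) + 1e-12)


rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000 * 2)  # 2 s of noise at an assumed 16 kHz

segments = split_into_segments(waveform, segment_len=16000, hop_len=8000)
seg_emb = np.stack([toy_segment_embedding(s) for s in segments])
speaker_emb = aggregate(seg_emb)

print(segments.shape, seg_emb.shape, speaker_emb.shape)
```

With these assumed settings, a 2-second input yields three overlapping 1-second segments, each mapped to its own embedding before pooling into a single speaker embedding.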

Original language: English
Title of host publication: Interspeech 2020
Publisher: International Speech Communication Association
Pages: 1521-1525
Number of pages: 5
ISBN (Print): 9781713820697
State: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 to 29 Oct 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 25/10/20 to 29/10/20

Keywords

  • Segment aggregation
  • Short utterances
  • Speaker embedding
  • Speaker verification
  • Teacher-student learning
