Segment unit shuffling layer in deep neural networks for text-independent speaker verification

Jungwoo Heo, Hye Jin Shim, Ju Ho Kim, Ha Jin Yu

Research output: Contribution to journal › Article › peer-review

Abstract

Text-independent speaker verification requires speaker embeddings that do not depend on the spoken text in order to generalize well. However, deep neural networks depend on their training data and can overfit to text information rather than learning speaker information when they repeatedly learn from identical time series. To prevent this overfitting, we propose a segment unit shuffling layer that divides the input layer or a hidden layer into segments along the time axis and rearranges them, thereby mixing the time-series information. Because the segment unit shuffling layer can be applied to hidden layers as well as the input layer, it can serve as a hidden-layer generalization technique, which is known to be more effective than generalization at the input layer, and it can be used together with data augmentation. In addition, the degree of distortion can be controlled by changing the unit size of the segment. We observe that applying the proposed segment unit shuffling layer improves text-independent speaker verification performance over the baseline.
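
The divide-and-rearrange operation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `segment_shuffle`, the parameter `segment_size`, the (time, features) input layout, the choice to leave trailing frames that do not fill a whole segment in place, and the choice to shuffle only in training mode are all assumptions made for illustration.

```python
import numpy as np


def segment_shuffle(x, segment_size, rng=None, training=True):
    """Shuffle fixed-size segments of `x` along the time axis (axis 0).

    x            : array of shape (time, features); layout is an assumption.
    segment_size : frames per segment (hypothetical parameter name).
    training     : assumed switch; when False the input passes through unchanged.
    """
    if not training:
        return x
    rng = np.random.default_rng() if rng is None else rng

    num_frames = x.shape[0]
    num_segments = num_frames // segment_size
    if num_segments < 2:
        # Nothing to rearrange with fewer than two whole segments.
        return x

    usable = num_segments * segment_size
    # View the usable frames as (num_segments, segment_size, features...).
    segments = x[:usable].reshape(num_segments, segment_size, *x.shape[1:])
    # Reorder whole segments; frame order inside each segment is preserved.
    order = rng.permutation(num_segments)
    shuffled = segments[order].reshape(usable, *x.shape[1:])
    # Assumption: leftover frames that do not fill a segment stay at the end.
    return np.concatenate([shuffled, x[usable:]], axis=0)


if __name__ == "__main__":
    frames = np.arange(20).reshape(10, 2)  # 10 frames, 2 features each
    print(segment_shuffle(frames, segment_size=3, rng=np.random.default_rng(0)))
```

In this sketch a smaller `segment_size` breaks the temporal order at a finer scale, while a larger one keeps longer stretches of the sequence intact, which is one way to read the abstract's remark that the degree of distortion depends on the segment unit size. The same function could be applied to a hidden-layer activation of shape (time, channels) as well as to input features.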

Original language: English
Pages (from-to): 148-154
Number of pages: 7
Journal: Journal of the Acoustical Society of Korea
Volume: 40
Issue number: 2
State: Published - 2021

Keywords

  • Deep neural network
  • Shuffling generalization
  • Speaker embedding
  • Text-independent speaker verification
