Abstract
Text-independent speaker verification requires speaker embeddings that do not depend on the spoken text in order to generalize well. However, deep neural networks that depend on training data can overfit to text information instead of learning speaker information when they repeatedly learn from identical time series. In this paper, to prevent this overfitting, we propose a segment unit shuffling layer that divides the input layer or a hidden layer into segments along the time axis and rearranges them, thereby mixing the time-series information. Because the segment unit shuffling layer can be applied to hidden layers as well as the input layer, it can serve as a generalization technique in the hidden layers, which is known to be more effective than generalization at the input layer, and it can be applied together with data augmentation. In addition, the degree of distortion can be controlled by adjusting the unit size of the segment. We observe that text-independent speaker verification performance improves over the baseline when the proposed segment unit shuffling layer is applied.
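The divide-and-rearrange operation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `segment_unit_shuffle`, the `(time, features)` array layout, the uniform random permutation of segments, and the handling of a shorter final segment are all assumptions made for illustration.

```python
import numpy as np


def segment_unit_shuffle(x, unit_size, rng=None):
    """Shuffle a (time, features) array along the time axis in whole segments.

    Hypothetical helper: the name, signature, and uniform random permutation
    are assumptions, not details taken from the paper.
    """
    if unit_size < 1:
        raise ValueError("unit_size must be a positive integer")
    rng = np.random.default_rng() if rng is None else rng
    num_frames = x.shape[0]

    # Cut the frame indices every `unit_size` frames; the final segment
    # may be shorter when num_frames is not a multiple of unit_size.
    boundaries = np.arange(unit_size, num_frames, unit_size)
    segments = np.split(np.arange(num_frames), boundaries)

    # Rearrange whole segments; frame order inside each segment is preserved.
    order = rng.permutation(len(segments))
    index = np.concatenate([segments[i] for i in order])
    return x[index]


# Example: 10 frames of 2-dimensional features, shuffled in units of 3 frames.
features = np.arange(20).reshape(10, 2)
shuffled = segment_unit_shuffle(features, unit_size=3, rng=np.random.default_rng(0))
```

In this sketch a smaller `unit_size` mixes the time series more strongly, while a `unit_size` at least as large as the sequence leaves it unchanged, which mirrors the adjustable degree of distortion mentioned above. The same function could be applied to a hidden-layer activation of shape `(time, channels)`, presumably during training only, although that is also an assumption.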
Original language | English |
---|---|
Pages (from-to) | 148-154 |
Number of pages | 7 |
Journal | Journal of the Acoustical Society of Korea |
Volume | 40 |
Issue number | 2 |
State | Published - 2021 |
Keywords
- Deep neural network
- Shuffling generalization
- Speaker embedding
- Text-independent speaker verification