Abstract
We propose an expanded end-to-end DNN architecture for speaker verification based on b-vectors as well as d-vectors. We embed the components of a speaker verification system, namely frame-level feature modeling, utterance-level feature extraction, dimensionality reduction of utterance-level features, and trial-level scoring, in this expanded end-to-end DNN architecture. The main contribution of this paper is that, instead of using DNNs as independently trained parts of the system, we train the whole system jointly with a fine-tuning cost after pre-training each part. The experimental results show that the proposed system outperforms both the baseline d-vector system and the i-vector PLDA system.
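As a rough illustration of how the four stages named in the abstract can be chained into one forward pass, the following is a minimal sketch and not the authors' implementation. The layer sizes, the random weights, the parameter names (`W_frame`, `W_reduce`, `w_score`), the mean-pooling used to form the d-vector, and the b-vector form (element-wise sum, absolute difference, and product of the two utterance vectors) are all assumptions made for illustration. Joint fine-tuning after pre-training is not shown.

```python
import math
import random


def dense_relu(x, weights, bias):
    """Frame-level layer: affine transform followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def linear(x, weights):
    """Linear projection, used here as a stand-in for dimensionality reduction."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def d_vector(frames, weights, bias):
    """Utterance-level feature: mean of frame-level hidden activations (assumed pooling)."""
    hidden = [dense_relu(f, weights, bias) for f in frames]
    return [sum(col) / len(hidden) for col in zip(*hidden)]


def b_vector(u, v):
    """Assumed b-vector form: element-wise sum, absolute difference, and product."""
    return ([a + b for a, b in zip(u, v)]
            + [abs(a - b) for a, b in zip(u, v)]
            + [a * b for a, b in zip(u, v)])


def score(enroll_frames, test_frames, params):
    """Trial-level score in (0, 1) from a logistic layer on the b-vector."""
    u = linear(d_vector(enroll_frames, params["W_frame"], params["b_frame"]),
               params["W_reduce"])
    v = linear(d_vector(test_frames, params["W_frame"], params["b_frame"]),
               params["W_reduce"])
    b = b_vector(u, v)
    logit = sum(w * x for w, x in zip(params["w_score"], b)) + params["b_score"]
    return 1.0 / (1.0 + math.exp(-logit))


def init_params(feat_dim=4, hidden_dim=6, reduced_dim=3, seed=0):
    """Hypothetical random parameters; a real system would pre-train each part."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_frame": mat(hidden_dim, feat_dim),
        "b_frame": [0.0] * hidden_dim,
        "W_reduce": mat(reduced_dim, hidden_dim),
        "w_score": [rng.uniform(-0.5, 0.5) for _ in range(3 * reduced_dim)],
        "b_score": 0.0,
    }


if __name__ == "__main__":
    rng = random.Random(1)
    params = init_params()
    # Toy utterances: lists of 4-dimensional frame features (made-up data).
    enroll = [[rng.random() for _ in range(4)] for _ in range(5)]
    test_utt = [[rng.random() for _ in range(7)][:4] for _ in range(7)]
    print(score(enroll, test_utt, params))
```

Because every stage is an ordinary differentiable function of shared parameters, a single cost on the final score could in principle be backpropagated through all of them, which is the idea behind training the whole system jointly rather than part by part.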
| Original language | English |
|---|---|
| Pages (from-to) | 1532-1536 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2017-August |
| State | Published - 2017 |
| Event | 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden. Duration: 20 Aug 2017 → 24 Aug 2017 |
Keywords
- End-to-end DNN
- I-vector PLDA
- Speaker verification