Joint training of expanded end-to-end DNN for text-dependent speaker verification

Hee Soo Heo, Jee Weon Jung, IL Ho Yang, Sung Hyun Yoon, Ha Jin Yu

Research output: Contribution to journal › Conference article › peer-review

15 Scopus citations

Abstract

We propose an expanded end-to-end DNN architecture for speaker verification based on b-vectors as well as d-vectors. We embedded the components of a speaker verification system such as modeling frame-level features, extracting utterance-level features, dimensionality reduction of utterance-level features, and trial-level scoring in an expanded end-to-end DNN architecture. The main contribution of this paper is that, instead of using DNNs as parts of the system trained independently, we train the whole system jointly with a fine-tune cost after pre-training each part. The experimental results show that the proposed system outperforms the baseline d-vector system and i-vector PLDA system.

Original language: English
Pages (from-to): 1532-1536
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
State: Published - 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: 20 Aug 2017 to 24 Aug 2017

Keywords

  • End-to-end DNN
  • I-vector PLDA
  • Speaker verification
