A fully data parallel WFST-based large vocabulary continuous speech recognition on a graphics processing unit

Jike Chong, Ekaterina Gonina, Youngmin Yi, Kurt Keutzer

Research output: Contribution to journal › Conference article › peer-review

35 Scopus citations

Abstract

Tremendous compute throughput is becoming available in personal desktop and laptop systems through the use of graphics processing units (GPUs). However, exploiting this resource requires re-architecting an application to fit a data parallel programming model. The complex graph traversal routines in the inference process for large vocabulary continuous speech recognition (LVCSR) have been considered by many as unsuitable for extensive parallelization. We explore and demonstrate a fully data parallel implementation of a speech inference engine on NVIDIA's GTX280 GPU. Our implementation consists of two phases: a compute-intensive observation probability computation phase and a communication-intensive graph traversal phase. We take advantage of dynamic elimination of redundant computation in the compute-intensive phase while maintaining close-to-peak execution efficiency. We also demonstrate the importance of exploring application-level trade-offs in the communication-intensive graph traversal phase to adapt the algorithm to data parallel execution on GPUs. On a 3.1-hour speech data set, we achieve a speedup of more than 11x over a highly optimized sequential implementation on an Intel Core i7, without sacrificing accuracy.
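The two-phase structure described in the abstract can be illustrated with a minimal sequential sketch of one decoding frame. This is not the authors' GPU implementation: it is plain Python, the function names (`log_gaussian`, `decode_frame`), the dictionary-based graph layout, and the single diagonal Gaussian per label are all assumptions made for illustration. Phase 1 scores each acoustic label at most once per frame, and only if some active state needs it, which is one simple reading of "dynamic elimination of redundant computation". Phase 2 propagates scores along arcs and keeps the best score per destination state.

```python
import math


def log_gaussian(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (assumed acoustic model)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def decode_frame(active, arcs, models, frame):
    """Advance a Viterbi search by one frame in two phases.

    active: dict mapping state -> best log score so far
    arcs:   dict mapping state -> list of (next_state, label, log_weight)
    models: dict mapping label -> (mean, var) for that label's Gaussian
    frame:  feature vector for the current frame
    """
    # Phase 1 (compute-intensive): gather the labels on arcs leaving
    # active states, then score each unique label exactly once.
    needed = {label for s in active for (_, label, _) in arcs.get(s, [])}
    obs = {label: log_gaussian(frame, *models[label]) for label in needed}

    # Phase 2 (communication-intensive): traverse arcs from active states
    # and keep the maximum score arriving at each destination state.
    next_active = {}
    for s, score in active.items():
        for dst, label, weight in arcs.get(s, []):
            cand = score + weight + obs[label]
            if cand > next_active.get(dst, -math.inf):
                next_active[dst] = cand
    return next_active, obs
```

In a data parallel setting, each loop iteration in either phase would be a candidate for its own thread. The per-destination maximum in phase 2 is where concurrent updates would collide, which gives a rough sense of why the abstract calls that phase communication-intensive.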

Original language: English
Pages (from-to): 1183-1186
Number of pages: 4
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
State: Published - 2009
Event: 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009 - Brighton, United Kingdom
Duration: 6 Sep 2009 to 10 Sep 2009

Keywords

  • Continuous speech recognition
  • Data parallel
  • Graphics processing unit
