TY - GEN
T1 - Large vocabulary Korean continuous speech recognition using a one-pass algorithm
AU - Yu, Ha Jin
AU - Kim, Hoon
AU - Hong, Joon Mo
AU - Kim, Min Seong
AU - Lee, Jong Seok
PY - 2000
Y1 - 2000
N2 - In this paper, we describe problems in recognizing large-vocabulary Korean continuous speech and propose solutions to them. Korean sentences consist of eojeols, which are separated by spaces in text and are themselves composed of morphemes. When we use morpheme units, there are many word insertion and deletion errors because morpheme units are too short. We introduce a between-word phone variation lexicon that can represent many alternative phones of words in one structure. The decoding algorithm consists of a single pass and is a modification of the token-passing algorithm. In this algorithm, we allow multiple tokens in a state at a time to obtain the global best path without expanding the states when we use trigram language models. We confirmed that the between-word phone variation lexicon is useful for morpheme-based recognition by observing that the improvement is higher for morpheme units than for eojeol units. Allowing multiple tokens at a state also improved performance.
AB - In this paper, we describe problems in recognizing large-vocabulary Korean continuous speech and propose solutions to them. Korean sentences consist of eojeols, which are separated by spaces in text and are themselves composed of morphemes. When we use morpheme units, there are many word insertion and deletion errors because morpheme units are too short. We introduce a between-word phone variation lexicon that can represent many alternative phones of words in one structure. The decoding algorithm consists of a single pass and is a modification of the token-passing algorithm. In this algorithm, we allow multiple tokens in a state at a time to obtain the global best path without expanding the states when we use trigram language models. We confirmed that the between-word phone variation lexicon is useful for morpheme-based recognition by observing that the improvement is higher for morpheme units than for eojeol units. Allowing multiple tokens at a state also improved performance.
UR - http://www.scopus.com/inward/record.url?scp=85009113814&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85009113814
T3 - 6th International Conference on Spoken Language Processing, ICSLP 2000
BT - 6th International Conference on Spoken Language Processing, ICSLP 2000
PB - International Speech Communication Association
T2 - 6th International Conference on Spoken Language Processing, ICSLP 2000
Y2 - 16 October 2000 through 20 October 2000
ER -