Abstract
We introduce acoustic sub-word units to neural networks for speaker-independent continuous speech recognition. The functions of segmenting the input and detecting words are implemented with networks of simple structure. The non-uniform units introduced in this research can model phoneme variations caused by co-articulation that spreads over several phonemes and across word boundaries. The network segments these units according to the stationary and transition parts of speech, without iteration and without considering all possible position shifts. The network can also train a word lexicon that effectively memorizes all transcription variations in the training utterances of words. Results of speaker-independent word spotting of 520 words with TIMIT data are described. © 2000 Elsevier Science Ltd.
Original language | English |
---|---|
Pages (from-to) | 681-688 |
Number of pages | 8 |
Journal | Neural Networks |
Volume | 13 |
Issue number | 6 |
State | Published - Jul 2000 |
Keywords
- Continuous speech recognition
- Neural network, Speech recognition
- Non-uniform units
- Speech segmentation
- Sub-word units
- Word lexicon
- Word spotting