Abstract
This paper proposes a novel way of substantially improving the Naïve Bayes text classifier using our semantic tensor space model for document representation. We aim at near-perfect text classification with semantic Naïve Bayes learning, which incorporates semantic concept features into term feature statistics. To this end, Naïve Bayes learning is semantically augmented under the tensor space model, in which the ‘concept’ space is treated as an independent space on an equal footing with the ‘term’ and ‘document’ spaces; the concept space is built from concept-level informative Wikipedia pages associated with a given document corpus. Through extensive experiments on three popular document corpora (Reuters-21578, 20Newsgroups, and OHSUMED), we show that the proposed method not only outperforms recent deep learning-based classification methods but also achieves nearly perfect classification performance.
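To make the idea of folding concept features into term statistics concrete, the following is a minimal sketch, not the paper's method. It does not implement the tensor space model; it only shows a multinomial Naïve Bayes classifier whose per-class counts cover both terms and concepts. The `TERM_TO_CONCEPT` map, the class name `ConceptAugmentedNB`, and the toy documents are hypothetical stand-ins; in the paper, concepts are derived from Wikipedia pages.

```python
import math
from collections import Counter, defaultdict

# Hypothetical term -> concept map (an assumption for illustration only;
# the paper derives concepts from Wikipedia pages instead).
TERM_TO_CONCEPT = {
    "stock": "Finance", "bank": "Finance", "loan": "Finance",
    "goal": "Sport", "match": "Sport", "team": "Sport",
}

def features(tokens):
    """Return term features plus the concept features they map to."""
    terms = [("term", t) for t in tokens]
    concepts = [("concept", TERM_TO_CONCEPT[t])
                for t in tokens if t in TERM_TO_CONCEPT]
    return terms + concepts

class ConceptAugmentedNB:
    """Multinomial Naive Bayes over term and concept counts (sketch)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            for f in features(tokens):
                self.feat_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.feat_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for f in features(tokens):
                if f not in self.vocab:
                    continue  # ignore features never seen in training
                # Laplace-smoothed log likelihood of the feature
                count = self.feat_counts[label][f]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data (hypothetical).
train_docs = [["stock", "bank", "rises"], ["team", "wins", "match"]]
train_labels = ["finance", "sport"]
model = ConceptAugmentedNB().fit(train_docs, train_labels)

# "loan" never occurs as a training term, but its concept "Finance" does.
print(model.predict(["loan", "approved"]))  # -> finance
```

In this toy setup, a term-only model would have no evidence for the query, since neither word was seen in training; the shared concept is what separates the two classes.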
Original language | English |
---|---|
Pages (from-to) | 128-134 |
Number of pages | 7 |
Journal | Neurocomputing |
Volume | 315 |
State | Published - 13 Nov 2018 |
Keywords
- Naïve Bayes learning
- Semantic features
- Tensor space
- Text classification
- Wikipedia