Towards perfect text classification with Wikipedia-based semantic Naïve Bayes learning

Han joon Kim, Jiyun Kim, Jinseog Kim, Pureum Lim

Research output: Contribution to journal › Article › peer-review

38 Scopus citations


This paper proposes a novel way of dramatically improving the Naïve Bayes text classifier using our semantic tensor space model for document representation. We aim to achieve perfect text classification with semantic Naïve Bayes learning, which incorporates semantic concept features into term feature statistics. To this end, Naïve Bayes learning is semantically augmented under the tensor space model, in which the ‘concept’ space is treated as an independent space on an equal footing with the ‘term’ and ‘document’ spaces; the concept space is built from concept-level informative Wikipedia pages associated with a given document corpus. Through extensive experiments on three popular document corpora (Reuters-21578, 20Newsgroups, and OHSUMED), we show that the proposed method not only outperforms recent deep learning-based classification methods but also achieves nearly perfect classification performance.
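As a rough illustration of what "incorporating concept features into term feature statistics" can mean, the following is a minimal sketch of a multinomial Naïve Bayes classifier with Laplace smoothing in which each document contributes both term counts and concept counts. It is not the authors' tensor space model: the `TERM_TO_CONCEPT` lookup, the function names, and the toy documents are all hypothetical stand-ins, and the flat concatenation of term and concept features is a simplifying assumption made here for brevity.

```python
import math
from collections import Counter, defaultdict

# Hypothetical term-to-concept lookup (an assumption for this sketch).
# In the paper, concepts are derived from Wikipedia pages instead.
TERM_TO_CONCEPT = {
    "stock": "Finance", "bank": "Finance", "shares": "Finance",
    "dividend": "Finance",
    "goal": "Sport", "match": "Sport", "striker": "Sport",
}


def features(tokens):
    """Return term features plus one concept feature per mapped term."""
    feats = [("term", t) for t in tokens]
    feats += [("concept", TERM_TO_CONCEPT[t]) for t in tokens
              if t in TERM_TO_CONCEPT]
    return feats


def train(docs):
    """Collect class and feature counts from (tokens, label) pairs."""
    class_docs = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_docs[label] += 1
        for f in features(tokens):
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_docs, feat_counts, vocab


def predict(model, tokens):
    """Return the label with the highest smoothed log-posterior."""
    class_docs, feat_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        total = sum(feat_counts[label].values())
        score = math.log(class_docs[label] / n_docs)  # class prior
        for f in features(tokens):
            if f not in vocab:
                continue  # features never seen in training are skipped
            # Laplace (add-one) smoothed feature likelihood
            score += math.log((feat_counts[label][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
TRAIN_DOCS = [
    (["stock", "bank", "rises"], "finance"),
    (["shares", "fall"], "finance"),
    (["striker", "scores", "goal"], "sport"),
    (["match", "ends"], "sport"),
]
MODEL = train(TRAIN_DOCS)
```

In this toy setup, the term "dividend" never occurs in the training documents, so a term-only classifier would have no evidence for it. Because the lookup maps it to the "Finance" concept, which was counted during training, `predict(MODEL, ["dividend"])` can still favour the finance class.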

Original language: English
Pages (from-to): 128-134
Number of pages: 7
State: Published - 13 Nov 2018


  • Naïve Bayes learning
  • Semantic features
  • Tensor space
  • Text classification
  • Wikipedia

