An effective EM extension of Naive Bayes text classifier with uncertainty-based selective sampling

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

One of the most appropriate algorithms for operational text classification systems is Naive Bayes: its pre-learned classification model and feature space are easy to update incrementally, and it is surprisingly accurate despite its unrealistic independence assumption. This paper focuses on improving the Naive Bayes classifier by accelerating the EM (Expectation-Maximization) algorithm. The multinomial Naive Bayes text classifier is extended with an accelerated EM algorithm that is simple yet converges quickly and estimates a more accurate classification model through automatic selective sampling. Through experiments on the well-known Reuters-21578 news collection, we show that the traditional Naive Bayes classifier can be significantly improved by the extended algorithm combining EM and selective sampling.
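The abstract's idea can be illustrated with a minimal sketch: train a multinomial Naive Bayes model on labeled documents, then run EM over unlabeled documents, where selective sampling keeps only the documents whose class posterior is confident enough (low classification uncertainty) for the next M-step. The data, threshold, and function names below are illustrative assumptions, not the paper's exact algorithm or experimental setup.

```python
import numpy as np

def train_nb(X, Y, alpha=1.0):
    """M-step: estimate log priors and log word probabilities from
    (possibly soft) class responsibilities Y of shape (n_docs, n_classes)."""
    priors = Y.sum(axis=0) / Y.sum()                  # P(c)
    word_counts = Y.T @ X                             # expected word counts per class
    word_probs = (word_counts + alpha) / (            # Laplace-smoothed P(w|c)
        word_counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])
    return np.log(priors), np.log(word_probs)

def posterior(X, log_prior, log_wp):
    """E-step: P(c|d) for each row of the document-term matrix X."""
    log_joint = X @ log_wp.T + log_prior
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

def em_nb_selective(X_lab, y_lab, X_unl, n_classes, n_iter=10, conf=0.8):
    """EM over unlabeled docs; the `conf` threshold (an assumed stand-in for
    the paper's uncertainty criterion) selects which docs enter the M-step."""
    Y_lab = np.eye(n_classes)[y_lab]                  # hard labels, one-hot
    lp, lw = train_nb(X_lab, Y_lab)                   # initialize from labeled data
    for _ in range(n_iter):
        P = posterior(X_unl, lp, lw)                  # E-step on unlabeled docs
        keep = P.max(axis=1) >= conf                  # selective sampling: drop
        X_all = np.vstack([X_lab, X_unl[keep]])       # high-uncertainty documents
        Y_all = np.vstack([Y_lab, P[keep]])
        lp, lw = train_nb(X_all, Y_all)               # M-step on the selected set
    return lp, lw
```

On a toy two-class corpus (word indices 0-1 for class 0, 2-3 for class 1), the model trained this way classifies a held-out class-0 document with high confidence; the real paper of course evaluates on Reuters-21578 rather than synthetic counts.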

Original language: English
Pages (from-to): 2889-2899
Number of pages: 11
Journal: Information
Volume: 14
Issue number: 8
State: Published - Aug 2011

Keywords

  • Classification uncertainty
  • EM algorithm
  • Machine learning
  • Naive Bayes
  • Selective sampling
  • Text classification
