Abstract
Naive Bayes is one of the most suitable algorithms for operational text classification systems: its pre-learned classification model and feature space are easy to update incrementally, and it is surprisingly accurate despite its unrealistic independence assumption. This paper focuses on improving the Naive Bayes classifier by accelerating the EM (Expectation Maximization) algorithm. The multinomial Naive Bayes text classifier is extended with an accelerated EM algorithm that is simple yet converges quickly, and that estimates a more accurate classification model through automatic selective sampling. Through experiments on the well-known Reuters-21578 news collection, we show that the traditional Naive Bayes classifier is significantly improved by the extended algorithm combining EM and selective sampling.
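The combination the abstract describes — a multinomial Naive Bayes classifier whose parameters are refined with EM over unlabeled documents — can be illustrated with a minimal sketch. This is a generic semi-supervised NB+EM loop, not the paper's accelerated variant, and it omits the selective-sampling step (which, per the keywords, would filter unlabeled documents by classification uncertainty); all function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def train_nb_em(X_l, y_l, X_u, n_classes, n_iters=10, alpha=1.0):
    """Semi-supervised multinomial Naive Bayes trained with EM.

    X_l : labeled term-count matrix, shape (n_labeled, n_features)
    y_l : integer class labels for X_l
    X_u : unlabeled term-count matrix, shape (n_unlabeled, n_features)
    Returns (log class priors, log per-class word likelihoods).
    """
    # Labeled responsibilities are fixed one-hot vectors;
    # unlabeled ones start uniform and are re-estimated each E-step.
    R_l = np.eye(n_classes)[y_l]
    R_u = np.full((X_u.shape[0], n_classes), 1.0 / n_classes)
    X = np.vstack([X_l, X_u])
    for _ in range(n_iters):
        R = np.vstack([R_l, R_u])
        # M-step: re-estimate priors and word probabilities
        # from expected counts, with Laplace smoothing alpha.
        class_counts = R.sum(axis=0)
        log_prior = np.log(class_counts / class_counts.sum())
        word_counts = R.T @ X + alpha
        log_like = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))
        # E-step: posterior class responsibilities for unlabeled docs.
        log_post = X_u @ log_like.T + log_prior
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        R_u = post / post.sum(axis=1, keepdims=True)
    return log_prior, log_like

def predict(X, log_prior, log_like):
    """Assign each row of X to the class with highest posterior score."""
    return np.argmax(X @ log_like.T + log_prior, axis=1)
```

In this sketch, the unlabeled documents contribute fractional (soft) counts to the M-step in proportion to their posterior class probabilities, which is how EM lets unlabeled data sharpen the classification model.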
Original language | English |
---|---|
Pages (from-to) | 2889-2899 |
Number of pages | 11 |
Journal | Information |
Volume | 14 |
Issue number | 8 |
State | Published - Aug 2011 |
Keywords
- Classification uncertainty
- EM algorithm
- Machine learning
- Naive Bayes
- Selective sampling
- Text classification