Abstract
This paper proposes a new WordNet-based feature engineering method for improving text classification systems. Machine learning-based classification systems can be enhanced by augmenting their set of model features. To this end, we identify significant features in the training data and extract their synonyms or hyponyms from WordNet; to isolate the more significant features, we devise a special function that computes the similarity between a given word and each class. To evaluate the proposed method, we apply it to the Naive Bayes text classifier, using the Reuters-21578 collection as the test set. Our experiments show that the proposed method can improve the Naive Bayes classifier even without modifying its core algorithm.
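The pipeline described above (select class-significant words, expand them with related words, then train an unchanged Naive Bayes classifier) can be illustrated with a minimal sketch. It rests on loud assumptions: `LEXICON` is a hypothetical hand-written table standing in for WordNet synonym/hyponym lookup, `class_similarity` is a placeholder score (the share of a word's occurrences that fall in each class), not the paper's similarity function, and the thresholds and toy documents are invented for illustration only.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for WordNet synonym/hyponym lookup; a real system
# would query WordNet instead of this hand-written table.
LEXICON = {
    "profit": ["earnings", "gain"],
    "crude": ["petroleum"],
    "wheat": ["grain"],
}


def class_similarity(word, docs, labels):
    """Assumed word-class score: share of the word's occurrences in each class
    (a placeholder, not the paper's exact formula)."""
    per_class = Counter()
    for tokens, label in zip(docs, labels):
        per_class[label] += tokens.count(word)
    total = sum(per_class.values())
    if total == 0:
        return {c: 0.0 for c in set(labels)}
    return {c: per_class[c] / total for c in set(labels)}


def significant_words(docs, labels, threshold=0.8, min_count=2):
    """Keep frequent words whose occurrences concentrate in a single class."""
    counts = Counter(t for tokens in docs for t in tokens)
    keep = set()
    for word, n in counts.items():
        if n >= min_count and max(class_similarity(word, docs, labels).values()) >= threshold:
            keep.add(word)
    return keep


def augment(tokens, keep):
    """Append lexicon expansions of significant words; original tokens are kept."""
    return tokens + [syn for t in tokens if t in keep for syn in LEXICON.get(t, [])]


class MultinomialNB:
    """Plain multinomial Naive Bayes with Laplace smoothing (left unmodified)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for tokens in docs for t in tokens}
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        v = len(self.vocab)

        def score(c):
            # Unseen words are skipped; seen words use add-one smoothing.
            return self.prior[c] + sum(
                math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
                for t in tokens if t in self.vocab)

        return max(self.classes, key=score)


# Toy training data (invented, Reuters-style class names).
train_docs = [
    ["profit", "rose", "quarter"],
    ["profit", "net", "shares"],
    ["crude", "barrel", "opec"],
    ["crude", "prices", "barrel"],
]
train_labels = ["earn", "earn", "crude", "crude"]

keep = significant_words(train_docs, train_labels)
aug_train = [augment(d, keep) for d in train_docs]

test_doc = ["earnings", "gain", "reported"]
baseline_pred = MultinomialNB().fit(train_docs, train_labels).predict(test_doc)
augmented_pred = MultinomialNB().fit(aug_train, train_labels).predict(augment(test_doc, keep))
print(sorted(keep), baseline_pred, augmented_pred)
# prints: ['barrel', 'crude', 'profit'] crude earn
```

The baseline model has seen none of the test words, so it can only fall back on the tied class priors; the augmented model has learned `earnings` and `gain` as evidence for `earn` through the expansion of `profit`, while the Naive Bayes code is identical in both runs.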
Original language | English |
---|---|
Pages (from-to) | 8161-8168 |
Number of pages | 8 |
Journal | Information |
Volume | 16 |
Issue number | 11 |
State | Published - Nov 2013 |
Keywords
- Feature engineering
- Machine learning
- Naïve Bayes
- Text classification
- WordNet