Abstract
Extracting topic keywords from online text documents is an important task in text mining applications. In our work, the extracted keywords are represented as a hierarchical topic tree. To build this tree, we apply an incremental clustering technique to incoming online documents. We also define a cluster-based measure similar to the tf-idf measure, and a probabilistic inequality that determines subsumption relationships among keywords. Using Google News data, we empirically analyze the proposed method with respect to the threshold value of the incremental clustering algorithm, the range of the keyword extraction measure, and the amount of text data, and show its superiority.
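The two mechanisms named in the abstract, threshold-based incremental clustering and a probabilistic subsumption test between keywords, can be illustrated with a minimal sketch. The abstract does not give the exact similarity function or the inequality, so everything specific here is an assumption: cosine similarity over raw term counts, a summed-count centroid, a common co-occurrence rule (x subsumes y when P(x|y) is high and P(y|x) is below 1), and the hypothetical names `incremental_cluster`, `subsumes`, and `min_prob`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def incremental_cluster(docs, threshold):
    """Single-pass clustering (assumed variant, not the paper's exact method).

    Each incoming document joins the most similar existing cluster if the
    similarity reaches `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": Counter, "docs": [token lists]}
    for tokens in docs:
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # summed counts act as the centroid
            best["docs"].append(tokens)
        else:
            clusters.append({"centroid": Counter(vec), "docs": [tokens]})
    return clusters


def subsumes(x, y, docs, min_prob=0.8):
    """Assumed rule: x subsumes y if P(x|y) >= min_prob and P(y|x) < 1.

    Probabilities are estimated from document co-occurrence counts.
    """
    with_x = sum(1 for d in docs if x in d)
    with_y = sum(1 for d in docs if y in d)
    both = sum(1 for d in docs if x in d and y in d)
    if with_x == 0 or with_y == 0:
        return False
    return both / with_y >= min_prob and both / with_x < 1.0


# Toy documents, already tokenized (illustrative data only).
docs = [
    ["election", "vote", "senate"],
    ["election", "vote", "poll"],
    ["football", "goal", "league"],
    ["election", "senate"],
]
clusters = incremental_cluster(docs, threshold=0.3)
politics = clusters[0]["docs"]
print(len(clusters), subsumes("election", "senate", politics))
```

In this sketch, lowering `threshold` merges more documents into fewer, broader clusters, while raising it produces more, narrower clusters. Subsumption pairs found within a cluster would then be linked as parent and child nodes of the topic tree.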
Original language | English |
---|---|
Article number | 102 |
Pages (from-to) | 706-710 |
Number of pages | 5 |
Journal | Life Science Journal |
Volume | 11 |
Issue number | 7 |
State | Published - 2014 |
Keywords
- Clustering
- Text mining
- Topic keywords
- Topic trees