Abstract
This paper discusses a new type of semi-supervised document clustering that uses partial supervision to partition a large set of documents. Most clustering methods organize documents into groups based only on similarity measures. In this paper, we attempt to isolate more semantically coherent clusters by employing the domain-specific knowledge provided by a document analyst. By using external human knowledge to guide the clustering mechanism, while allowing some flexibility in how the clusters are created, clustering efficiency can be considerably enhanced. Experimental results show that even a small amount of external knowledge can considerably enhance the quality of clustering results that satisfy users' constraints.
Original language | English |
---|---|
Pages (from-to) | 1043-1048 |
Number of pages | 6 |
Journal | IEICE Transactions on Information and Systems |
Volume | E85-D |
Issue number | 6 |
State | Published - Jun 2002 |