Abstract
Document clustering is inherently an unsupervised learning process that organizes document (or text) data into distinct groups without depending on pre-specified knowledge. However, real-world applications, such as building a topical hierarchy for a large document collection, need to perform clustering under various kinds of constraints. This paper presents a new type of supervised clustering that organizes information in a way that reflects knowledge provided by a user. To incorporate such external human knowledge into the clustering process, we employ a quadratic form distance metric parameterized by a weight matrix. We also propose a way of representing user knowledge to guide the clustering process, and a variant of the gradient descent search technique that finds a user-specific weight matrix under the hierarchical clustering strategy.
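The abstract names three ingredients: a quadratic form distance d_W(x, y) = sqrt((x − y)^T W (x − y)) with a weight matrix W, user knowledge that shapes W, and a gradient-descent search under hierarchical clustering. Because the abstract does not specify the knowledge representation or the exact update rule, the Python below is only a minimal sketch under stated assumptions, not the authors' method. It assumes that (1) user knowledge arrives as hypothetical lists of document pairs that should (`similar`) or should not (`dissimilar`) share a cluster, (2) W is restricted to a non-negative diagonal so it stays positive semidefinite, (3) the objective is a regularized squared-distance criterion chosen purely for illustration, and (4) the hierarchy is built with plain average linkage. The names `quadratic_form_distance`, `learn_diagonal_weights`, and `agglomerative` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Pair = Tuple[int, int]


def quadratic_form_distance(x: Sequence[float], y: Sequence[float],
                            W: Sequence[Sequence[float]]) -> float:
    """d_W(x, y) = sqrt((x - y)^T W (x - y)); W is assumed symmetric positive semidefinite."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    q = sum(diff[i] * W[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative round-off before the square root


def learn_diagonal_weights(docs: Sequence[Sequence[float]], similar: Sequence[Pair],
                           dissimilar: Sequence[Pair], lam: float = 1.0,
                           lr: float = 0.1, steps: int = 500) -> Matrix:
    """Projected gradient descent on a diagonal W (illustrative objective, assumed here).

    Minimizes J(w) = sum_S d_W^2 - sum_D d_W^2 + lam * ||w - 1||^2 subject to w >= 0,
    where S / D are the user's same-cluster / different-cluster pairs.
    """
    dim = len(docs[0])
    # a[k], b[k]: summed squared differences in feature k over S pairs and D pairs
    a = [sum((docs[i][k] - docs[j][k]) ** 2 for i, j in similar) for k in range(dim)]
    b = [sum((docs[i][k] - docs[j][k]) ** 2 for i, j in dissimilar) for k in range(dim)]
    w = [1.0] * dim  # start from the Euclidean metric, W = I
    for _ in range(steps):
        # dJ/dw_k = a_k - b_k + 2 * lam * (w_k - 1), since d_W^2 is linear in w
        grad = [a[k] - b[k] + 2.0 * lam * (w[k] - 1.0) for k in range(dim)]
        # gradient step, then project onto w >= 0 so W stays positive semidefinite
        w = [max(0.0, w[k] - lr * grad[k]) for k in range(dim)]
    return [[w[i] if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def agglomerative(docs: Sequence[Sequence[float]], W: Sequence[Sequence[float]],
                  n_clusters: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering under d_W; returns sorted index lists."""
    clusters = [[i] for i in range(len(docs))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        total = sum(quadratic_form_distance(docs[i], docs[j], W) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # merge the closest pair of clusters (q > p, so deleting q keeps p's index valid)
        p, q = min(((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = sorted(clusters[p] + clusters[q])
        del clusters[q]
    return sorted(clusters)


if __name__ == "__main__":
    # Toy 2-feature "documents": feature 1 has the larger spread, but the user groups by feature 0.
    docs = [[0.0, 0.0], [0.0, 10.0], [1.0, 0.0], [1.0, 10.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    W = learn_diagonal_weights(docs, similar=[(0, 1), (2, 3)], dissimilar=[(0, 2), (1, 3)])
    print(agglomerative(docs, identity, 2))  # [[0, 2], [1, 3]]
    print(agglomerative(docs, W, 2))         # [[0, 1], [2, 3]]
```

On these toy vectors the Euclidean metric (W = I) groups documents by the wide-ranging second feature, while the learned W moves all weight onto the first feature and reproduces the grouping the user asked for. A full (non-diagonal) W or a different loss over the user's constraints would follow the same pattern, but would need a projection onto the positive semidefinite cone rather than simple clipping of the weights.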
Original language | English |
---|---|
Pages | 16-20 |
Number of pages | 5 |
State | Published - 2002 |
Event | Applied Computing 2002: Proceedings of the 2002 ACM Symposium on Applied Computing, Madrid, Spain. Duration: 11 Mar 2002 → 14 Mar 2002 |
Conference
Conference | Applied Computing 2002: Proceedings of the 2002 ACM Symposium on Applied Computing |
---|---|
Country/Territory | Spain |
City | Madrid |
Period | 11/03/02 → 14/03/02 |
Keywords
- Document clustering
- Hierarchical clustering
- Information organization
- Quadratic form distance
- User knowledge