Abstract
Document clustering is inherently an unsupervised learning process that organizes document (or text) data into distinct groups without depending on pre-specified knowledge. However, real-world applications, such as building a topical hierarchy for a large document collection, need to perform clustering under various kinds of constraints. This paper presents a new type of supervised clustering that organizes information in a way that reflects knowledge provided by a user. External human knowledge is incorporated into the clustering process through a quadratic form distance metric that contains a weight matrix. We also propose a way of representing knowledge to guide the clustering process, and a variant of the gradient descent search technique that finds a user-specific weight matrix under the hierarchical clustering strategy.
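
The abstract does not spell out the distance itself. As a minimal sketch, assuming the usual quadratic form d(x, y) = sqrt((x - y)^T W (x - y)) over term vectors, the following Python shows how a weight matrix can change the distance between two documents. The function name, the two-term vocabulary, and the weight values are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's gradient descent search for the weight matrix is not shown.

```python
import math


def quadratic_form_distance(x, y, W):
    """Return sqrt((x - y)^T W (x - y)) for term vectors x, y and weight matrix W.

    Assumption: W is a symmetric, positive semi-definite list of rows.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Row i of W dotted with diff gives the i-th entry of W * diff.
    w_diff = [sum(w_ij * d_j for w_ij, d_j in zip(row, diff)) for row in W]
    squared = sum(d_i * wd_i for d_i, wd_i in zip(diff, w_diff))
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(squared, 0.0))


# Two documents over a hypothetical two-term vocabulary.
doc_a = [1.0, 0.0]  # contains only term 0
doc_b = [0.0, 1.0]  # contains only term 1

# The identity matrix reduces the metric to plain Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]

# Off-diagonal weights encode an assumed user judgment that the two
# terms are closely related.
related = [[1.0, 0.9], [0.9, 1.0]]

d_euclid = quadratic_form_distance(doc_a, doc_b, identity)
d_related = quadratic_form_distance(doc_a, doc_b, related)
```

With the identity matrix the two documents share no terms and are as far apart as Euclidean distance allows; with the off-diagonal weights the same pair is measured as closer, which is the kind of effect a user-specific weight matrix is meant to have on the resulting clusters.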
| Original language | English |
|---|---|
| Pages | 16-20 |
| Number of pages | 5 |
| State | Published - 2002 |
| Event | Applied Computing 2002: Proceedings of the 2002 ACM Symposium on Applied Computing - Madrid, Spain. Duration: 11 Mar 2002 → 14 Mar 2002 |
Conference
| Conference | Applied Computing 2002: Proceedings of the 2002 ACM Symposium on Applied Computing |
|---|---|
| Country/Territory | Spain |
| City | Madrid |
| Period | 11/03/02 → 14/03/02 |
Keywords
- Document clustering
- Hierarchical clustering
- Information organization
- Quadratic form distance
- User knowledge