An effective document clustering method using user-adaptable distance metrics

Han Joon Kim, Sang Goo Lee

Research output: Contribution to conference › Paper › peer-review

19 Scopus citations

Abstract

Document clustering is inherently an unsupervised learning process that organizes document (or text) data into distinct groups without depending on pre-specified knowledge. However, real-world applications, such as building a topical hierarchy for a large document collection, need to perform clustering under various kinds of constraints. This paper presents a new type of supervised clustering to organize information in a way that reflects knowledge provided by a user. As a means by which external human knowledge can be incorporated into the clustering process, a quadratic form distance metric is employed that contains a weight matrix. Also, we propose a way of representing knowledge to guide the clustering process and a variant of the gradient descent search technique to find a user-specific weight matrix under the hierarchical clustering strategy.
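As a rough illustration of the quadratic form distance mentioned in the abstract, the sketch below computes d_W(x, y) = sqrt((x - y)^T W (x - y)) for two small term-frequency vectors. The vocabulary, the document vectors, and both weight matrices are hypothetical values chosen for illustration and do not come from the paper; the paper's gradient descent variant for learning W is not reproduced here.

```python
from math import sqrt


def quadratic_form_distance(x, y, W):
    """Return sqrt((x - y)^T W (x - y)) for equal-length vectors x and y.

    W is a square weight matrix given as a list of rows. It is assumed to be
    symmetric and positive semi-definite so that the quadratic form is >= 0.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product W @ diff, computed row by row.
    w_diff = [sum(w * d for w, d in zip(row, diff)) for row in W]
    # Inner product diff . (W @ diff) gives the quadratic form.
    q = sum(d * wd for d, wd in zip(diff, w_diff))
    # Clamp tiny negative values caused by floating-point rounding.
    return sqrt(max(q, 0.0))


# Hypothetical term-frequency vectors over a three-term vocabulary.
doc_a = [1.0, 0.0, 2.0]
doc_b = [0.0, 1.0, 2.0]

# With the identity matrix the metric reduces to Euclidean distance.
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Hypothetical user-specific weights: the positive off-diagonal entry treats
# terms 0 and 1 as closely related, so differing on them costs less.
related = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

d_euclid = quadratic_form_distance(doc_a, doc_b, identity)
d_related = quadratic_form_distance(doc_a, doc_b, related)
print(d_euclid, d_related)
```

Under these assumed weights the two documents come out closer than under plain Euclidean distance, which is the kind of effect a user-adapted weight matrix is meant to have on the resulting clusters.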

Original language: English
Pages: 16-20
Number of pages: 5
State: Published - 2002
Event: Applied Computing 2002: Proceedings of the 2002 ACM Symposium on Applied Computing - Madrid, Spain
Duration: 11 Mar 2002 to 14 Mar 2002

Conference

Conference: Applied Computing 2002: Proceedings of the 2002 ACM Symposium on Applied Computing
Country/Territory: Spain
City: Madrid
Period: 11/03/02 to 14/03/02

Keywords

  • Document clustering
  • Hierarchical clustering
  • Information organization
  • Quadratic form distance
  • User knowledge
