Text Mining Methods for Hierarchical Document Indexing

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

1 Scopus citation

Abstract

We have recently seen tremendous growth in the volume of online text documents from networked resources such as the Internet, digital libraries, and company-wide intranets. One of the most common and successful methods of organizing such large collections is to categorize documents hierarchically by topic (Agrawal, Bayardo, & Srikant, 2000; Kim & Lee, 2003). Documents indexed according to a hierarchical structure (termed a ‘topic hierarchy’ or ‘taxonomy’) are kept in internal categories as well as in leaf categories, with documents in lower-level categories being increasingly specific. Through the use of a topic hierarchy, users can quickly navigate to any portion of a document collection without being overwhelmed by a large document space. As is evident from the popularity of Web directories such as Yahoo (http://www.yahoo.com/) and the Open Directory Project (http://dmoz.org/), topic hierarchies have grown in importance as a tool for organizing or browsing large volumes of electronic text documents.

Original language: English
Title of host publication: Encyclopedia of Data Warehousing and Mining
Subtitle of host publication: [2 volumes]
Publisher: IGI Global
Pages: 1113-1119
Number of pages: 7
Volume: 1-2
ISBN (Electronic): 9781591405597
State: Published - 1 Jan 2005
