Enhanced document clustering using wikipedia-based document representation

Ki Joo Hong, Ga Hui Lee, Han Joon Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Scopus citations

Abstract

Most traditional clustering methods are based on the Vector Space Model (VSM) with a ‘Bag of Words’ (BOW) representation. However, the BOW representation, which accounts only for term frequency, is quite limited because it ignores semantic relations among indexed terms. To resolve this problem, this paper proposes a new method of constructing document representation matrices that utilizes the Wikipedia encyclopedia rather than depending on the traditional VSM, with the aim of significantly enhancing the quality of document clustering. Through extensive experiments with the popular 20 Newsgroups dataset, we show that our proposed method notably improves clustering performance compared with the traditional VSM-based clustering method.
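
The BOW limitation named in the abstract can be illustrated with a minimal sketch. This is not the authors' method or code: the helper names `bow_vector` and `cosine` and the three toy documents are assumptions made for illustration only. It shows that term-frequency vectors give two documents about the same topic a similarity of zero when they share no surface terms.

```python
import math
from collections import Counter


def bow_vector(text):
    """Build a term-frequency bag-of-words vector (hypothetical helper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[t] * b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy documents (assumed for illustration, not taken from the paper).
doc_a = bow_vector("car engine failed")
doc_b = bow_vector("automobile motor broke")   # same topic, no shared terms
doc_c = bow_vector("car engine overheated")    # same topic, shared terms

sim_ab = cosine(doc_a, doc_b)  # no overlapping terms, so the dot product is 0
sim_ac = cosine(doc_a, doc_c)  # two of three terms overlap

print(sim_ab, sim_ac)
```

Under plain term frequency, `doc_a` and `doc_b` look unrelated even though "car" and "automobile" are near-synonyms. This is the kind of semantic relation that an external knowledge source such as Wikipedia could, in principle, supply.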

Original language: English
Title of host publication: Applied System Innovation - Proceedings of the International Conference on Applied System Innovation, ICASI 2015
Editors: Teen-Hang Meen, Stephen D. Prior, Artde Donald Kin-Tak Lam
Publisher: CRC Press/Balkema
Pages: 183-186
Number of pages: 4
ISBN (Print): 9781138028937
State: Published - 2016
Event: International Conference on Applied System Innovation, ICASI 2015 - Osaka, Japan
Duration: 22 May 2015 → 27 May 2015

Publication series

Name: Applied System Innovation - Proceedings of the International Conference on Applied System Innovation, ICASI 2015

Conference

Conference: International Conference on Applied System Innovation, ICASI 2015
Country/Territory: Japan
City: Osaka
Period: 22/05/15 → 27/05/15
