Semantically enriching text representation model for document clustering

Han Joon Kim, Kee Joo Hong, Jae Young Chang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Scopus citations

Abstract

This paper presents a novel text space model for document clustering that adds a 'concept' space independent of the 'document' and 'term' spaces. In this model, each document is represented as a matrix (i.e., a 2nd-order tensor), and a document corpus is represented as a 3rd-order tensor. Building this representation requires producing a concept vector for each term that occurs in a given document, a task related to word sense disambiguation. As an external knowledge source for concept weighting, we employ the Wikipedia encyclopedia.
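
The tensor layout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each document matrix is indexed term-by-concept, and the vocabulary, the concept names, and the `CONCEPT_WEIGHTS` table are all hypothetical stand-ins for weights that would be derived from Wikipedia. The weights here are fixed per term, so the sketch does not perform the per-document disambiguation the abstract alludes to. The similarity function (cosine over the flattened matrices) is likewise an assumption, chosen only to show how such matrices could be compared for clustering.

```python
from math import sqrt

# Hypothetical toy vocabulary and concept inventory (not from the paper).
TERMS = ["jaguar", "engine", "forest"]
CONCEPTS = ["Automobile", "Animal"]

# Hypothetical term-to-concept weights. In the paper these would come from
# Wikipedia; the values below are invented purely for illustration.
CONCEPT_WEIGHTS = {
    "jaguar": {"Automobile": 0.5, "Animal": 0.5},
    "engine": {"Automobile": 1.0, "Animal": 0.0},
    "forest": {"Automobile": 0.0, "Animal": 1.0},
}


def document_matrix(tokens):
    """Return a term-by-concept matrix (2nd-order tensor) for one document."""
    matrix = [[0.0] * len(CONCEPTS) for _ in TERMS]
    for token in tokens:
        if token not in CONCEPT_WEIGHTS:
            continue  # ignore out-of-vocabulary tokens
        row = TERMS.index(token)
        for col, concept in enumerate(CONCEPTS):
            # Accumulate the term's concept vector once per occurrence.
            matrix[row][col] += CONCEPT_WEIGHTS[token][concept]
    return matrix


def corpus_tensor(documents):
    """Stack document matrices into a document x term x concept tensor."""
    return [document_matrix(tokens) for tokens in documents]


def frobenius_cosine(a, b):
    """Cosine similarity of two matrices under the Frobenius inner product."""
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    norm_a = sqrt(sum(x * x for r in a for x in r))
    norm_b = sqrt(sum(x * x for r in b for x in r))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    ["jaguar", "engine"],
    ["jaguar", "forest"],
    ["engine", "engine"],
]
tensor = corpus_tensor(docs)
shape = (len(tensor), len(tensor[0]), len(tensor[0][0]))
```

With these toy inputs, `shape` gives the three tensor dimensions (documents, terms, concepts), and `frobenius_cosine(tensor[0], tensor[2])` compares the first and third documents through their shared "engine" row. A clustering algorithm could consume such pairwise similarities, though the abstract does not state which one the paper uses.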

Original language: English
Title of host publication: 2015 Symposium on Applied Computing, SAC 2015
Editors: Dongwan Shin
Publisher: Association for Computing Machinery
Pages: 922-925
Number of pages: 4
ISBN (Electronic): 9781450331968
State: Published - 13 Apr 2015
Event: 30th Annual ACM Symposium on Applied Computing, SAC 2015 - Salamanca, Spain
Duration: 13 Apr 2015 to 17 Apr 2015

Publication series

Name: Proceedings of the ACM Symposium on Applied Computing
Volume: 13-17-April-2015

Conference

Conference: 30th Annual ACM Symposium on Applied Computing, SAC 2015
Country/Territory: Spain
City: Salamanca
Period: 13/04/15 to 17/04/15

Keywords

  • Concepts
  • Document clustering
  • Tensor space model
  • Text mining
  • Vector space model
  • Wikipedia
