Multidimensional text warehousing for automated text classification

Jiyun Kim, Han Joon Kim

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

In the era of big data, a data warehouse serves as an integrated multidimensional database that provides the basis for the decision making required to establish crucial business strategies. Efficient, effective analysis requires a data organization system that integrates and manages data of various dimensions. However, conventional data warehousing techniques do not consider the various data manipulation operations required for data-mining activities. Meanwhile, with the current explosion of text data, much research has examined text (or document) repositories that support text mining and document retrieval. This article therefore presents a method of developing a text warehouse that provides a machine-learning-based text classification service. Each document is represented as a term-by-concept matrix using a third-order tensor-based textual representation model, which emphasizes the meaning of the words occurring in the document. As a result, the proposed text warehouse makes it possible to develop a semantic Naïve Bayes text classifier simply by executing appropriate SQL statements.
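
To make the idea of building a Naïve Bayes classifier from SQL statements concrete, the following is a minimal sketch and not the schema or queries used in the article. It assumes two hypothetical tables, doc(doc_id, label) and doc_term(doc_id, term, freq), plus a query_term table holding the document to classify, and it uses plain term frequencies with Laplace smoothing in place of the article's term-by-concept representation. The toy rows are invented for illustration only.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite builds do not always ship math functions, so register LN explicitly.
conn.create_function("LN", 1, math.log)

# Hypothetical warehouse tables (assumed names, not the article's schema).
conn.executescript("""
CREATE TABLE doc (doc_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE doc_term (doc_id INTEGER, term TEXT, freq INTEGER);
CREATE TABLE query_term (term TEXT, freq INTEGER);
""")

# Invented toy training data: two labeled documents per class.
conn.executemany("INSERT INTO doc VALUES (?, ?)",
                 [(1, "sports"), (2, "sports"), (3, "finance"), (4, "finance")])
conn.executemany("INSERT INTO doc_term VALUES (?, ?, ?)", [
    (1, "goal", 2), (1, "team", 1), (2, "team", 2), (2, "match", 1),
    (3, "stock", 2), (3, "market", 1), (4, "market", 2), (4, "bank", 1),
])
# The unlabeled document to classify.
conn.executemany("INSERT INTO query_term VALUES (?, ?)",
                 [("team", 1), ("goal", 1)])

# Multinomial Naive Bayes with add-one smoothing, expressed as one query:
# score(c) = ln P(c) + sum over query terms of freq * ln((n_ct + 1) / (n_c + |V|))
NB_SQL = """
WITH vocab AS (
    SELECT COUNT(DISTINCT term) AS v FROM doc_term
),
prior AS (
    SELECT label, COUNT(*) * 1.0 / (SELECT COUNT(*) FROM doc) AS p
    FROM doc GROUP BY label
),
class_total AS (
    SELECT d.label AS label, SUM(t.freq) AS n
    FROM doc d JOIN doc_term t USING (doc_id) GROUP BY d.label
),
class_term AS (
    SELECT d.label AS label, t.term AS term, SUM(t.freq) AS n
    FROM doc d JOIN doc_term t USING (doc_id) GROUP BY d.label, t.term
)
SELECT p.label,
       LN(p.p) + SUM(q.freq * LN((COALESCE(ct.n, 0) + 1.0) / (tot.n + v.v))) AS score
FROM prior p
CROSS JOIN query_term q
CROSS JOIN vocab v
JOIN class_total tot ON tot.label = p.label
LEFT JOIN class_term ct ON ct.label = p.label AND ct.term = q.term
GROUP BY p.label, p.p
ORDER BY score DESC
"""

ranking = conn.execute(NB_SQL).fetchall()  # list of (label, log score) rows
predicted = ranking[0][0]                  # highest-scoring class
print(predicted, ranking)
```

Keeping the counts in relational tables means that training reduces to inserting rows, and classification reduces to a single aggregate query, which is the general spirit of a classifier driven by SQL statements. A concept dimension could be added to doc_term in a similar way, but that extension is not shown here.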

Original language: English
Pages (from-to): 168-183
Number of pages: 16
Journal: Journal of Information Technology Research
Volume: 11
Issue number: 2
State: Published - 1 Apr 2018

Keywords

  • Data warehouse
  • Han-joon Kim
  • Jiyun Kim
  • Naïve Bayes
  • SQL
  • Text mining
  • Text warehouse
  • University of Seoul
