Abstract
In the era of big data, a data warehouse is an integrated multidimensional database that provides the basis for the decision making required to establish crucial business strategies. Efficient and effective analysis requires a data organization system that integrates and manages data of various dimensions. However, conventional data warehousing techniques do not consider the various data manipulation operations required for data-mining activities. With the current explosion of text data, much research has examined text (or document) repositories to support text mining and document retrieval. This article therefore presents a method of developing a text warehouse that provides a machine-learning-based text classification service. Each document is represented as a term-by-concept matrix using a 3rd-order tensor-based textual representation model, which emphasizes the meaning of the words occurring in the document. As a result, the proposed text warehouse makes it possible to build a semantic Naïve Bayes text classifier simply by executing appropriate SQL statements.
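To make the idea of training a classifier through SQL statements concrete, the following is a minimal sketch and not the authors' schema or implementation. It assumes a plain multinomial Naïve Bayes model with Laplace smoothing over term counts instead of the term-by-concept representation described above, and it uses an in-memory SQLite database. The table names (`doc_term`, `prior`, `likelihood`, `query_term`), the function names, and the toy corpus are all hypothetical.

```python
import math
import sqlite3

# Hypothetical toy corpus: (doc_id, label, term, count) rows.
TRAINING_ROWS = [
    (1, "sports", "goal", 2), (1, "sports", "team", 1),
    (2, "sports", "team", 2), (2, "sports", "match", 1),
    (3, "tech", "sql", 2), (3, "tech", "data", 1),
    (4, "tech", "data", 2), (4, "tech", "warehouse", 1),
]


def train(rows):
    """Load term counts and derive Naive Bayes parameters using SQL only."""
    conn = sqlite3.connect(":memory:")
    # Register ln() explicitly: not every SQLite build ships math functions.
    conn.create_function("ln", 1, math.log)
    conn.execute(
        "CREATE TABLE doc_term (doc_id INTEGER, label TEXT, term TEXT, cnt INTEGER)"
    )
    conn.executemany("INSERT INTO doc_term VALUES (?, ?, ?, ?)", rows)
    conn.executescript(
        """
        -- Log prior: share of training documents carrying each label.
        CREATE TABLE prior AS
        SELECT label,
               ln(COUNT(DISTINCT doc_id) * 1.0
                  / (SELECT COUNT(DISTINCT doc_id) FROM doc_term)) AS log_prior
        FROM doc_term
        GROUP BY label;

        -- Log likelihood with add-one smoothing for every (label, term) pair.
        CREATE TABLE likelihood AS
        SELECT l.label,
               v.term,
               ln((COALESCE(SUM(d.cnt), 0) + 1.0)
                  / (MAX(t.total)
                     + (SELECT COUNT(DISTINCT term) FROM doc_term))) AS log_lik
        FROM (SELECT DISTINCT label FROM doc_term) AS l
        CROSS JOIN (SELECT DISTINCT term FROM doc_term) AS v
        JOIN (SELECT label, SUM(cnt) AS total
              FROM doc_term GROUP BY label) AS t ON t.label = l.label
        LEFT JOIN doc_term AS d ON d.label = l.label AND d.term = v.term
        GROUP BY l.label, v.term;

        -- Holds the term counts of the document being classified.
        CREATE TABLE query_term (term TEXT, cnt INTEGER);
        """
    )
    return conn


def classify(conn, term_counts):
    """Return the label with the highest log posterior for one document."""
    conn.execute("DELETE FROM query_term")
    conn.executemany("INSERT INTO query_term VALUES (?, ?)", term_counts.items())
    row = conn.execute(
        """
        SELECT p.label,
               p.log_prior + COALESCE(SUM(q.cnt * k.log_lik), 0) AS score
        FROM prior AS p
        LEFT JOIN likelihood AS k ON k.label = p.label
        LEFT JOIN query_term AS q ON q.term = k.term
        GROUP BY p.label, p.log_prior
        ORDER BY score DESC, p.label
        LIMIT 1
        """
    ).fetchone()
    return row[0]
```

In this sketch, calling `train(TRAINING_ROWS)` builds the parameter tables and `classify(conn, {"team": 1, "goal": 1})` scores a new document against them. Terms that never occurred in the training data have no row in `likelihood` and are therefore ignored at classification time, which is one simple design choice among several possible ones.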
Original language | English |
---|---|
Pages (from-to) | 168-183 |
Number of pages | 16 |
Journal | Journal of Information Technology Research |
Volume | 11 |
Issue number | 2 |
State | Published - 1 Apr 2018 |
Keywords
- Data warehouse
- Han-joon Kim
- Jiyun Kim
- Naïve Bayes
- SQL
- Text mining
- Text warehouse
- University of Seoul