Tensor Space Model-based Textual Data Augmentation for Text Classification

Minsuk Chang, Han Joon Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

In this paper, we first introduce a new text representation method that converts a textual document into a tensor space model named TextCuboid, which can preserve the various meanings of polysemous words. Based on this model, we propose two novel data augmentation techniques, called Boolean augmentation and CuboidGAN, that can be applied directly to the TextCuboid model for text classification tasks. Boolean augmentation consists of three simple keyword modifications: synonym replacement, synonym insertion, and random deletion. CuboidGAN is composed of two key components, style encoding and residual regression, and is trained in two phases to generate unambiguous and plausible concept vectors. Through extensive experiments on five commonly used datasets, we show that our proposed methods outperform conventional data augmentation methods. Ablation studies further show that each component of the augmentation methods contributes significantly to text classification performance.
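The three Boolean augmentation operations named in the abstract (synonym replacement, synonym insertion, random deletion) can be illustrated with a minimal sketch on a plain token list. This is not the authors' implementation: the paper applies these modifications to the TextCuboid model, whose internals are not described on this page. The `SYNONYMS` table, the function names, and the parameters `n` and `p` are all hypothetical.

```python
import random

# Hypothetical toy synonym table; the paper's actual lexical resource is not given here.
SYNONYMS = {
    "movie": ["film"],
    "great": ["excellent", "superb"],
    "plot": ["storyline"],
}


def synonym_replacement(tokens, n, rng):
    """Replace up to n tokens that have synonyms with a randomly chosen synonym."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def synonym_insertion(tokens, n, rng):
    """Insert, n times, a synonym of a random eligible token at a random position."""
    out = list(tokens)
    candidates = [t for t in out if t in SYNONYMS]
    if not candidates:
        return out  # nothing to insert if no token has a synonym
    for _ in range(n):
        word = rng.choice(candidates)
        out.insert(rng.randrange(len(out) + 1), rng.choice(SYNONYMS[word]))
    return out


def random_deletion(tokens, p, rng):
    """Delete each token independently with probability p, keeping at least one token."""
    if not tokens:
        return []
    out = [t for t in tokens if rng.random() >= p]
    return out if out else [rng.choice(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    doc = "a great movie with a clever plot".split()
    print(synonym_replacement(doc, 2, rng))
    print(synonym_insertion(doc, 1, rng))
    print(random_deletion(doc, 0.2, rng))
```

Each operation returns a new list and leaves the input unchanged, so several augmented variants can be drawn from the same document. Passing an explicit `random.Random` instance keeps the augmentation reproducible.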

Original language: English
Title of host publication: Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023
Editors: Jingrui He, Themis Palpanas, Xiaohua Hu, Alfredo Cuzzocrea, Dejing Dou, Dominik Slezak, Wei Wang, Aleksandra Gruca, Jerry Chun-Wei Lin, Rakesh Agrawal
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4276-4283
Number of pages: 8
ISBN (Electronic): 9798350324457
State: Published - 2023
Event: 2023 IEEE International Conference on Big Data, BigData 2023 - Sorrento, Italy
Duration: 15 Dec 2023 to 18 Dec 2023

Publication series

Name: Proceedings - 2023 IEEE International Conference on Big Data, BigData 2023

Conference

Conference: 2023 IEEE International Conference on Big Data, BigData 2023
Country/Territory: Italy
City: Sorrento
Period: 15/12/23 to 18/12/23

Keywords

  • Autoencoder
  • Data Augmentation
  • Deep Learning
  • Generative Adversarial Networks
  • Tensor Space Model
  • Text Classification
  • Text Representation Model
