An AutoEncoder-based Numerical Training Data Augmentation Technique

Jueun Jeong, Hanseok Jeong, Han Joon Kim

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

2 Scopus citations

Abstract

This paper aims to automatically augment numerical tabular data using a variational autoencoder (VAE). Specifically, we address the problem of class imbalance in numerical data and seek to improve the performance of the classification model by augmenting the training data. We propose a new augmentation technique called 'D-VAE', which performs data augmentation through a variational autoencoder with discretization of numerical columns; D-VAE artificially increases both the number of records and the number of columns of a given table. The main features of the proposed technique are discretization and feature selection in the preprocessing stage. For discretization, we use the k-means algorithm to group the records within a given table and then convert them into one-hot vectors according to the clustering results. In addition, for memory efficiency, we reduce the number of parameters of the VAE model by using a relatively small number of features chosen through a feature selection method called REFCV. To evaluate the proposed technique, we conducted experiments at various numerical data augmentation ratios using four open datasets.
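
The discretization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that k-means is applied to each numerical column separately, uses a small hand-written Lloyd's k-means in NumPy as a stand-in for whatever k-means routine the paper uses, and the function names `kmeans_1d` and `discretize_to_one_hot` are hypothetical. Feature selection and the VAE itself are not shown.

```python
import numpy as np


def kmeans_1d(values, k, n_iter=50):
    """Plain Lloyd's k-means on a 1-D array; returns one cluster label per value.

    Hypothetical helper for illustration, not taken from the paper.
    """
    values = np.asarray(values, dtype=float)
    # Deterministic initialisation: spread the centers over the column's quantiles.
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def discretize_to_one_hot(table, k=3):
    """Cluster each numerical column into k groups and one-hot encode the labels.

    Assumption: clustering is done per column; the abstract does not spell this out.
    The result has k columns for every input column.
    """
    table = np.asarray(table, dtype=float)
    blocks = []
    for col in range(table.shape[1]):
        labels = kmeans_1d(table[:, col], k)
        blocks.append(np.eye(k)[labels])  # row i is the one-hot vector of labels[i]
    return np.hstack(blocks)


# Toy table with 6 records and 2 numerical columns (made-up values).
table = np.array([
    [1.0, 10.0],
    [1.1, 50.0],
    [5.0, 90.0],
    [5.2, 11.0],
    [9.0, 51.0],
    [9.1, 91.0],
])
one_hot = discretize_to_one_hot(table, k=3)
print(one_hot.shape)  # (6, 6): each of the 2 columns becomes 3 one-hot columns
```

In this sketch the column count grows from 2 to 6, which matches the abstract's remark that the number of columns increases; how the expanded table is then fed to the VAE is left to the paper.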

Original language: English
Title of host publication: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
Editors: Shusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, Vijay Raghavan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5944-5951
Number of pages: 8
ISBN (Electronic): 9781665480451
State: Published - 2022
Event: 2022 IEEE International Conference on Big Data, Big Data 2022 - Osaka, Japan
Duration: 17 Dec 2022 to 20 Dec 2022

Publication series

Name: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022

Conference

Conference: 2022 IEEE International Conference on Big Data, Big Data 2022
Country/Territory: Japan
City: Osaka
Period: 17/12/22 to 20/12/22

Keywords

  • Autoencoder
  • Data Augmentation
  • Deep learning
  • Tabular data
  • VAE
