Classification of histogram-valued data with support histogram machines

Ilsuk Kang, Cheolwoo Park, Young Joo Yoon, Changyi Park, Soon Sun Kwon, Hosik Choi

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

The growth in data volume and advances in technology have produced new types of complex data, such as histogram-valued data. This paper focuses on classification problems in which predictors are observed as, or aggregated into, histograms. Because conventional classification methods take vectors as input, a natural approach converts histograms into vector-valued data using summary values, such as the mean or median. However, this approach discards the distributional information available in histograms. To address this issue, we propose a margin-based classifier, the support histogram machine (SHM), for histogram-valued data. We adopt the support vector machine framework and use the Wasserstein-Kantorovich metric to measure distances between histograms. The resulting optimization problem is solved by a dual approach. We then test the proposed SHM on simulated and real examples and demonstrate that it outperforms summary-value-based methods.
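To illustrate why summary values can lose information that a distance between histograms retains, the following minimal sketch compares two histograms with identical means. It is not the authors' implementation: it assumes both histograms share the same equal-width bins, treats each bin's mass as sitting at the bin center, and uses the order-1 Wasserstein distance, which may differ from the variant of the Wasserstein-Kantorovich metric used in the paper. The names `wasserstein1` and `hist_mean` are hypothetical helpers introduced only for this example.

```python
from itertools import accumulate


def wasserstein1(p, q, bin_width=1.0):
    """Order-1 Wasserstein distance between two histograms.

    Assumes p and q are counts over the same equal-width bins and
    that each bin's mass is located at the bin center.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    total_p, total_q = sum(p), sum(q)
    # Normalize counts to probabilities and build the cumulative sums.
    cdf_p = list(accumulate(x / total_p for x in p))
    cdf_q = list(accumulate(x / total_q for x in q))
    # For one-dimensional distributions, W1 is the area between the CDFs.
    return bin_width * sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))


def hist_mean(p, centers):
    """Mean of a histogram, with mass placed at the bin centers."""
    total = sum(p)
    return sum(w * c for w, c in zip(p, centers)) / total


centers = [0.5, 1.5, 2.5, 3.5, 4.5]
peaked = [0, 0, 10, 0, 0]  # all mass in the middle bin
spread = [5, 0, 0, 0, 5]   # mass split between the two extreme bins

print(hist_mean(peaked, centers), hist_mean(spread, centers))
print(wasserstein1(peaked, spread))
```

Both histograms have the same mean, so a classifier fed only the mean could not tell them apart, while the distance between them is clearly nonzero. A distance of this kind could then be plugged into a margin-based classifier, which is the general idea the abstract describes.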

Original language: English
Pages (from-to): 675-690
Number of pages: 16
Journal: Journal of Applied Statistics
Volume: 50
Issue number: 3
State: Published - 2023

Keywords

  • Support vector machines
  • Wasserstein-Kantorovich metric
  • Symbolic data
