TY - JOUR
T1 - Classification of histogram-valued data with support histogram machines
AU - Kang, Ilsuk
AU - Park, Cheolwoo
AU - Yoon, Young Joo
AU - Park, Changyi
AU - Kwon, Soon Sun
AU - Choi, Hosik
N1 - Publisher Copyright: © 2021 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2023
Y1 - 2023
N2 - The current large amounts of data and advanced technologies have produced new types of complex data, such as histogram-valued data. The paper focuses on classification problems when predictors are observed as or aggregated into histograms. Because conventional classification methods take vectors as input, a natural approach converts histograms into vector-valued data using summary values, such as the mean or median. However, this approach forgoes the distributional information available in histograms. To address this issue, we propose a margin-based classifier called support histogram machine (SHM) for histogram-valued data. We adopt the support vector machine framework and the Wasserstein-Kantorovich metric to measure distances between histograms. The proposed optimization problem is solved by a dual approach. We then test the proposed SHM via simulated and real examples and demonstrate its superior performance to summary-value-based methods.
AB - The current large amounts of data and advanced technologies have produced new types of complex data, such as histogram-valued data. The paper focuses on classification problems when predictors are observed as or aggregated into histograms. Because conventional classification methods take vectors as input, a natural approach converts histograms into vector-valued data using summary values, such as the mean or median. However, this approach forgoes the distributional information available in histograms. To address this issue, we propose a margin-based classifier called support histogram machine (SHM) for histogram-valued data. We adopt the support vector machine framework and the Wasserstein-Kantorovich metric to measure distances between histograms. The proposed optimization problem is solved by a dual approach. We then test the proposed SHM via simulated and real examples and demonstrate its superior performance to summary-value-based methods.
KW - Support vector machines
KW - Wasserstein-Kantorovich metric
KW - symbolic data
UR - http://www.scopus.com/inward/record.url?scp=85109263163&partnerID=8YFLogxK
U2 - 10.1080/02664763.2021.1947996
DO - 10.1080/02664763.2021.1947996
M3 - Article
AN - SCOPUS:85109263163
SN - 0266-4763
VL - 50
SP - 675
EP - 690
JO - Journal of Applied Statistics
JF - Journal of Applied Statistics
IS - 3
ER -