TY - JOUR
T1 - Sparse kernel k-means clustering
AU - Park, Beomjin
AU - Park, Changyi
AU - Hong, Sungchul
AU - Choi, Hosik
N1 - Publisher Copyright: © 2024 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2024
Y1 - 2024
AB - Clustering is an essential technique that groups similar data points to uncover the underlying structure and features of the data. Although traditional clustering methods such as k-means are widely utilized, they have limitations in identifying nonlinear clusters. Thus, alternative techniques, such as kernel k-means and spectral clustering, have been developed to address this issue. However, another challenge arises when irrelevant variables are present in the data; this can be mitigated by employing variable selection methods such as the filter, wrapper, and embedded approaches. In this study, with a particular focus on kernel k-means clustering, we propose an embedded variable selection method using a tensor product space along with a general analysis of variance kernel for nonlinear clustering. Comprehensive experiments involving simulations and real data analysis demonstrated that the proposed method achieves competitive performance compared to existing approaches. Thus, the proposed method may serve as a reliable tool for accurate cluster identification and variable selection to gain insights into complex datasets.
KW - analysis of variance kernel
KW - nonlinear clustering
KW - sparse learning
KW - variable selection
UR - http://www.scopus.com/inward/record.url?scp=85195204899&partnerID=8YFLogxK
U2 - 10.1080/02664763.2024.2362266
DO - 10.1080/02664763.2024.2362266
M3 - Article
AN - SCOPUS:85195204899
SN - 0266-4763
JO - Journal of Applied Statistics
JF - Journal of Applied Statistics
ER -