Abstract
We present a new clustering algorithm for multivariate binary data. The algorithm is based on a convex relaxation of hierarchical clustering, obtained by taking the binomial likelihood as a natural distribution for binary data and formulating convex clustering with a pairwise penalty on the cluster prototypes. Under convex clustering, we show that the typical ℓ1 pairwise fused penalty results in ineffective cluster formation. To improve clustering performance and select the relevant clustering variables, we propose penalized maximum likelihood estimation with an ℓ2 fused penalty on the fusion parameters and an ℓ1 penalty on the loading matrix. We provide an efficient algorithm that solves the optimization problem using a majorization-minimization algorithm and the alternating direction method of multipliers. Numerical studies confirm its good performance, and a real data analysis demonstrates the practical usefulness of the proposed method.
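As a rough illustration of the kind of objective the abstract describes, the sketch below evaluates a Bernoulli negative log-likelihood plus an ℓ2 pairwise fused penalty and an ℓ1 penalty. It is not the paper's formulation: the factorization `Theta = A @ B.T`, the unweighted sum over pairs, and all names (`bernoulli_nll`, `fused_l2_penalty`, `objective`, `lam_fuse`, `lam_sparse`) are assumptions made for illustration, and no majorization-minimization or ADMM solver is included.

```python
import numpy as np


def bernoulli_nll(X, Theta):
    """Negative Bernoulli log-likelihood with logit (natural) parameters Theta.

    Sums log(1 + exp(theta)) - x * theta over all entries.
    """
    return float(np.sum(np.logaddexp(0.0, Theta) - X * Theta))


def fused_l2_penalty(A):
    """Sum over pairs i < j of the Euclidean distance between rows i and j of A."""
    n = A.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.linalg.norm(A[i] - A[j])
    return float(total)


def objective(X, A, B, lam_fuse, lam_sparse):
    """Penalized negative log-likelihood under the assumed form Theta = A @ B.T.

    A (n x k): one row per observation; rows that coincide share a prototype.
    B (p x k): hypothetical loading matrix; the l1 term encourages sparsity in it.
    """
    Theta = A @ B.T
    sparsity = float(np.abs(B).sum())
    return bernoulli_nll(X, Theta) + lam_fuse * fused_l2_penalty(A) + lam_sparse * sparsity
```

In this sketch the ℓ2 fused term vanishes for a pair of observations only when their rows of `A` coincide in every coordinate, whereas an ℓ1 version of the same term would be separable across coordinates.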
Original language | English |
---|---|
Pages (from-to) | 991-1018 |
Number of pages | 28 |
Journal | Advances in Data Analysis and Classification |
Volume | 13 |
Issue number | 4 |
State | Published - 1 Dec 2019 |
Keywords
- Binary data
- Convex clustering
- Dimension reduction
- Fused penalty