Convex clustering for binary data

Hosik Choi, Seokho Lee

Research output: Contribution to journal › Article › peer-review

8 Scopus citations

Abstract

We present a new clustering algorithm for multivariate binary data. The algorithm is based on the convex relaxation of hierarchical clustering, which is achieved by treating the binomial likelihood as a natural distribution for binary data and by formulating convex clustering with a pairwise penalty on the cluster prototypes. Under convex clustering, we show that the typical ℓ1 pairwise fused penalty results in ineffective cluster formation. To improve clustering performance and select the relevant clustering variables, we propose penalized maximum likelihood estimation with an ℓ2 fused penalty on the fusion parameters and an ℓ1 penalty on the loading matrix. We provide an efficient algorithm to solve the optimization problem using a majorization-minimization algorithm and the alternating direction method of multipliers. Numerical studies confirm its good performance, and a real data analysis demonstrates the practical usefulness of the proposed method.
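
As a rough illustration of the kind of objective the abstract describes, the following Python sketch evaluates a penalized negative log-likelihood for binary data. It is not the authors' formulation. It assumes a Bernoulli (single-trial binomial) likelihood with natural parameters Theta = A Bᵀ, where each row of A is a per-observation fusion parameter and B is a loading matrix. The function names, the factorization, and the tuning parameters lam_fuse and lam_sparse are all hypothetical choices made for this example. The ℓ2 fused term is zero for a pair of rows only when the two rows coincide entirely, which is one way to see why it can merge observations as whole units.

```python
import math
from itertools import combinations


def bernoulli_nll(X, Theta):
    """Negative Bernoulli log-likelihood with natural parameters Theta."""
    total = 0.0
    for x_row, t_row in zip(X, Theta):
        for x, t in zip(x_row, t_row):
            # log(1 + exp(t)), computed in a numerically stable way
            softplus = max(t, 0.0) + math.log1p(math.exp(-abs(t)))
            total += softplus - x * t
    return total


def fused_l2_penalty(A):
    """Sum of Euclidean distances between all pairs of rows of A."""
    return sum(math.dist(a_i, a_j) for a_i, a_j in combinations(A, 2))


def l1_penalty(B):
    """Sum of absolute values of the entries of B."""
    return sum(abs(b) for row in B for b in row)


def natural_parameters(A, B):
    """Theta = A B^T (assumed low-rank form; hypothetical for this sketch)."""
    return [[sum(a * b for a, b in zip(a_row, b_row)) for b_row in B]
            for a_row in A]


def objective(X, A, B, lam_fuse, lam_sparse):
    """Penalized negative log-likelihood under the assumptions stated above."""
    Theta = natural_parameters(A, B)
    return (bernoulli_nll(X, Theta)
            + lam_fuse * fused_l2_penalty(A)
            + lam_sparse * l1_penalty(B))
```

Here X is an n × p list of 0/1 rows, A is n × k, and B is p × k. The sketch only evaluates the objective. It does not implement the majorization-minimization or ADMM updates mentioned in the abstract.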

Original language: English
Pages (from-to): 991-1018
Number of pages: 28
Journal: Advances in Data Analysis and Classification
Volume: 13
Issue number: 4
State: Published - 1 Dec 2019

Keywords

  • Binary data
  • Convex clustering
  • Dimension reduction
  • Fused penalty
