Abstract
One of the existing sparse clustering approaches, ℓ1-k-means, maximizes the weighted between-cluster sum of squares subject to an ℓ1 penalty. In this paper, we propose a sparse clustering method based on an ℓ0 penalty, which we call ℓ0-k-means, and we design an efficient iterative algorithm for solving it. To compare the theoretical properties of ℓ1-k-means and ℓ0-k-means, we show that both can be interpreted explicitly from a thresholding perspective, each corresponding to a different thresholding function. Moreover, ℓ1-k-means and ℓ0-k-means are proven to possess a screening consistency property under Gaussian mixture models. Experiments on synthetic as well as real data show that ℓ0-k-means outperforms ℓ1-k-means.
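The thresholding perspective mentioned above can be illustrated with a minimal sketch (not the authors' code; the function names and example values are hypothetical): an ℓ1 penalty on the feature weights induces a soft-thresholding operator on the per-feature between-cluster sums of squares (BCSS), while an ℓ0 penalty induces a hard-thresholding operator.

```python
# Illustrative sketch, assuming per-feature BCSS values are given:
# an l1 penalty soft-thresholds them (shrinks survivors toward zero),
# while an l0 penalty hard-thresholds them (keeps survivors untouched).
import numpy as np

def soft_threshold(b, t):
    """l1-style operator: shrink each value toward zero by t."""
    return np.sign(b) * np.maximum(np.abs(b) - t, 0.0)

def hard_threshold(b, t):
    """l0-style operator: keep a value only if its magnitude exceeds t."""
    return np.where(np.abs(b) > t, b, 0.0)

bcss = np.array([5.0, 2.0, 0.5, 0.1])   # hypothetical per-feature BCSS
print(soft_threshold(bcss, 1.0))        # -> [4. 1. 0. 0.]
print(hard_threshold(bcss, 1.0))        # -> [5. 2. 0. 0.]
```

Both operators zero out weak features (screening), but only hard thresholding leaves the surviving BCSS values unbiased, which is one intuition for why the ℓ0 variant can outperform the ℓ1 variant.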
| Original language | English |
|---|---|
| Pages (from-to) | 1265-1284 |
| Number of pages | 20 |
| Journal | Statistica Sinica |
| Volume | 28 |
| Issue number | 3 |
| DOIs | |
| State | Published - Jul 2018 |
Keywords
- High-dimensional data clustering
- Screening property
- Sparse k-means