
Effect of cluster size distribution on clustering: a comparative study of k-means and fuzzy c-means clustering

Research output: Contribution to journal › Article › peer-review

66 Scopus citations

Abstract

Data distribution has a significant impact on clustering results. This study focuses on the effect of cluster size distribution on clustering, namely the uniform effect of k-means and fuzzy c-means (FCM) clustering. We first review related work on k-means and FCM clustering. We then present a structure decomposition analysis of the objective functions of k-means and FCM. Afterward, we conduct extensive experiments on synthetic two-dimensional and three-dimensional data sets and on real-world data sets from the UCI Machine Learning Repository. The results demonstrate that FCM has a stronger uniform effect than k-means clustering. They also reveal that the fuzzifier value m = 2 in FCM, which has been widely adopted in many applications, is not a good choice, particularly for data sets with great variation in cluster sizes. Therefore, for data sets with significantly uneven cluster size distributions, a smaller fuzzifier value is preferred for FCM clustering, and k-means clustering is a better choice than FCM clustering.
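
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental code. It assumes a synthetic two-dimensional data set with one large and one small Gaussian cluster, plain NumPy implementations of k-means and FCM, and the coefficient of variation (CV) of cluster sizes as one possible measure of how uniform the resulting partition is. All function names and parameter values below (`make_imbalanced_blobs`, `size_cv`, `demo`, the cluster sizes and centers) are hypothetical choices made for illustration.

```python
import numpy as np


def make_imbalanced_blobs(n_large=500, n_small=50, seed=0):
    """Two 2-D Gaussian clusters with very different sizes (illustrative data)."""
    rng = np.random.default_rng(seed)
    large = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n_large, 2))
    small = rng.normal(loc=[4.0, 0.0], scale=0.5, size=(n_small, 2))
    X = np.vstack([large, small])
    y = np.concatenate([np.zeros(n_large, dtype=int), np.ones(n_small, dtype=int)])
    return X, y


def sq_dists(X, centers):
    """Squared Euclidean distance from every point to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd-style k-means with random initial centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = sq_dists(X, centers).argmin(axis=1)
        # Keep the old center if a cluster becomes empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = sq_dists(X, centers).argmin(axis=1)
    return labels, centers


def fuzzy_c_means(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard FCM alternating updates; m > 1 is the fuzzifier."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d2 = np.maximum(sq_dists(X, centers), 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers


def size_cv(labels, k):
    """Coefficient of variation of cluster sizes; lower means more uniform."""
    sizes = np.bincount(labels, minlength=k)
    return sizes.std(ddof=1) / sizes.mean()


def demo(m=2.0):
    """CV of the true sizes and of the k-means and hardened FCM partitions."""
    X, y = make_imbalanced_blobs()
    km_labels, _ = kmeans(X, 2)
    U, _ = fuzzy_c_means(X, 2, m=m)
    fcm_labels = U.argmax(axis=1)  # harden memberships to compare sizes
    return {
        "true": size_cv(y, 2),
        "kmeans": size_cv(km_labels, 2),
        "fcm": size_cv(fcm_labels, 2),
    }
```

Calling `demo(m=2.0)` and then `demo(m=1.2)` returns the three CV values for each fuzzifier setting. A partition whose CV falls well below the CV of the true sizes has been pulled toward equal-sized clusters, which is the behavior the abstract calls the uniform effect. The outcome on this toy data depends on the assumed cluster sizes, separation, and random seeds, so it should not be read as a reproduction of the paper's results.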

Original language: English
Pages (from-to): 455-466
Number of pages: 12
Journal: Pattern Analysis and Applications
Volume: 23
Issue number: 1
State: Published - 1 Feb 2020
Externally published: Yes

Keywords

  • Clustering
  • Data distribution
  • Fuzzifier
  • Fuzzy c-means (FCM)
  • Uniform effect
  • k-means
