How the initialization affects the stability of the k-means algorithm

Sébastien Bubeck; Marina Meilă; Ulrike von Luxburg

ESAIM: Probability and Statistics (2012)

  • Volume: 16, page 436-452
  • ISSN: 1292-8100

Abstract

We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm (also known as Lloyd's algorithm). In particular, we leverage the property that this algorithm can get stuck in local optima of the k-means objective function. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.
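
The dependence on initialization described in the abstract can be illustrated with a small sketch. It is a toy example under stated assumptions, not the construction analyzed in the paper: the four data points, the two sets of initial centers, and the helper names `squared_distance` and `lloyd` are all hypothetical choices made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd(points, centers, max_iter=100):
    """Run Lloyd's iterations from the given initial centers.

    Returns the final labels, the final centers, and the k-means cost
    (sum of squared distances of each point to its assigned center).
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change: a fixed point is reached
        labels = new_labels
        # Update step: each center moves to the mean of its cluster.
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    cost = sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))
    return labels, centers, cost


# Hypothetical data: corners of a wide rectangle, clustered with k = 2.
POINTS = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Initialization A: centers stacked vertically in the middle of the rectangle.
labels_a, centers_a, cost_a = lloyd(POINTS, [(2, 0), (2, 1)])
# Initialization B: one center near each short side.
labels_b, centers_b, cost_b = lloyd(POINTS, [(0, 0.5), (4, 0.5)])

print("init A:", labels_a, centers_a, cost_a)
print("init B:", labels_b, centers_b, cost_b)
```

In this sketch, initialization A stays at a bottom/top split with cost 16, while initialization B reaches the left/right split with cost 1. The two runs therefore return different clusterings of the same data, which is the kind of initialization-induced instability the abstract refers to.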

How to cite

Bubeck, Sébastien, Meilă, Marina, and von Luxburg, Ulrike. "How the initialization affects the stability of the k-means algorithm." ESAIM: Probability and Statistics 16 (2012): 436-452. <http://eudml.org/doc/273610>.

@article{Bubeck2012,
abstract = {We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm (also known as Lloyd's algorithm). In particular, we leverage the property that this algorithm can get stuck in local optima of the k-means objective function. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.},
author = {Bubeck, Sébastien and Meilă, Marina and von Luxburg, Ulrike},
journal = {ESAIM: Probability and Statistics},
keywords = {clustering; k-means; stability; model selection},
language = {eng},
pages = {436-452},
publisher = {EDP-Sciences},
title = {How the initialization affects the stability of the k-means algorithm},
url = {http://eudml.org/doc/273610},
volume = {16},
year = {2012},
}

TY - JOUR
AU - Bubeck, Sébastien
AU - Meilă, Marina
AU - von Luxburg, Ulrike
TI - How the initialization affects the stability of the k-means algorithm
JO - ESAIM: Probability and Statistics
PY - 2012
PB - EDP-Sciences
VL - 16
SP - 436
EP - 452
AB - We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm (also known as Lloyd's algorithm). In particular, we leverage the property that this algorithm can get stuck in local optima of the k-means objective function. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.
LA - eng
KW - clustering; k-means; stability; model selection
UR - http://eudml.org/doc/273610
ER -

