Interpretation and optimization of the k-means algorithm

Kristian Sabo; Rudolf Scitovski

Applications of Mathematics (2014)

  • Volume: 59, Issue: 4, page 391-406
  • ISSN: 0862-7940

Abstract

The paper gives a new interpretation and a possible optimization of the well-known k-means algorithm for searching for a locally optimal partition of the set 𝒜 = {aᵢ ∈ ℝⁿ : i = 1, …, m} which consists of k disjoint nonempty subsets π₁, …, πₖ, 1 ≤ k ≤ m. For this purpose, a new divided k-means algorithm was constructed as a limit case of the known smoothed k-means algorithm. It is shown that the algorithm constructed in this way coincides with the k-means algorithm if during the iterative procedure no data points appear in the Voronoi diagram. If in the partition obtained by applying the divided k-means algorithm there are data points lying in the Voronoi diagram, it is shown that the obtained result can be improved further.
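For readers unfamiliar with the baseline the paper reinterprets, the standard k-means (Lloyd) iteration alternates an assignment step and a centroid-update step; ambiguity arises exactly when a data point is equidistant from two centroids, i.e. lies on the boundary of the Voronoi diagram. A minimal sketch of the standard iteration (all names here are illustrative, not taken from the paper):

```python
# Minimal k-means (Lloyd's algorithm) sketch. Points equidistant from two
# centroids sit on the Voronoi boundary, where the assignment below breaks
# ties arbitrarily (min picks the first nearest centroid).
import math

def kmeans(points, centroids, iters=100, tol=1e-9):
    """points: list of tuples in R^n; centroids: initial list of k centers."""
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new = [tuple(sum(x) / len(cl) for x in zip(*cl)) if cl else c
               for cl, c in zip(clusters, centroids)]
        if all(math.dist(a, b) < tol for a, b in zip(new, centroids)):
            break
        centroids = new
    return centroids, clusters

pts = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
cents, cls = kmeans(pts, [(0.0, 0.0), (4.0, 0.0)])
print(cents)  # converges to [(0.0, 0.5), (4.0, 0.5)]
```

This sketch shows only the classical algorithm; the paper's divided k-means modifies how the boundary (Voronoi-diagram) points are handled.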

How to cite


Sabo, Kristian, and Scitovski, Rudolf. "Interpretation and optimization of the $k$-means algorithm." Applications of Mathematics 59.4 (2014): 391-406. <http://eudml.org/doc/261931>.

@article{Sabo2014,
abstract = {The paper gives a new interpretation and a possible optimization of the well-known $k$-means algorithm for searching for a locally optimal partition of the set $\mathcal \{A\}= \lbrace a_i\in \mathbb \{R\}^n\colon i=1,\dots ,m\rbrace $ which consists of $k$ disjoint nonempty subsets $\pi _1,\dots ,\pi _k$, $1\le k\le m$. For this purpose, a new divided $k$-means algorithm was constructed as a limit case of the known smoothed $k$-means algorithm. It is shown that the algorithm constructed in this way coincides with the $k$-means algorithm if during the iterative procedure no data points appear in the Voronoi diagram. If in the partition obtained by applying the divided $k$-means algorithm there are data points lying in the Voronoi diagram, it is shown that the obtained result can be improved further.},
author = {Sabo, Kristian and Scitovski, Rudolf},
journal = {Applications of Mathematics},
keywords = {clustering; data mining; $k$-means; Voronoi diagram},
language = {eng},
number = {4},
pages = {391-406},
publisher = {Institute of Mathematics, Academy of Sciences of the Czech Republic},
title = {Interpretation and optimization of the $k$-means algorithm},
url = {http://eudml.org/doc/261931},
volume = {59},
year = {2014},
}

TY - JOUR
AU - Sabo, Kristian
AU - Scitovski, Rudolf
TI - Interpretation and optimization of the $k$-means algorithm
JO - Applications of Mathematics
PY - 2014
PB - Institute of Mathematics, Academy of Sciences of the Czech Republic
VL - 59
IS - 4
SP - 391
EP - 406
AB - The paper gives a new interpretation and a possible optimization of the well-known $k$-means algorithm for searching for a locally optimal partition of the set $\mathcal {A}= \lbrace a_i\in \mathbb {R}^n\colon i=1,\dots ,m\rbrace $ which consists of $k$ disjoint nonempty subsets $\pi _1,\dots ,\pi _k$, $1\le k\le m$. For this purpose, a new divided $k$-means algorithm was constructed as a limit case of the known smoothed $k$-means algorithm. It is shown that the algorithm constructed in this way coincides with the $k$-means algorithm if during the iterative procedure no data points appear in the Voronoi diagram. If in the partition obtained by applying the divided $k$-means algorithm there are data points lying in the Voronoi diagram, it is shown that the obtained result can be improved further.
LA - eng
KW - clustering; data mining; $k$-means; Voronoi diagram
UR - http://eudml.org/doc/261931
ER -

References

  1. Aurenhammer, F., Klein, R., Voronoi Diagrams, Handbook of Computational Geometry J.-R. Sack et al. North-Holland Amsterdam (2000), 201-290. (2000) Zbl0995.65024MR1746678
  2. Bagirov, A. M., 10.1016/j.patcog.2008.04.004, Pattern Recognition 41 (2008), 3192-3199. (2008) Zbl1147.68669DOI10.1016/j.patcog.2008.04.004
  3. Bagirov, A. M., Ugon, J., Webb, D., 10.1016/j.patcog.2010.10.018, Pattern Recognition 44 (2011), 866-876. (2011) Zbl1213.68514DOI10.1016/j.patcog.2010.10.018
  4. Dennis, J. E. jun, Schnabel, R. B., Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Classics in Applied Mathematics 16 SIAM, Philadelphia (1996). (1996) MR1376139
  5. Finkel, D. E., DIRECT Optimization Algorithm User Guide, Center for Research in Scientific Computation, North Carolina State University (2003), http://www4.ncsu.edu/~definkel/research/index.html. 
  6. Floudas, C. A., Gounaris, C. E., 10.1007/s10898-008-9332-8, J. Glob. Optim. 45 (2009), 3-38. (2009) Zbl1180.90245MR2533740DOI10.1007/s10898-008-9332-8
  7. Gan, G., Ma, C., Wu, J., Data Clustering: Theory, Algorithms, and Applications, ASA-SIAM Series on Statistics and Applied Probability 20 SIAM, Philadelphia (2007). (2007) Zbl1185.68274MR2331172
  8. Grbić, R., Nyarko, E. K., Scitovski, R., 10.1007/s10898-012-0020-3, J. Glob. Optim. 57 (2013), 1193-1212. (2013) Zbl1279.65076MR3121789DOI10.1007/s10898-012-0020-3
  9. Iyigun, C., Probabilistic Distance Clustering, Ph.D. thesis, Graduate School-New Brunswick Rutgers (2007). (2007) MR2429670
  10. Iyigun, C., Ben-Israel, A., 10.1016/j.orl.2009.11.005, Oper. Res. Lett. 38 (2010), 207-214. (2010) Zbl1188.90148MR2608859DOI10.1016/j.orl.2009.11.005
  11. Jones, D. R., Perttunen, C. D., Stuckman, B. E., 10.1007/BF00941892, J. Optimization Theory Appl. 79 (1993), 157-181. (1993) Zbl0796.49032MR1246501DOI10.1007/BF00941892
  12. Kogan, J., Introduction to Clustering Large and High-Dimensional Data, Cambridge University Press Cambridge (2007). (2007) Zbl1183.62106MR2297007
  13. Kogan, J., Nicholas, C., Wiacek, M., Hybrid clustering of large high dimensional data, Proceedings of the Workshop on Text Mining M. Castellanos et al. SIAM (2007). (2007) 
  14. Kogan, J., Teboulle, M., Scaling clustering algorithms with Bregman distances, Proceedings of the Workshop on Text Mining (M. W. Berry et al., eds.), 2006. 
  15. Leisch, F., 10.1016/j.csda.2005.10.006, Comput. Stat. Data Anal. 51 (2006), 526-544. (2006) Zbl1157.62439MR2297469DOI10.1016/j.csda.2005.10.006
  16. Likas, A., Vlassis, N., Verbeek, J. J., 10.1016/S0031-3203(02)00060-2, Pattern Recognition 36 (2003), 451-461. (2003) DOI10.1016/S0031-3203(02)00060-2
  17. Ng, M. K., A note on constrained k-means algorithms, Pattern Recognition 33 (2000), 515-519. (2000) 
  18. Pintér, J. D., 10.1007/978-1-4757-2502-5_13, Nonconvex Optimization and Its Applications 6 Kluwer Academic Publishers, Dordrecht (1996). (1996) Zbl0842.90110MR1374104DOI10.1007/978-1-4757-2502-5_13
  19. Sabo, K., Scitovski, R., Vazler, I., 10.1007/s11590-011-0389-9, Optim. Lett. 7 (2013), 5-22. (2013) Zbl1283.90034MR3017090DOI10.1007/s11590-011-0389-9
  20. Sabo, K., Scitovski, R., Vazler, I., Zekić-Sušac, M., 10.1016/j.enconman.2010.10.037, Energy Conversion and Management 52 (2011), 1721-1727. (2011) DOI10.1016/j.enconman.2010.10.037
  21. Späth, H., Cluster-Formation und Analyse. Theorie, FORTRAN-Programme und Beispiele, R. Oldenburg Verlag München (1983). (1983) Zbl0536.62048
  22. Su, Z., Kogan, J., Second order conditions for k-means clustering: Partitions vs. centroids, Text Mining 2008 Workshop (held in conjunction with the 8th SIAM International Conference on Data Mining) Atlanta (2008). (2008) 
  23. Teboulle, M., A unified continuous optimization framework for center-based clustering methods, J. Mach. Learn. Res. 8 (2007), 65-102. (2007) Zbl1222.68318MR2280215
  24. Volkovich, V., Kogan, J., Nicholas, C., 10.1016/j.ejor.2005.12.045, Eur. J. Oper. Res. 183 (2007), 1097-1105. (2007) Zbl1135.90042MR2343742DOI10.1016/j.ejor.2005.12.045
  25. Wang, W., Zhang, Y., On fuzzy cluster validity indices, Fuzzy Sets Syst. 158 (2007), 2095-2117. (2007) Zbl1123.62046MR2405299
  26. Yang, X. S., 10.1007/978-3-642-04944-6_14, in: O. Watanabe et al. (eds.), Stochastic Algorithms: Foundations and Applications, 5th international symposium, SAGA 2009, Sapporo, Japan. Proceedings. Lecture Notes in Computer Science, Springer, Berlin (2009), 169-178. Zbl1260.90164MR2580252DOI10.1007/978-3-642-04944-6_14
