Interpretation and optimization of the $k$-means algorithm

Kristian Sabo; Rudolf Scitovski

Interpretation and optimization of the $k$ -means algorithm

Kristian Sabo; Rudolf Scitovski

Applications of Mathematics (2014)

Volume: 59, Issue: 4, page 391-406
ISSN: 0862-7940

Access Full Article

top

Access to full text

Full (PDF)

Abstract

top

The paper gives a new interpretation and a possible optimization of the well-known

k

-means algorithm for searching for a locally optimal partition of the set

𝒜 = {a_{i} \in ℝ^{n} : i = 1, \dots, m}

which consists of

k

disjoint nonempty subsets

π_{1}, \dots, π_{k}

,

1 \leq k \leq m

. For this purpose, a new divided

k

-means algorithm was constructed as a limit case of the known smoothed

k

-means algorithm. It is shown that the algorithm constructed in this way coincides with the

k

-means algorithm if during the iterative procedure no data points appear in the Voronoi diagram. If in the partition obtained by applying the divided

k

-means algorithm there are data points lying in the Voronoi diagram, it is shown that the obtained result can be improved further.

How to cite

top

MLA
BibTeX
RIS

Sabo, Kristian, and Scitovski, Rudolf. "Interpretation and optimization of the $k$-means algorithm." Applications of Mathematics 59.4 (2014): 391-406. <http://eudml.org/doc/261931>.

@article{Sabo2014,
abstract = {The paper gives a new interpretation and a possible optimization of the well-known $k$-means algorithm for searching for a locally optimal partition of the set $\mathcal \{A\}= \lbrace a_i\in \mathbb \{R\}^n\colon i=1,\dots ,m\rbrace $ which consists of $k$ disjoint nonempty subsets $\pi _1,\dots ,\pi _k$, $1\le k\le m$. For this purpose, a new divided $k$-means algorithm was constructed as a limit case of the known smoothed $k$-means algorithm. It is shown that the algorithm constructed in this way coincides with the $k$-means algorithm if during the iterative procedure no data points appear in the Voronoi diagram. If in the partition obtained by applying the divided $k$-means algorithm there are data points lying in the Voronoi diagram, it is shown that the obtained result can be improved further.},
author = {Sabo, Kristian, Scitovski, Rudolf},
journal = {Applications of Mathematics},
keywords = {clustering; data mining; $k$-means; Voronoi diagram; clustering; -means; Voronoi diagram},
language = {eng},
number = {4},
pages = {391-406},
publisher = {Institute of Mathematics, Academy of Sciences of the Czech Republic},
title = {Interpretation and optimization of the $k$-means algorithm},
url = {http://eudml.org/doc/261931},
volume = {59},
year = {2014},
}

TY - JOUR
AU - Sabo, Kristian
AU - Scitovski, Rudolf
TI - Interpretation and optimization of the $k$-means algorithm
JO - Applications of Mathematics
PY - 2014
PB - Institute of Mathematics, Academy of Sciences of the Czech Republic
VL - 59
IS - 4
SP - 391
EP - 406
AB - The paper gives a new interpretation and a possible optimization of the well-known $k$-means algorithm for searching for a locally optimal partition of the set $\mathcal {A}= \lbrace a_i\in \mathbb {R}^n\colon i=1,\dots ,m\rbrace $ which consists of $k$ disjoint nonempty subsets $\pi _1,\dots ,\pi _k$, $1\le k\le m$. For this purpose, a new divided $k$-means algorithm was constructed as a limit case of the known smoothed $k$-means algorithm. It is shown that the algorithm constructed in this way coincides with the $k$-means algorithm if during the iterative procedure no data points appear in the Voronoi diagram. If in the partition obtained by applying the divided $k$-means algorithm there are data points lying in the Voronoi diagram, it is shown that the obtained result can be improved further.
LA - eng
KW - clustering; data mining; $k$-means; Voronoi diagram; clustering; -means; Voronoi diagram
UR - http://eudml.org/doc/261931
ER -

References

top

Aurenhammer, F., Klein, R., Voronoi Diagrams, Handbook of Computational Geometry J.-R. Sack et al. North-Holland Amsterdam (2000), 201-290. (2000) Zbl0995.65024 MR1746678
Bagirov, A. M., 10.1016/j.patcog.2008.04.004, Pattern Recognition 41 (2008), 3192-3199. (2008) Zbl1147.68669 DOI10.1016/j.patcog.2008.04.004
Bagirov, A. M., Ugon, J., Webb, D., 10.1016/j.patcog.2010.10.018, Pattern Recognition 44 (2011), 866-876. (2011) Zbl1213.68514 DOI10.1016/j.patcog.2010.10.018
Dennis, J. E. jun, Schnabel, R. B., Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Classics in Applied Mathematics 16 SIAM, Philadelphia (1996). (1996) MR1376139
Finkel, D. E., DIRECT Optimization Algorithm User Guide, Center for Research in Scientific Computation, North Carolina State University (2003), http://www4.ncsu.edu/ definkel/research/index.html.
Floudas, C. A., Gounaris, C. E., 10.1007/s10898-008-9332-8, J. Glob. Optim. 45 (2009), 3-38. (2009) Zbl1180.90245 MR2533740 DOI10.1007/s10898-008-9332-8
Gan, G., Ma, C., Wu, J., Data Clustering: Theory, Algorithms, and Applications, ASA-SIAM Series on Statistics and Applied Probability 20 SIAM, Philadelphia (2007). (2007) Zbl1185.68274 MR2331172
Grbić, R., Nyarko, E. K., Scitovski, R., 10.1007/s10898-012-0020-3, J. Glob. Optim. 57 (2013), 1193-1212. (2013) Zbl1279.65076 MR3121789 DOI10.1007/s10898-012-0020-3
Iyigun, C., Probabilistic Distance Clustering, Ph.D. thesis, Graduate School-New Brunswick Rutgers (2007). (2007) MR2429670
Iyigun, C., Ben-Israel, A., 10.1016/j.orl.2009.11.005, Oper. Res. Lett. 38 (2010), 207-214. (2010) Zbl1188.90148 MR2608859 DOI10.1016/j.orl.2009.11.005
Jones, D. R., Perttunen, C. D., Stuckman, B. E., 10.1007/BF00941892, J. Optimization Theory Appl. 79 (1993), 157-181. (1993) Zbl0796.49032 MR1246501 DOI10.1007/BF00941892
Kogan, J., Introduction to Clustering Large and High-Dimensional Data, Cambridge University Press Cambridge (2007). (2007) Zbl1183.62106 MR2297007
Kogan, J., Nicholas, C., Wiacek, M., Hybrid clustering of large high dimensional data, Proceedings of the Workshop on Text Mining M. Castellanos et al. SIAM (2007). (2007)
Kogan, J., Teboulle, M., Scaling clustering algorithms with Bregman distances, Proceedings of the Workshop on Text Mining (M. W. Berry et al., eds.), 2006.
Leisch, F., 10.1016/j.csda.2005.10.006, Comput. Stat. Data Anal. 51 (2006), 526-544. (2006) Zbl1157.62439 MR2297469 DOI10.1016/j.csda.2005.10.006
Likas, A., Vlassis, N., Verbeek, J. J., 10.1016/S0031-3203(02)00060-2, Pattern Recognition 36 (2003), 451-461. (2003) DOI10.1016/S0031-3203(02)00060-2
Ng, M. K., A note on constrained $k$ -means algorithms, Pattern Recognition 33 (2000), 525-519. (2000)
Pintér, J. D., 10.1007/978-1-4757-2502-5_13, Nonconvex Optimization and Its Applications 6 Kluwer Academic Publishers, Dordrecht (1996). (1996) Zbl0842.90110 MR1374104 DOI10.1007/978-1-4757-2502-5_13
Sabo, K., Scitovski, R., Vazler, I., 10.1007/s11590-011-0389-9, Optim. Lett. 7 (2013), 5-22. (2013) Zbl1283.90034 MR3017090 DOI10.1007/s11590-011-0389-9
Sabo, K., Scitovski, R., Vazler, I., Zekić-Sušac, M., 10.1016/j.enconman.2010.10.037, Energy Conversion and Management 52 (2011), 1721-1727. (2011) DOI10.1016/j.enconman.2010.10.037
Späth, H., Cluster-Formation und Analyse. Theorie, FORTRAN-Programme und Beispiele, R. Oldenburg Verlag München (1983). (1983) Zbl0536.62048
Su, Z., Kogan, J., Second order conditions for $k$ -means clustering: Partitions vs. centroids, Text Mining 2008 Workshop (held in conjuction with the 8th SIAM International Conference on Data Mining) Atlanta (2008). (2008)
Teboulle, M., A unified continuous optimization framework for center-based clustering methods, J. Mach. Learn. Res. 8 (2007), 65-102. (2007) Zbl1222.68318 MR2280215
Volkovich, V., Kogan, J., Nicholas, C., 10.1016/j.ejor.2005.12.045, Eur. J. Oper. Res. 183 (2007), 1097-1105. (2007) Zbl1135.90042 MR2343742 DOI10.1016/j.ejor.2005.12.045
Wang, W., Zhang, Y., On fuzzy cluster validity indices, Fuzzy Sets Syst. 158 (2007), 2095-2117. (2007) Zbl1123.62046 MR2405299
Yang, X. S., 10.1007/978-3-642-04944-6_14, O. Watanabe et al. Stochastic Algorithms: Foundations and Applications 5th international symposium, SAGA 2009, Sapporo, Japan. Proceedings. Springer, Berlin. Lecture Notes in Computer Science (2009), 169-178. Zbl1260.90164 MR2580252 DOI10.1007/978-3-642-04944-6_14

Citations in EuDML Documents

top

Rudolf Scitovski, Kristian Sabo, The adaptation of the $k$ -means algorithm to solving the multiple ellipses detection problem by using an initial approximation obtained by the DIRECT global optimization algorithm

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Language to use for this widget.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Number of notes per page

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.