Center-based l₁-clustering method

Kristian Sabo

International Journal of Applied Mathematics and Computer Science (2014)

  • Volume: 24, Issue: 1, page 151-163
  • ISSN: 1641-876X

Abstract

In this paper, we consider the l₁-clustering problem for a finite data-point set which should be partitioned into k disjoint nonempty subsets. In that case, the objective function need not be convex or differentiable, and in general it may have many local or global minima, so the problem becomes a complex global optimization problem. A method for finding a locally optimal solution is proposed, the convergence of the corresponding iterative process is proved, and the corresponding algorithm is given. The method is illustrated on, and compared with, some other clustering methods, in particular the l₂-clustering method, also known in the literature as the smooth k-means method, in a few typical situations such as the presence of outliers among the data and the clustering of incomplete data. In these situations, numerical experiments show that the proposed l₁-clustering algorithm is faster and gives significantly better results than the l₂-clustering algorithm.
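
As a rough, illustrative sketch of the center-based l₁ idea described above (not the exact iterative process analysed in the paper), the following Python snippet alternates l₁-nearest assignments with coordinate-wise median updates, which minimize the within-cluster sum of l₁ distances. The function name, random initialisation and stopping rule are assumptions made for this example only.

```python
import numpy as np

def l1_clustering_sketch(X, k, n_iter=100, seed=0):
    """Illustrative alternating scheme for center-based l1-clustering.

    The assignment step uses the l1 (Manhattan) distance; the update step
    takes the coordinate-wise median, which minimizes the sum of l1
    distances within each cluster. This is a sketch, not the paper's
    algorithm.
    """
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (an assumption).
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment: index of the nearest center in the l1 norm.
        dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update: coordinate-wise median of each nonempty cluster.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = np.median(members, axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

# Usage: two compact groups plus one gross outlier; the l1 centers
# (medians) are far less affected by the outlier than l2 centers (means).
X = np.array([[0.0, 0.0], [0.1, -0.1], [0.0, 0.2],
              [5.0, 5.0], [5.1, 4.9], [50.0, 50.0]])
centers, labels = l1_clustering_sketch(X, k=2)
```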

How to cite

Kristian Sabo. "Center-based l₁-clustering method." International Journal of Applied Mathematics and Computer Science 24.1 (2014): 151-163. <http://eudml.org/doc/271908>.

@article{KristianSabo2014,
abstract = {In this paper, we consider the l₁-clustering problem for a finite data-point set which should be partitioned into k disjoint nonempty subsets. In that case, the objective function does not have to be either convex or differentiable, and generally it may have many local or global minima. Therefore, it becomes a complex global optimization problem. A method of searching for a locally optimal solution is proposed in the paper, the convergence of the corresponding iterative process is proved and the corresponding algorithm is given. The method is illustrated by and compared with some other clustering methods, especially with the l₂-clustering method, which is also known in the literature as a smooth k-means method, on a few typical situations, such as the presence of outliers among the data and the clustering of incomplete data. Numerical experiments show in this case that the proposed l₁-clustering algorithm is faster and gives significantly better results than the l₂-clustering algorithm.},
author = {Kristian Sabo},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {l₁-clustering; data mining; optimization; weighted median problem; clustering},
language = {eng},
number = {1},
pages = {151-163},
title = {Center-based l₁-clustering method},
url = {http://eudml.org/doc/271908},
volume = {24},
year = {2014},
}

TY - JOUR
AU - Kristian Sabo
TI - Center-based l₁-clustering method
JO - International Journal of Applied Mathematics and Computer Science
PY - 2014
VL - 24
IS - 1
SP - 151
EP - 163
AB - In this paper, we consider the l₁-clustering problem for a finite data-point set which should be partitioned into k disjoint nonempty subsets. In that case, the objective function does not have to be either convex or differentiable, and generally it may have many local or global minima. Therefore, it becomes a complex global optimization problem. A method of searching for a locally optimal solution is proposed in the paper, the convergence of the corresponding iterative process is proved and the corresponding algorithm is given. The method is illustrated by and compared with some other clustering methods, especially with the l₂-clustering method, which is also known in the literature as a smooth k-means method, on a few typical situations, such as the presence of outliers among the data and the clustering of incomplete data. Numerical experiments show in this case that the proposed l₁-clustering algorithm is faster and gives significantly better results than the l₂-clustering algorithm.
LA - eng
KW - l₁-clustering; data mining; optimization; weighted median problem; clustering
UR - http://eudml.org/doc/271908
ER -

References

  1. Angulo, J. and Serra, J. (2007). Modelling and segmentation of colour images in polar representations, Image and Vision Computing 25(4): 475-495. 
  2. Äyrämö, S. (2006). Knowledge Mining Using Robust Clustering, Ph.D. thesis, University of Jyväskylä, Jyväskylä. 
  3. Bagirov, A.M. and Ugon, J. (2005). An algorithm for minimizing clustering functions, Optimization 54(4-5): 351-368. Zbl1122.90059
  4. Bagirov, A.M., Ugon, J. and Webb, D. (2011). Fast modified global k-means algorithm for incremental cluster construction, Pattern Recognition 44(4): 866-876. Zbl1213.68514
  5. Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms, Kluwer Academic Publishers, Norwell, MA. Zbl0503.68069
  6. Boyd, S. and Vandenberghe, L. (2004). Convex Optimization, Cambridge University Press, Cambridge. Zbl1058.90049
  7. Chaovalitwongse, W.A., Butenko, S. and Pardalos, P.M., (Eds.) (2009). Clustering Challenges in Biological Networks, World Scientific, London. 
  8. Choulakian, V. (2001). Robust q-mode principal component analysis in L₁, Computational Statistics & Data Analysis 37(2): 135-150. Zbl1030.62050
  9. Clarke, F.H. (1990). Optimization and Nonsmooth Analysis, SIAM, Philadelphia, PA. Zbl0696.49002
  10. Cominetti, R. and Michelot, C. (1997). Sufficient conditions for coincidence in l₁-minisum multifacility location problems, Operations Research Letters 20(4): 179-185. Zbl0879.90131
  11. Cord, A., Ambroise, C. and Cocquerez, J.-P. (2006). Feature selection in robust clustering based on Laplace mixture, Pattern Recognition Letters 27(6): 627-635. 
  12. Cupec, R., Grbić, R., Sabo, K. and Scitovski, R. (2009). Three points method for searching the best least absolute deviations plane, Applied Mathematics and Computation 215(3): 983-994. Zbl1176.65017
  13. Duda, R., Hart, P. and Stork, D. (2001). Pattern Classification, Wiley, New York, NY. Zbl0968.68140
  14. Finkel, D.E. and Kelley, C.T. (2006). Additive scaling and the DIRECT algorithm, Journal of Global Optimization 36(4): 597-608. Zbl1142.90488
  15. Floudas, C.A. and Gounaris, C.E. (2009). A review of recent advances in global optimization, Journal of Global Optimization 45(4): 3-38. Zbl1180.90245
  16. Frąckiewicz, M. and Palus, H. (2011). KHM clustering technique as a segmentation method for endoscopic colour images, International Journal of Applied Mathematics and Computer Science 21(1): 203-209, DOI: 10.2478/v10006-011-0015-0.
  17. Gan, G., Ma, C. and Wu, J. (2007). Data Clustering: Theory, Algorithms, and Applications, SIAM, Philadelphia, PA. Zbl1185.68274
  18. Grbić, R., Nyarko, E.K. and Scitovski, R. (2012). A modification of the DIRECT method for Lipschitz global optimization for a symmetric function, Journal of Global Optimization 57(4): 1193-1212, DOI: 10.1007/s10898-012-0020-3. Zbl1279.65076
  19. Grbić, R., Scitovski, K., Sabo, K. and Scitovski, R. (2013). Approximating surfaces by the moving least absolute deviations method, Applied Mathematics and Computation 219(9): 4387-4399. Zbl06447249
  20. Gurwitz, C. (1990). Weighted median algorithms for l₁ approximation, BIT 30(2): 301-310. Zbl0704.65044
  21. Hathaway, R.J. and Bezdek, J.C. (2001). Fuzzy c-means clustering of incomplete data, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 31(5): 735-744. 
  22. Hubert, L. and Arabie, P. (1985). Comparing partitions, Journal of Classification 2(1): 193-218. Zbl0587.62128
  23. Jain, A. (2010). Data clustering: 50 years beyond k-means, Pattern Recognition Letters 31(8): 651-666.
  24. Jajuga, K. (1987). A clustering method based on the L₁-norm, Computational Statistics & Data Analysis 5(4): 357-371. Zbl0624.62058
  25. Jajuga, K. (1991). L₁-norm based fuzzy clustering, Fuzzy Sets and Systems 39(1): 43-50. Zbl0714.62052
  26. Iyigun, C. (2007). Probabilistic Distance Clustering, Ph.D. thesis, Graduate School, Rutgers, New Brunswick, NJ. 
  27. Jones, D.R., Perttunen, C.D. and Stuckman, B.E. (1993). Lipschitzian optimization without the Lipschitz constant, Journal of Optimization Theory and Applications 79(1): 157-181. Zbl0796.49032
  28. Jörnsten, R. (2004). Clustering and classification based on the L₁ data depth, Journal of Multivariate Analysis 90(1): 67-89. Zbl1047.62064
  29. Kogan, J. (2007). Introduction to Clustering Large and High-Dimensional Data, Cambridge University Press, Cambridge. Zbl1183.62106
  30. Leisch, F. (2006). A toolbox for k-centroids cluster analysis, Computational Statistics & Data Analysis 51(2): 526-544. Zbl1157.62439
  31. Li, X., Hu, W., Wang, H. and Zhang, Z. (2010). Linear discriminant analysis using rotational invariant L₁ norm, Neurocomputing 73(13-15): 2571-2579.
  32. Scitovski, R. and Scitovski, S. (2013). A fast partitioning algorithm and its application to earthquake investigation, Computers and Geosciences 59(1): 124-131. 
  33. Simiński, K. (2012). Neuro-rough-fuzzy approach for regression modelling from missing data, International Journal of Applied Mathematics and Computer Science 22(2): 461-476, DOI: 10.2478/v10006-012-0035-4. Zbl1283.93165
  34. Späth, H. (1976). L₁-cluster analysis, Computing 16(4): 379-387. Zbl0322.65008
  35. Späth, H. (1987). Using the L₁-norm within cluster analysis, in Y. Dodge (Ed.), Proceedings of the First International Conference on Statistical Data Analysis Based on the L₁-Norm and Related Methods, University of Neuchâtel, Switzerland, August 31-September 04, 1987, Elsevier, Amsterdam, pp. 427-434.
  36. Malinen, M.I. and Fränti, P. (2012). Clustering by analytic functions, Information Sciences 217(1): 31-38. 
  37. Meng, D., Zhao, Q. and Xu, Z. (2012). Improve robustness of sparse PCA by L₁-norm maximization, Pattern Recognition 45(1): 487-497. Zbl1225.68202
  38. Pintér, J.D. (1996). Global Optimization in Action (Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications), Kluwer Academic Publishers, Dordrecht. Zbl0842.90110
  39. Ruszczynski, A. (2006). Nonlinear Optimization, Princeton University Press, Princeton, NJ. Zbl1108.90001
  40. Sabo, K. and Scitovski, R. (2008). The best least absolute deviations line-properties and two efficient methods, ANZIAM Journal 50(2): 185-198. Zbl1182.65023
  41. Sabo, K., Scitovski, R. and Vazler, I. (2011). Searching for a best LAD-solution of an overdetermined system of linear equations motivated by searching for a best LAD-hyperplane on the basis of given data, Journal of Optimization Theory and Applications 149(2): 293-314. Zbl1219.90125
  42. Sabo, K., Scitovski, R. and Vazler, I. (2012). One-dimensional center-based l₁-clustering method, Optimization Letters 7(1): 5-22. Zbl1283.90034
  43. Sabo, K., Scitovski, R., Vazler, I. and Zekić-Sušac, M. (2011). Mathematical models of natural gas consumption, Energy Conversion and Management 52(3): 1721-1727. 
  44. Teboulle, M. (2007). A unified continuous optimization framework for center-based clustering methods, Journal of Machine Learning Research 8(1): 65-102. Zbl1222.68318
  45. Vardi, Y. and Zhang, C.-H. (2000). The multivariate L₁-median and associated data depth, Proceedings of the National Academy of Sciences of the United States of America 97(4): 1423-1426. Zbl1054.62067
  46. Vazler, I., Sabo, K. and Scitovski, R. (2012). Weighted median of the data in solving least absolute deviations problems, Communications in Statistics-Theory and Methods 41(8): 1455-1465. Zbl1319.62141
  47. Zhang, J., Peng, L., Zhao, X. and Kuruoglu, E.E. (2012). Robust data clustering by learning multi-metric l_q-norm distances, Expert Systems with Applications 39(1): 335-349.
