A complete gradient clustering algorithm formed with kernel estimators

Piotr Kulczycki; Małgorzata Charytanowicz

International Journal of Applied Mathematics and Computer Science (2010)

  • Volume: 20, Issue: 1, pages 123-134
  • ISSN: 1641-876X

Abstract

The aim of this paper is to provide a gradient clustering algorithm in its complete form, suitable for direct use without requiring deeper statistical knowledge. The values of all parameters are effectively calculated using optimizing procedures. Moreover, an illustrative analysis of the meaning of particular parameters is given, followed by the effects resulting from possible modifications with respect to their initially assigned optimal values. The proposed algorithm does not demand strict assumptions regarding the desired number of clusters, which allows the obtained number to be better suited to the real data structure. A further feature specific to it is the possibility of influencing the proportion between the number of clusters in areas where data elements are dense as opposed to sparse regions. Finally, by detecting one-element clusters, the algorithm allows atypical elements to be identified and either eliminated or assigned to larger clusters, thus increasing the homogeneity of the data set.
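The gradient clustering idea described above (shifting each data element uphill along the gradient of a kernel density estimate until it settles at a local mode, then grouping elements that reach the same mode) can be sketched with a generic mean-shift-style procedure. This is only an illustrative sketch, not the paper's Complete Gradient Clustering Algorithm: the bandwidth, iteration count, and merge tolerance below are arbitrary assumptions, whereas the paper computes all such parameters by optimizing procedures.

```python
import numpy as np

def gradient_cluster(data, bandwidth=0.5, steps=50, merge_tol=1e-2):
    """Shift each point uphill on a Gaussian kernel density estimate,
    then group points that converge to the same density mode."""
    points = data.astype(float).copy()
    for _ in range(steps):
        for i, x in enumerate(points):
            # Gaussian kernel weights of the fixed data relative to the moving point.
            w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * bandwidth ** 2))
            # The weighted mean is one gradient-ascent (mean-shift) step.
            points[i] = w @ data / w.sum()
    # Merge converged positions closer than merge_tol into clusters;
    # a cluster that captures a single element flags an atypical element.
    labels, modes = [], []
    for x in points:
        for k, m in enumerate(modes):
            if np.linalg.norm(x - m) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(x)
            labels.append(len(modes) - 1)
    return np.array(labels)

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0]])
labels = gradient_cluster(X, bandwidth=0.5)
```

Here the number of clusters emerges from the data rather than being fixed in advance: the three points near the origin converge to one mode and the two points near (5, 5) to another, so two clusters are found without specifying that count beforehand.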

How to cite


Kulczycki, Piotr, and Małgorzata Charytanowicz. "A complete gradient clustering algorithm formed with kernel estimators." International Journal of Applied Mathematics and Computer Science 20.1 (2010): 123-134. <http://eudml.org/doc/207968>.

@article{PiotrKulczycki2010,
abstract = {The aim of this paper is to provide a gradient clustering algorithm in its complete form, suitable for direct use without requiring deeper statistical knowledge. The values of all parameters are effectively calculated using optimizing procedures. Moreover, an illustrative analysis of the meaning of particular parameters is given, followed by the effects resulting from possible modifications with respect to their initially assigned optimal values. The proposed algorithm does not demand strict assumptions regarding the desired number of clusters, which allows the obtained number to be better suited to the real data structure. A further feature specific to it is the possibility of influencing the proportion between the number of clusters in areas where data elements are dense as opposed to sparse regions. Finally, by detecting one-element clusters, the algorithm allows atypical elements to be identified and either eliminated or assigned to larger clusters, thus increasing the homogeneity of the data set.},
author = {Piotr Kulczycki and Małgorzata Charytanowicz},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {data analysis and mining; clustering; gradient procedures; nonparametric statistical methods; kernel estimators; numerical calculations},
language = {eng},
number = {1},
pages = {123-134},
title = {A complete gradient clustering algorithm formed with kernel estimators},
url = {http://eudml.org/doc/207968},
volume = {20},
year = {2010},
}

TY - JOUR
AU - Piotr Kulczycki
AU - Małgorzata Charytanowicz
TI - A complete gradient clustering algorithm formed with kernel estimators
JO - International Journal of Applied Mathematics and Computer Science
PY - 2010
VL - 20
IS - 1
SP - 123
EP - 134
AB - The aim of this paper is to provide a gradient clustering algorithm in its complete form, suitable for direct use without requiring deeper statistical knowledge. The values of all parameters are effectively calculated using optimizing procedures. Moreover, an illustrative analysis of the meaning of particular parameters is given, followed by the effects resulting from possible modifications with respect to their initially assigned optimal values. The proposed algorithm does not demand strict assumptions regarding the desired number of clusters, which allows the obtained number to be better suited to the real data structure. A further feature specific to it is the possibility of influencing the proportion between the number of clusters in areas where data elements are dense as opposed to sparse regions. Finally, by detecting one-element clusters, the algorithm allows atypical elements to be identified and either eliminated or assigned to larger clusters, thus increasing the homogeneity of the data set.
LA - eng
KW - data analysis and mining; clustering; gradient procedures; nonparametric statistical methods; kernel estimators; numerical calculations
UR - http://eudml.org/doc/207968
ER -

References

  1. Anderberg, M. R. (1973). Cluster Analysis for Applications, Academic Press, New York, NY. Zbl0299.62029
  2. Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data, Wiley, Chichester. Zbl0801.62001
  3. Carreira-Perpinan, M. A. (2006). Fast nonparametric clustering with Gaussian blurring mean-shift, Proceedings of the International Conference on Machine Learning, Pittsburgh, PA, USA, pp. 153-160. 
  4. Cheng, Y. (1995). Mean shift, mode seeking, and clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence 17(8): 790-799. 
  5. Comaniciu, D. and Meer, P. (2002). Mean shift: A robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(5): 603-619. 
  6. Daniel, K. (2009). Marketing strategy support method for a cell phone operator, Ph.D. thesis, Systems Research Institute, Polish Academy of Sciences, Warsaw, (in Polish). 
  7. Everitt, B. S., Landau, S. and Leese, M. (2001). Cluster Analysis, Arnold, London. Zbl1205.62076
  8. Fukunaga, K. and Hostetler, L. D. (1975). The estimation of the gradient of a density function, with applications in pattern recognition, IEEE Transactions on Information Theory 21(1): 32-40. Zbl0297.62025
  9. Girolami, M. and He, C. (2003). Probability density estimation from optimally condensed data samples, IEEE Transactions on Pattern Analysis and Machine Intelligence 25(10): 1253-1264. 
  10. Jain, A. K. and Dubes, R. C. (1988). Algorithms for Clustering Data, Prentice Hall, Englewood Cliffs, NJ. Zbl0665.62061
  11. Kincaid, D. and Cheney, W. (2002). Numerical Analysis, Brooks/Cole, Pacific Grove, CA. Zbl0877.65002
  12. Kulczycki, P. (2005). Kernel Estimators in Systems Analysis, WNT, Warsaw, (in Polish). 
  13. Kulczycki, P. (2007). Kernel estimators in systems research, in P. Kulczycki, O. Hryniewicz and J. Kacprzyk (Eds), Information Technologies in Systems Research, WNT, Warsaw, pp. 79-105, (in Polish). 
  14. Kulczycki, P. (2008). Kernel estimators in industrial applications, in B. Prasad (Ed.), Soft Computing Applications in Industry, Springer-Verlag, Berlin, pp. 69-91. 
  15. Kulczycki, P. and Charytanowicz, M. (2008). A complete gradient clustering algorithm, in K. Malinowski and L. Rutkowski (Eds), Control and Automation: Current Problems and Their Solutions, EXIT, Warsaw, pp. 312-321, (in Polish). Zbl1300.62043
  16. Kulczycki, P. and Daniel, K. (2009). A method for supporting the marketing strategy of a mobile phone network provider, Przegląd Statystyczny 56(2): 116-134, (in Polish). 
  17. Kulczycki, P. and Łukasik, S. (2009). Reduction of sample dimension and size for synthesis of a statistical fault detection system, in Z. Kowalczuk (Ed.), Systems Detecting, Analysing and Tolerating Faults, PWNT, Gdańsk, pp. 139-146, (in Polish). 
  18. Larose, D. T. (2006). Data Mining Methods and Models, Wiley, New York, NY. Zbl1096.68031
  19. Lubischew, A. A. (1962). On the use of discriminant functions in taxonomy, Biometrics 18(4): 455-478. Zbl0112.11602
  20. Müller, H.-G. (1984). Smooth optimum kernel estimators of densities, regression curves and models, The Annals of Statistics 12(2): 766-774. Zbl0543.62031
  21. Pal, S. K. and Mitra, P. (2004). Pattern Recognition Algorithms for Data Mining, Chapman and Hall, London. Zbl1099.68091
  22. Rodriguez, R. and Suarez, A. G. (2006). A new algorithm for image segmentation by using iteratively the mean shift filtering, Scientific Research and Essay 1(2): 43-48. 
  23. Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis, Chapman and Hall, London. Zbl0617.62042
  24. Wand, M. P. and Jones, M. C. (1994). Kernel Smoothing, Chapman and Hall, London. Zbl0854.62043
  25. Wang, W. J., Tan, Y. X., Jiang, J. H., Lu, J. Z., Shen, G. L. and Yu, R. Q. (2004). Clustering based on kernel density estimation: Nearest local maximum searching algorithm, Chemometrics and Intelligent Laboratory Systems 72(1): 1-8. 
  26. Yang, C., Duraiswami, R., DeMenthon, D. and Davis, L. (2003). Mean-shift analysis using quasi-Newton methods, Proceedings of the IEEE International Conference on Image Processing, Barcelona, Spain, pp. 447-450. 
  27. Zhang, K., Tang, M. and Kwok, J. T. (2005). Applying neighborhood consistency for fast clustering and kernel density estimation, Proceedings of the IEEE International Conference on Vision and Pattern Recognition, San Diego, CA, USA, pp. 1001-1007. 

Citations in EuDML Documents

  1. Witold Andrzejewski, Artur Gramacki, Jarosław Gramacki, Graphics processing units in acceleration of bandwidth selection for kernel density estimation
  2. Adam Nowicki, Michał Grochowski, Kazimierz Duzinkiewicz, Data-driven models for fault detection using kernel PCA: A water distribution system case study
  3. Anna Fabijańska, Tomasz Węgliński, Krzysztof Zakrzewski, Emilia Nowosławska, Assessment of hydrocephalus in children based on digital image processing and analysis
  4. Piotr Kulczycki, Szymon Łukasik, An algorithm for reducing the dimension and size of a sample for data exploration procedures
