A Global Approach to the Comparison of Clustering Results

Osvaldo Silva; Helena Bacelar-Nicolau; Fernando C. Nicolau

Biometrical Letters (2012)

  • Volume: 49, Issue: 2, pages 135-147
  • ISSN: 1896-3811

Abstract

The discovery of knowledge in the case of Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies developed in the initial stage of Cluster Analysis. We present a global approach for evaluating the quality of clustering results and making a comparison among different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method to facilitate evaluation of the quality of the partitions, allowing identification of the similarities and differences between partitions, as well as the behaviour of the elements in the partitions. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (similarity coefficient) to the case of complex data (symbolic data), combined with 26 (classic and probabilistic) clustering algorithms. Finally, we discuss the obtained results and the contribution of this approach to gaining better knowledge of the structure of data.
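The affinity coefficient named above is the similarity measure underlying the clustering. The sketch below shows only its basic form for two variables with nonnegative values. The paper uses a generalized coefficient for symbolic data, which is not reproduced here. In this form, each profile is scaled to sum to 1, and the coefficient is the sum of the square roots of the elementwise products. The function name `affinity` and the input checks are illustrative assumptions, not the authors' code.

```python
import math
from typing import Sequence


def affinity(x: Sequence[float], y: Sequence[float]) -> float:
    """Basic (non-generalized) affinity coefficient between two profiles.

    Illustrative sketch: x and y hold nonnegative values of two variables
    over the same n objects. Each profile is normalised to sum to 1, and the
    coefficient is sum_i sqrt(p_i * q_i). The result lies in [0, 1] and
    equals 1 exactly when the two profiles are proportional.
    """
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("values must be nonnegative")
    sx, sy = sum(x), sum(y)
    if sx == 0 or sy == 0:
        raise ValueError("each profile needs a positive total")
    # Sum of square roots of products of the normalised profiles.
    return sum(math.sqrt((a / sx) * (b / sy)) for a, b in zip(x, y))
```

For example, proportional profiles such as `[1, 2, 3]` and `[2, 4, 6]` give a value of 1 (up to floating-point rounding). Profiles with disjoint support, such as `[1, 0]` and `[0, 1]`, give 0. A matrix of such coefficients could then serve as input to an agglomerative clustering procedure.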

How to cite


Osvaldo Silva, Helena Bacelar-Nicolau, and Fernando C. Nicolau. "A Global Approach to the Comparison of Clustering Results." Biometrical Letters 49.2 (2012): 135-147. <http://eudml.org/doc/268732>.

@article{OsvaldoSilva2012,
abstract = {The discovery of knowledge in the case of Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies developed in the initial stage of Cluster Analysis. We present a global approach for evaluating the quality of clustering results and making a comparison among different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method to facilitate evaluation of the quality of the partitions, allowing identification of the similarities and differences between partitions, as well as the behaviour of the elements in the partitions. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (similarity coefficient) to the case of complex data (symbolic data), combined with 26 (classic and probabilistic) clustering algorithms. Finally, we discuss the obtained results and the contribution of this approach to gaining better knowledge of the structure of data.},
author = {Silva, Osvaldo and Bacelar-Nicolau, Helena and Nicolau, Fernando C.},
journal = {Biometrical Letters},
keywords = {Cluster Analysis; VL Methodology; Affinity Coefficient; Comparing Partitions; Cluster Stability and Cluster Validation},
language = {eng},
number = {2},
pages = {135-147},
title = {A Global Approach to the Comparison of Clustering Results},
url = {http://eudml.org/doc/268732},
volume = {49},
year = {2012},
}

TY  - JOUR
AU  - Osvaldo Silva
AU  - Helena Bacelar-Nicolau
AU  - Fernando C. Nicolau
TI  - A Global Approach to the Comparison of Clustering Results
JO  - Biometrical Letters
PY  - 2012
VL  - 49
IS  - 2
SP  - 135
EP  - 147
AB  - The discovery of knowledge in the case of Hierarchical Cluster Analysis (HCA) depends on many factors, such as the clustering algorithms applied and the strategies developed in the initial stage of Cluster Analysis. We present a global approach for evaluating the quality of clustering results and making a comparison among different clustering algorithms using the relevant information available (e.g. the stability, isolation and homogeneity of the clusters). In addition, we present a visual method to facilitate evaluation of the quality of the partitions, allowing identification of the similarities and differences between partitions, as well as the behaviour of the elements in the partitions. We illustrate our approach using a complex and heterogeneous dataset (real horse data) taken from the literature. We apply HCA based on the generalized affinity coefficient (similarity coefficient) to the case of complex data (symbolic data), combined with 26 (classic and probabilistic) clustering algorithms. Finally, we discuss the obtained results and the contribution of this approach to gaining better knowledge of the structure of data.
LA  - eng
KW  - Cluster Analysis; VL Methodology; Affinity Coefficient; Comparing Partitions; Cluster Stability and Cluster Validation
UR  - http://eudml.org/doc/268732
ER  - 

References

  1. Bacelar-Nicolau H. (1980): Contributions to the Study of Comparison Coefficients in Cluster Analysis. PhD thesis (in Portuguese), University of Lisbon.
  2. Bacelar-Nicolau H. (1988): Two Probabilistic Models for Classification of Variables in Frequency Tables. In: Classification and Related Methods of Data Analysis, H.-H. Bock (Ed.), North Holland: Elsevier Science Publishers B.V.: 181-186. Zbl0729.62546
  3. Bacelar-Nicolau H. (2000): The Affinity Coefficient. In: Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data, H.-H. Bock, E. Diday (Eds.), Springer: 160-165. Zbl0977.62066
  4. Bacelar-Nicolau H., Nicolau F.C., Sousa A., Bacelar-Nicolau L. (2009): Measuring Similarity of Complex and Heterogeneous Data in Clustering of Large Data Sets. Biocybernetics and Biomedical Engineering 29(2): 9-18. Zbl1286.62060
  5. Bacelar-Nicolau H., Nicolau F.C., Sousa A., Bacelar-Nicolau L. (2010): Clustering Complex Heterogeneous Data Using a Probabilistic Approach. Proceedings of Stochastic Modeling Techniques and Data Analysis International Conference (SMTDA2010), Chania, Crete, Greece, 8-11 June 2010; published in the CD Proceedings of SMTDA2010 (electronic publication). Zbl1286.62060
  6. Carvalho F., Souza R. (2009): Unsupervised Pattern Recognition Models for Mixed Feature-Type Symbolic Data. Pattern Recognition Letters 31(5): 430-443.
  7. Gordon A.D. (1999): Classification, 2nd ed. Chapman & Hall, London.
  8. Lerman I.C. (1981): Classification et Analyse Ordinale des Données. Dunod, Paris. Zbl0485.62051
  9. Nicolau F.C. (1983): Cluster Analysis and Distribution Function. Meth. Oper. Res. 45: 431-433.
  10. Nicolau F.C., Bacelar-Nicolau H. (1998): Some Trends in the Classification of Variables. In: Data Science, Classification, and Related Methods, C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H.-H. Bock, Y. Baba (Eds.), Springer-Verlag: 89-98. Zbl0894.62075
  11. Silva O., Bacelar-Nicolau H., Nicolau F.C. (2010): Global Approach for Evaluating the Quality of Clustering Results. In: Programme and Abstracts CFE 10 & ERCIM 10 (4th CSDA International Conference on Computational and Financial Econometrics and 3rd Conference of the ERCIM Working Group on Computing and Statistics): 40.
  12. Silva O. (2011): Contributions for Comparing and Evaluating Partitions in Hierarchical Cluster Analysis. PhD thesis (in Portuguese), University of the Azores.
