How the result of graph clustering methods depends on the construction of the graph

Markus Maier; Ulrike von Luxburg; Matthias Hein

How the result of graph clustering methods depends on the construction of the graph

Markus Maier; Ulrike von Luxburg; Matthias Hein

ESAIM: Probability and Statistics (2013)

Volume: 17, page 370-418
ISSN: 1292-8100

Access Full Article

top

Access to full text

Abstract

top

We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications on spectral clustering.

How to cite

top

MLA
BibTeX
RIS

Maier, Markus, von Luxburg, Ulrike, and Hein, Matthias. "How the result of graph clustering methods depends on the construction of the graph." ESAIM: Probability and Statistics 17 (2013): 370-418. <http://eudml.org/doc/273621>.

@article{Maier2013,
abstract = {We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications on spectral clustering.},
author = {Maier, Markus, von Luxburg, Ulrike, Hein, Matthias},
journal = {ESAIM: Probability and Statistics},
keywords = {random geometric graph; clustering; graph cuts; random geometric graphs},
language = {eng},
pages = {370-418},
publisher = {EDP-Sciences},
title = {How the result of graph clustering methods depends on the construction of the graph},
url = {http://eudml.org/doc/273621},
volume = {17},
year = {2013},
}

TY - JOUR
AU - Maier, Markus
AU - von Luxburg, Ulrike
AU - Hein, Matthias
TI - How the result of graph clustering methods depends on the construction of the graph
JO - ESAIM: Probability and Statistics
PY - 2013
PB - EDP-Sciences
VL - 17
SP - 370
EP - 418
AB - We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications on spectral clustering.
LA - eng
KW - random geometric graph; clustering; graph cuts; random geometric graphs
UR - http://eudml.org/doc/273621
ER -

References

top

[1] D. Angluin and L. Valiant, Fast probabilistic algorithms for Hamiltonian circuits. J. Comput. Syst. Sci.18 (1979) 155–193. Zbl0437.05040 MR532174
[2] G. Biau, B. Cadre and B. Pelletier, A graph-based estimator of the number of clusters. ESAIM: PS 11 (2007) 272–280. Zbl1187.62114 MR2320821
[3] M. Brito, E. Chavez, A. Quiroz and J. Yukich, Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Stat. Probab. Lett.35 (1997) 33–42. Zbl0898.62077 MR1467706
[4] S. Bubeck and U. von Luxburg, Nearest neighbor clustering: a baseline method for consistent clustering with arbitrary objective functions. J. Mach. Learn. Res.10 (2009) 657–698. Zbl1235.68134 MR2491753
[5] J.W. Harris and H. Stocker, Handbook of Mathematics and Computational Science. Springer (1998). Zbl0962.00507 MR1621531
[6] W. Hoeffding, Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc.58 (1963) 13–30. Zbl0127.10602 MR144363
[7] D.O. Loftsgaarden and C.P. Quesenberry, A nonparametric estimate of a multivariate density function. Ann. Math. Stat.36 (1965) 1049–1051. Zbl0132.38905 MR176567
[8] M. Maier, M. Hein and U. von Luxburg, Optimal construction of k-nearest neighbor graphs for identifying noisy clusters. Theoret. Comput. Sci.410 (2009) 1749–1764. Zbl1167.68045 MR2514706
[9] M. Maier, U. von Luxburg and M. Hein, Influence of graph construction on graph-based clustering measures, in Advances in Neural Information Processing Systems, vol. 21, edited by D. Koller, D. Schuurmans, Y. Bengio and L. Bottou. MIT Press (2009) 1025–1032.
[10] G. Miller, S. Teng, W. Thurston and S. Vavasis, Separators for sphere-packings and nearest neighbor graphs. J. ACM44 (1997) 1–29. Zbl0883.68100 MR1438463
[11] H. Narayanan, M. Belkin and P. Niyogi, On the relation between low density separation, spectral clustering and graph cuts, in Advances in Neural Information Processing Systems, vol. 19, edited by B. Schölkopf, J. Platt and T. Hoffman. MIT Press (2007) 1025–1032.
[12] A. Srivastav and P. Stangier, Algorithmic Chernoff-Hoeffding inequalities in integer programming. Random Struct. Algorithms8 (1996) 27–58. Zbl0845.68061 MR1368849
[13] U. von Luxburg, A tutorial on spectral clustering. Stat. Comput.17 (2007) 395–416. MR2409803
[14] U. von Luxburg, M. Belkin and O. Bousquet, Consistency of spectral clustering. Ann. Stat.36 (2008) 555–586. Zbl1133.62045 MR2396807

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Language to use for this widget.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Number of notes per page

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.