A graph-based estimator of the number of clusters

Gérard Biau, Benoît Cadre, Bruno Pelletier (2007)

ESAIM: Probability and Statistics

Assessing the number of clusters of a statistical population is one of the essential issues of unsupervised learning. Given independent observations drawn from an unknown multivariate probability density $f$, we propose a new approach to estimate the number of connected components, or clusters, of the $t$-level set $\mathcal{L}(t) = \{x : f(x) \geq t\}$. The basic idea is to form a rough skeleton of the set $\mathcal{L}(t)$ using any preliminary estimator of $f$, and to count the number of connected components of the resulting graph. Under mild analytic...
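The basic idea stated in the abstract can be illustrated with a minimal sketch. It is not the authors' exact construction: the abstract does not specify the estimator, the graph rule, or the tuning parameters, so the Gaussian kernel density estimate, the fixed connection radius, and the names `gaussian_kde`, `count_clusters`, `radius`, and `bandwidth` are all assumptions made here for illustration. The sketch keeps the observations whose estimated density is at least $t$, joins any two kept observations that lie within `radius` of each other, and counts the connected components of that graph with a union-find structure.

```python
import math


def gaussian_kde(points, bandwidth):
    """Return x -> Gaussian kernel density estimate built from `points`.

    Assumption: an isotropic Gaussian kernel stands in for the
    "preliminary estimator" of f, which the abstract leaves open.
    """
    n = len(points)
    d = len(points[0])
    norm = n * (bandwidth * math.sqrt(2.0 * math.pi)) ** d

    def f_hat(x):
        total = 0.0
        for p in points:
            sq = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            total += math.exp(-sq / (2.0 * bandwidth ** 2))
        return total / norm

    return f_hat


def count_clusters(points, t, radius, bandwidth):
    """Count connected components of a graph on the estimated t-level set.

    Hypothetical helper: vertices are the observations x with
    f_hat(x) >= t, and two vertices are joined when their Euclidean
    distance is at most `radius`.
    """
    f_hat = gaussian_kde(points, bandwidth)
    kept = [p for p in points if f_hat(p) >= t]

    # Union-find over the kept observations.
    parent = list(range(len(kept)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(kept)):
        for j in range(i + 1, len(kept)):
            if math.dist(kept[i], kept[j]) <= radius:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    return len({find(i) for i in range(len(kept))})


# Two dense groups of observations plus one isolated low-density point.
sample = (
    [(0.1 * i, 0.0) for i in range(10)]
    + [(10.0 + 0.1 * i, 0.0) for i in range(10)]
    + [(5.0, 0.0)]
)
print(count_clusters(sample, t=0.05, radius=0.3, bandwidth=0.5))
```

In this sketch the threshold `t` acts as the level of the level set: raising it discards low-density observations before the graph is built, so the isolated point in `sample` contributes a component only when `t` is small enough to keep it.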

