Nearest neighbor classification in infinite dimension

Frédéric Cérou; Arnaud Guyader

ESAIM: Probability and Statistics (2006)

  • Volume: 10, pages 340-355
  • ISSN: 1292-8100

Abstract

Let X be a random element in a metric space (F,d), and let Y be a random variable with values in {0,1}. Y is called the class, or the label, of X. Let (Xi,Yi), 1 ≤ i ≤ n, be an observed i.i.d. sample with the same law as (X,Y). The classification problem is to predict the label of a new random element X. The k-nearest neighbor classifier is the following simple rule: look at the k nearest neighbors of X in the trial sample and choose 0 or 1 for its label according to the majority vote. When (F,d) = (ℝ^d, ‖·‖), Stone (1977) proved the universal consistency of this classifier: its probability of error converges to the Bayes error, whatever the distribution of (X,Y). We show in this paper that this result is no longer valid in general metric spaces. However, if (F,d) is separable and if some regularity condition is assumed, then the k-nearest neighbor classifier is weakly consistent.
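
For readers who want to see the rule stated in the abstract as code, here is a minimal Python sketch of the k-nearest neighbor majority vote over an arbitrary metric d. The function name knn_classify, the sup-norm metric, and the toy data are illustrative choices of ours, not taken from the paper.

from collections import Counter

def knn_classify(x, sample, d, k):
    """Predict the label of x by majority vote among its k nearest
    neighbors in sample, a list of (x_i, y_i) pairs, under the metric d."""
    neighbors = sorted(sample, key=lambda pair: d(x, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]  # ties broken by first-seen label

# Toy usage: curves discretized on a grid, compared in sup-norm distance,
# as a finite stand-in for an infinite-dimensional function space.
d_sup = lambda f, g: max(abs(a - b) for a, b in zip(f, g))
sample = [((0.0, 0.1, 0.2), 0), ((1.0, 0.9, 1.1), 1), ((0.1, 0.0, 0.1), 0)]
print(knn_classify((0.05, 0.05, 0.15), sample, d_sup, k=3))  # prints 0

Note that only evaluations of d are used, so the same code runs unchanged in any metric space; the paper's point is that this generality is exactly where universal consistency can fail.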

How to cite


Cérou, Frédéric, and Guyader, Arnaud. "Nearest neighbor classification in infinite dimension." ESAIM: Probability and Statistics 10 (2006): 340-355. <http://eudml.org/doc/249765>.

@article{Cérou2006,
abstract = {Let X be a random element in a metric space (F,d), and let Y be a random variable with values in \{0,1\}. Y is called the class, or the label, of X. Let $(X_i,Y_i)_{1\le i\le n}$ be an observed i.i.d. sample with the same law as (X,Y). The classification problem is to predict the label of a new random element X. The k-nearest neighbor classifier is the following simple rule: look at the k nearest neighbors of X in the trial sample and choose 0 or 1 for its label according to the majority vote. When $({\cal F},d)=(\mathbb{R}^d,\Vert\cdot\Vert)$, Stone (1977) proved the universal consistency of this classifier: its probability of error converges to the Bayes error, whatever the distribution of (X,Y). We show in this paper that this result is no longer valid in general metric spaces. However, if (F,d) is separable and if some regularity condition is assumed, then the k-nearest neighbor classifier is weakly consistent.},
author = {Cérou, Frédéric and Guyader, Arnaud},
journal = {ESAIM: Probability and Statistics},
keywords = {classification; consistency; nonparametric statistics},
language = {eng},
month = {9},
pages = {340-355},
publisher = {EDP Sciences},
title = {Nearest neighbor classification in infinite dimension},
url = {http://eudml.org/doc/249765},
volume = {10},
year = {2006},
}

TY - JOUR
AU - Cérou, Frédéric
AU - Guyader, Arnaud
TI - Nearest neighbor classification in infinite dimension
JO - ESAIM: Probability and Statistics
DA - 2006/9//
PB - EDP Sciences
VL - 10
SP - 340
EP - 355
AB - Let X be a random element in a metric space (F,d), and let Y be a random variable with values in {0,1}. Y is called the class, or the label, of X. Let $(X_i,Y_i)_{1\le i\le n}$ be an observed i.i.d. sample with the same law as (X,Y). The classification problem is to predict the label of a new random element X. The k-nearest neighbor classifier is the following simple rule: look at the k nearest neighbors of X in the trial sample and choose 0 or 1 for its label according to the majority vote. When $({\cal F},d)=(\mathbb{R}^d,\Vert\cdot\Vert)$, Stone (1977) proved the universal consistency of this classifier: its probability of error converges to the Bayes error, whatever the distribution of (X,Y). We show in this paper that this result is no longer valid in general metric spaces. However, if (F,d) is separable and if some regularity condition is assumed, then the k-nearest neighbor classifier is weakly consistent.
LA - eng
KW - classification; consistency; nonparametric statistics
UR - http://eudml.org/doc/249765
ER -

References

  1. C. Abraham, G. Biau and B. Cadre, On the kernel rule for function classification. Submitted (2003).
  2. G. Biau, F. Bunea and M.H. Wegkamp, On the kernel rule for function classification. IEEE Trans. Inform. Theory, to appear (2005).
  3. T.M. Cover and P.E. Hart, Nearest neighbor pattern classification. IEEE Trans. Inform. Theory IT-13 (1967) 21–27.
  4. S. Dabo-Niang and N. Rhomari, Nonparametric regression estimation when the regressor takes its values in a metric space. Submitted (2001).
  5. L. Devroye, On the almost everywhere convergence of nonparametric regression function estimates. Ann. Statist. 9 (1981) 1310–1319.
  6. L. Devroye, L. Györfi, A. Krzyżak and G. Lugosi, On the strong universal consistency of nearest neighbor regression function estimates. Ann. Statist. 22 (1994) 1371–1385.
  7. L. Devroye, L. Györfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York) 31, Springer-Verlag, New York (1996).
  8. L.C. Evans and R.F. Gariepy, Measure Theory and Fine Properties of Functions. Studies in Advanced Mathematics, CRC Press, Boca Raton, FL (1992).
  9. H. Federer, Geometric Measure Theory. Die Grundlehren der mathematischen Wissenschaften, Band 153, Springer-Verlag New York Inc., New York (1969).
  10. D. Preiss, Gaussian measures and the density theorem. Comment. Math. Univ. Carolin. 22 (1981) 181–193.
  11. D. Preiss, Dimension of metrics and differentiation of measures, in General Topology and its Relations to Modern Analysis and Algebra, V (Prague, 1981), Sigma Ser. Pure Math. 3, Heldermann, Berlin (1983) 565–568.
  12. D. Preiss and J. Tišer, Differentiation of measures on Hilbert spaces, in Measure Theory, Oberwolfach 1981 (Oberwolfach, 1981), Lect. Notes Math. 945, Springer, Berlin (1982) 194–207.
  13. C.J. Stone, Consistent nonparametric regression. Ann. Statist. 5 (1977) 595–645. With discussion and a reply by the author.
