Theory of classification : a survey of some recent advances

Stéphane Boucheron; Olivier Bousquet; Gábor Lugosi

ESAIM: Probability and Statistics (2005)

  • Volume: 9, pages 323-375
  • ISSN: 1292-8100

Abstract

The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.

How to cite


Boucheron, Stéphane, Bousquet, Olivier, and Lugosi, Gábor. "Theory of classification : a survey of some recent advances." ESAIM: Probability and Statistics 9 (2005): 323-375. <http://eudml.org/doc/244764>.

@article{Boucheron2005,
abstract = {The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.},
author = {Boucheron, Stéphane and Bousquet, Olivier and Lugosi, Gábor},
journal = {ESAIM: Probability and Statistics},
keywords = {pattern recognition; statistical learning theory; concentration inequalities; empirical processes; model selection},
language = {eng},
pages = {323-375},
publisher = {EDP-Sciences},
title = {Theory of classification : a survey of some recent advances},
url = {http://eudml.org/doc/244764},
volume = {9},
year = {2005},
}

TY - JOUR
AU - Boucheron, Stéphane
AU - Bousquet, Olivier
AU - Lugosi, Gábor
TI - Theory of classification : a survey of some recent advances
JO - ESAIM: Probability and Statistics
PY - 2005
PB - EDP-Sciences
VL - 9
SP - 323
EP - 375
AB - The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.
LA - eng
KW - pattern recognition; statistical learning theory; concentration inequalities; empirical processes; model selection
UR - http://eudml.org/doc/244764
ER -
