On the Optimality of Sample-Based Estimates of the Expectation of the Empirical Minimizer

Peter L. Bartlett; Shahar Mendelson; Petra Philips

ESAIM: Probability and Statistics (2010)

  • Volume: 14, pages 315-337
  • ISSN: 1292-8100

Abstract

We study sample-based estimates of the expectation of the function produced by the empirical minimization algorithm. We investigate the extent to which one can estimate the rate of convergence of the empirical minimizer in a data-dependent manner. We establish three main results. First, we provide an algorithm that upper bounds the expectation of the empirical minimizer in a completely data-dependent manner. This bound is based on a structural result due to Bartlett and Mendelson, which relates expectations to sample averages. Second, we show that these structural upper bounds can be loose, compared to previous bounds. In particular, we demonstrate a class for which the expectation of the empirical minimizer decreases as O(1/n) for sample size n, although the upper bound based on structural properties is Ω(1). Third, we show that this looseness of the bound is inevitable: we present an example that shows that a sharp bound cannot be universally recovered from empirical data.
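
To fix ideas, the empirical minimizer is the function in a given class whose sample average of the loss is smallest, and the quantity being bounded is that function's true expectation. The Python sketch below illustrates only this setup, not the paper's data-dependent bounding algorithm: it selects the empirical minimizer from a small hypothetical class of quadratic losses f_t(x) = (x - t)^2 over a uniform sample and compares its sample average with its true expectation. The threshold grid, the loss, and the uniform distribution are illustrative assumptions, not taken from the paper.

import numpy as np

# Illustrative sketch only: empirical minimization over a small hypothetical
# class of quadratic losses f_t(x) = (x - t)^2, with t on a grid; this is not
# the paper's data-dependent bounding algorithm.

rng = np.random.default_rng(0)

thresholds = np.linspace(0.0, 1.0, 21)   # hypothetical finite class {f_t}
n = 200                                  # sample size
sample = rng.uniform(0.0, 1.0, size=n)   # i.i.d. sample, X ~ Uniform[0, 1]

def f(t, x):
    # Loss of the function indexed by t, evaluated at the point x.
    return (x - t) ** 2

# Empirical minimization: pick the index with the smallest sample average.
emp_means = np.array([f(t, sample).mean() for t in thresholds])
t_hat = thresholds[emp_means.argmin()]

# True expectation of f_{t_hat} under Uniform[0, 1]: E(X - t)^2 = 1/3 - t + t^2.
true_mean = 1.0 / 3.0 - t_hat + t_hat ** 2

print(f"empirical minimizer: t_hat = {t_hat:.2f}")
print(f"sample average at t_hat:     {emp_means.min():.4f}")
print(f"true expectation at t_hat:   {true_mean:.4f}")

The paper's first result concerns estimating the last printed quantity, the expectation of the empirical minimizer, from the sample alone; the second and third results show when such data-dependent estimates are necessarily loose.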

How to cite


Bartlett, Peter L., Mendelson, Shahar, and Philips, Petra. "On the Optimality of Sample-Based Estimates of the Expectation of the Empirical Minimizer." ESAIM: Probability and Statistics 14 (2010): 315-337. <http://eudml.org/doc/250853>.

@article{Bartlett2010,
abstract = { We study sample-based estimates of the expectation of the function produced by the empirical minimization algorithm. We investigate the extent to which one can estimate the rate of convergence of the empirical minimizer in a data dependent manner. We establish three main results. First, we provide an algorithm that upper bounds the expectation of the empirical minimizer in a completely data-dependent manner. This bound is based on a structural result due to Bartlett and Mendelson, which relates expectations to sample averages. Second, we show that these structural upper bounds can be loose, compared to previous bounds. In particular, we demonstrate a class for which the expectation of the empirical minimizer decreases as O(1/n) for sample size n, although the upper bound based on structural properties is Ω(1). Third, we show that this looseness of the bound is inevitable: we present an example that shows that a sharp bound cannot be universally recovered from empirical data. },
author = {Bartlett, Peter L. and Mendelson, Shahar and Philips, Petra},
journal = {ESAIM: Probability and Statistics},
keywords = {error bounds; empirical minimization; data-dependent complexity},
language = {eng},
month = {10},
pages = {315-337},
publisher = {EDP Sciences},
title = {On the Optimality of Sample-Based Estimates of the Expectation of the Empirical Minimizer},
url = {http://eudml.org/doc/250853},
volume = {14},
year = {2010},
}

TY - JOUR
AU - Bartlett, Peter L.
AU - Mendelson, Shahar
AU - Philips, Petra
TI - On the Optimality of Sample-Based Estimates of the Expectation of the Empirical Minimizer
JO - ESAIM: Probability and Statistics
DA - 2010/10//
PB - EDP Sciences
VL - 14
SP - 315
EP - 337
AB - We study sample-based estimates of the expectation of the function produced by the empirical minimization algorithm. We investigate the extent to which one can estimate the rate of convergence of the empirical minimizer in a data dependent manner. We establish three main results. First, we provide an algorithm that upper bounds the expectation of the empirical minimizer in a completely data-dependent manner. This bound is based on a structural result due to Bartlett and Mendelson, which relates expectations to sample averages. Second, we show that these structural upper bounds can be loose, compared to previous bounds. In particular, we demonstrate a class for which the expectation of the empirical minimizer decreases as O(1/n) for sample size n, although the upper bound based on structural properties is Ω(1). Third, we show that this looseness of the bound is inevitable: we present an example that shows that a sharp bound cannot be universally recovered from empirical data.
LA - eng
KW - error bounds
KW - empirical minimization
KW - data-dependent complexity
UR - http://eudml.org/doc/250853
ER -

References

  1. P.L. Bartlett and S. Mendelson, Empirical minimization. Probab. Theory Relat. Fields 135 (2006) 311–334. Zbl1142.62348
  2. P.L. Bartlett and M.H. Wegkamp, Classification with a reject option using a hinge loss. J. Machine Learn. Res. 9 (2008) 1823–1840. Zbl1225.62080
  3. P.L. Bartlett, O. Bousquet and S. Mendelson, Local Rademacher complexities. Ann. Statist. 33 (2005) 1497–1537. Zbl1083.62034
  4. P.L. Bartlett, M.I. Jordan and J.D. McAuliffe, Convexity, classification, and risk bounds. J. Am. Statist. Assoc. 101 (2006) 138–156. Zbl1118.62330
  5. G. Blanchard, G. Lugosi and N. Vayatis, On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res. 4 (2003) 861–894. Zbl1083.68109
  6. S. Boucheron, G. Lugosi and P. Massart, Concentration inequalities using the entropy method. Ann. Probab. 31 (2003) 1583–1614. Zbl1051.60020
  7. O. Bousquet, Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms. Ph.D. thesis, École Polytechnique (2002).
  8. R.M. Dudley, Uniform Central Limit Theorems. Cambridge University Press (1999). Zbl0951.60033
  9. D. Haussler, Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik-Chervonenkis dimension. J. Combin. Theory Ser. A 69 (1995) 217–232. Zbl0818.60005
  10. T. Klein, Une inégalité de concentration gauche pour les processus empiriques. C. R. Math. Acad. Sci. Paris 334 (2002) 501–504.
  11. V. Koltchinskii, Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34 (2006). Zbl1118.62065
  12. V. Koltchinskii and D. Panchenko, Rademacher processes and bounding the risk of function learning. High Dimensional Probability, Vol. II (2000) 443–459. Zbl1106.68385
  13. M. Ledoux, The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs 89. American Mathematical Society (2001).
  14. W.S. Lee, P.L. Bartlett and R.C. Williamson, The importance of convexity in learning with squared loss. IEEE Trans. Inform. Theory 44 (1998) 1974–1980. Zbl0935.68091
  15. G. Lugosi and N. Vayatis, On the Bayes-risk consistency of regularized boosting methods (with discussion). Ann. Statist. 32 (2004) 30–55. Zbl1105.62319
  16. G. Lugosi and M. Wegkamp, Complexity regularization via localized random penalties. Ann. Statist. 32 (2004) 1679–1697. Zbl1045.62060
  17. P. Massart, The constants in Talagrand's concentration inequality for empirical processes. Ann. Probab. 28 (2000) 863–884. Zbl1140.60310
  18. P. Massart, Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse Math. IX (2000) 245–303. Zbl0986.62002
  19. P. Massart and E. Nédélec, Risk bounds for statistical learning. Ann. Statist. 34 (2006) 2326–2366. Zbl1108.62007
  20. S. Mendelson, Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48 (2002) 1977–1991. Zbl1061.68128
  21. S. Mendelson, A few notes on statistical learning theory. In Proc. of the Machine Learning Summer School, Canberra 2002, S. Mendelson and A.J. Smola (Eds.), LNCS 2600. Springer (2003). Zbl1019.68093
  22. E. Rio, Inégalités de concentration pour les processus empiriques de classes de parties. Probab. Theory Relat. Fields 119 (2001) 163–175.
  23. M. Rudelson and R. Vershynin, Combinatorics of random processes and sections of convex bodies. Ann. Math. 164 (2006) 603–648. Zbl1114.60009
  24. M. Talagrand, Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22 (1994) 20–76. Zbl0798.60051
  25. M. Talagrand, New concentration inequalities in product spaces. Inventiones Mathematicae 126 (1996) 505–563. Zbl0893.60001
  26. B. Tarigan and S.A. Van de Geer, Adaptivity of support vector machines with ℓ1 penalty. Technical Report MI 2004-14, University of Leiden (2004). Zbl1118.62067
  27. A. Tsybakov, Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 (2004) 135–166. Zbl1105.62353
  28. S.A. Van de Geer, A new approach to least squares estimation, with applications. Ann. Statist. 15 (1987) 587–602. Zbl0625.62046
  29. S.A. Van de Geer, Empirical Processes in M-Estimation. Cambridge University Press (2000). Zbl1179.62073
  30. A. van der Vaart and J. Wellner, Weak Convergence and Empirical Processes. Springer (1996). Zbl0862.60002
  31. V.N. Vapnik and A.Y. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 (1971) 264–280. Zbl0247.60005
