On the optimality of the empirical risk minimization procedure for the convex aggregation problem

Guillaume Lecué; Shahar Mendelson

On the optimality of the empirical risk minimization procedure for the convex aggregation problem

Guillaume Lecué; Shahar Mendelson

Annales de l'I.H.P. Probabilités et statistiques (2013)

Volume: 49, Issue: 1, page 288-306
ISSN: 0246-0203

Access Full Article

top

Access to full text

Abstract

top

We study the performance of empirical risk minimization (ERM), with respect to the quadratic risk, in the context of convex aggregation, in which one wants to construct a procedure whose risk is as close as possible to the best function in the convex hull of an arbitrary finite class

F

. We show that ERM performed in the convex hull of

F

is an optimal aggregation procedure for the convex aggregation problem. We also show that if this procedure is used for the problem of model selection aggregation, in which one wants to mimic the performance of the best function in

F

itself, then its rate is the same as the one achieved for the convex aggregation problem, and thus is far from optimal. These results are obtained in deviation and are sharp up to logarithmic factors.

How to cite

top

MLA
BibTeX
RIS

Lecué, Guillaume, and Mendelson, Shahar. "On the optimality of the empirical risk minimization procedure for the convex aggregation problem." Annales de l'I.H.P. Probabilités et statistiques 49.1 (2013): 288-306. <http://eudml.org/doc/272056>.

@article{Lecué2013,
abstract = {We study the performance of empirical risk minimization (ERM), with respect to the quadratic risk, in the context of convex aggregation, in which one wants to construct a procedure whose risk is as close as possible to the best function in the convex hull of an arbitrary finite class $F$. We show that ERM performed in the convex hull of $F$ is an optimal aggregation procedure for the convex aggregation problem. We also show that if this procedure is used for the problem of model selection aggregation, in which one wants to mimic the performance of the best function in $F$ itself, then its rate is the same as the one achieved for the convex aggregation problem, and thus is far from optimal. These results are obtained in deviation and are sharp up to logarithmic factors.},
author = {Lecué, Guillaume, Mendelson, Shahar},
journal = {Annales de l'I.H.P. Probabilités et statistiques},
keywords = {learning theory; aggregation theory; empirical process theory},
language = {eng},
number = {1},
pages = {288-306},
publisher = {Gauthier-Villars},
title = {On the optimality of the empirical risk minimization procedure for the convex aggregation problem},
url = {http://eudml.org/doc/272056},
volume = {49},
year = {2013},
}

TY - JOUR
AU - Lecué, Guillaume
AU - Mendelson, Shahar
TI - On the optimality of the empirical risk minimization procedure for the convex aggregation problem
JO - Annales de l'I.H.P. Probabilités et statistiques
PY - 2013
PB - Gauthier-Villars
VL - 49
IS - 1
SP - 288
EP - 306
AB - We study the performance of empirical risk minimization (ERM), with respect to the quadratic risk, in the context of convex aggregation, in which one wants to construct a procedure whose risk is as close as possible to the best function in the convex hull of an arbitrary finite class $F$. We show that ERM performed in the convex hull of $F$ is an optimal aggregation procedure for the convex aggregation problem. We also show that if this procedure is used for the problem of model selection aggregation, in which one wants to mimic the performance of the best function in $F$ itself, then its rate is the same as the one achieved for the convex aggregation problem, and thus is far from optimal. These results are obtained in deviation and are sharp up to logarithmic factors.
LA - eng
KW - learning theory; aggregation theory; empirical process theory
UR - http://eudml.org/doc/272056
ER -

References

top

[1] J.-Y. Audibert. Aggregated estimators and empirical complexity for least square regression. Ann. Inst. Henri Poincaré Probab. Stat.40 (2004) 685–736. Zbl1052.62037 MR2096215
[2] J.-Y. Audibert. Progressive mixture rules are deviation suboptimal. In Advances in Neural Information Processing Systems (NIPS), 2007.
[3] J.-Y. Audibert. Fast learning rates in statistical inference through aggregation. Ann. Statist.37 (2009) 1591–1646. Zbl05582005 MR2533466
[4] P. L. Bartlett and S. Mendelson. Empirical minimization. Probab. Theory Related Fields135 (2006) 311–334. Zbl1142.62348 MR2240689
[5] P. L. Bartlett, S. Mendelson and J. Neeman. $ℓ_{1}$ -regularized linear regression: Persistence and oracle inequalities. Probab. Theory Related Fields154 (2012) 193–224. Zbl06125014 MR2981422
[6] L. Birgé. Model selection via testing: An alternative to (penalized) maximum likelihood estimators. Ann. Inst. Henri Poincaré Probab. Stat.42 (2006) 273–325. Zbl1333.62094 MR2219712
[7] O. Bousquet, V. Koltchinskii and D. Panchenko. Some local measures of complexity of convex hulls and generalization bounds. In Computational Learning Theory (Sydney, 2002) 59–73. Lecture Notes in Comput. Sci. 2375. Springer, Berlin, 2002. Zbl1050.68055 MR2040405
[8] F. Bunea and A. Nobel. Sequential procedures for aggregating arbitrary estimators of a conditional mean. IEEE Trans. Inform. Theory54 (2008) 1725–1735. Zbl1329.62359 MR2450298
[9] F. Bunea, A. B. Tsybakov and M. H. Wegkamp. Aggregation for Gaussian regression. Ann. Statist.35 (2007) 1674–1697. Zbl1209.62065 MR2351101
[10] A. S. Dalalyan and A. B. Tsybakov. Aggregation by exponential weighting and sharp oracle inequalities. In Learning Theory 97–111. Lecture Notes in Comput. Sci. 4539. Springer, Berlin, 2007. Zbl1203.62063 MR2397581
[11] M. Emery, A. Nemirovski and D. Voiculescu. Lectures on Probability Theory and Statistics. Lecture Notes in Math. 1738. Springer, Berlin, 2000. Zbl0941.00026 MR1775638
[12] A. Juditsky and A. Nemirovski. Functional aggregation for nonparametric regression. Ann. Statist.28 (2000) 681–712. Zbl1105.62338 MR1792783
[13] B. Klartag. $5 n$ Minkowski symmetrizations suffice to arrive at an approximate Euclidean ball. Ann. of Math. (2) 156 (2002) 947–960. Zbl1028.52002 MR1954241
[14] V. Koltchinskii. Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: Ecole d’Été de Probabilités de Saint-Flour XXXVIII-2008. Lecture Notes in Math. Springer, Berlin, 2011. Zbl1223.91002 MR2829871
[15] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist.34 (2006) 2593–2656. Zbl1118.62065 MR2329442
[16] G. Lecué and S. Mendelson. Aggregation via empirical risk minimization. Probab. Theory Related Fields145 (2009) 591–613. Zbl1206.62094 MR2529440
[17] G. Lecué and S. Mendelson. General non-exact oracle inequalities in the unbounded case. Preprint. Zbl1274.62247
[18] M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 23. Springer, Berlin, 1991. Zbl0748.60004 MR1102015
[19] S. Mendelson. Obtaining fast error rates in nonconvex situations. J. Complexity24 (2008) 380–397. Zbl05303163 MR2426759
[20] S. Mendelson. Empirical processes with a bounded $ψ_{1}$ diameter. Geom. Funct. Anal.20 (2010) 988–1027. Zbl1204.60042 MR2729283
[21] S. Mendelson, A. Pajor and N. Tomczak-Jaegermann. Reconstruction and sub-Gaussian operators in asymptotic geometric analysis. Geom. Funct. Anal.17 (2007) 1248–1282. Zbl1163.46008 MR2373017
[22] V. V. Petrov. Limit Theorems of Probability Theory. Oxford Studies in Probability 4. Oxford Univ. Press, New York, 1995. Zbl0826.60001 MR1353441
[23] G. Pisier. The Volume of Convex Bodies and Banach Space Geometry. Cambridge Tracts in Math. 94. Cambridge Univ. Press, Cambridge, 1989. Zbl0698.46008 MR1036275
[24] A. B. Tsybakov. Optimal rate of aggregation. In Computational Learning Theory and Kernel Machines (COLT-2003) 303–313. Lecture Notes in Artificial Intelligence 2777. Springer, Heidelberg, 2003. Zbl1208.62073
[25] A. B. Tsybakov. Optimal aggregation of classifiers in statistical learning. Ann. Statist.32 (2004) 135–166. Zbl1105.62353 MR2051002
[26] A. W. van der Vaart and Jon A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Ser. Statist. Springer, New York, 1996. Zbl0862.60002 MR1385671
[27] Y. Yang. Mixing strategies for density estimation. Ann. Statist.28 (2000) 75–87. Zbl1106.62322 MR1762904
[28] Y. Yang. Aggregating regression procedures to improve performance. Bernoulli10 (2004) 25–47. Zbl1040.62030 MR2044592

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Language to use for this widget.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Number of notes per page

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.