Agrégation d’estimateurs et optimisation stochastique

Alexandre B. Tsybakov

Journal de la société française de statistique (2008)

  • Volume: 149, Issue: 1, pages 3-26
  • ISSN: 1962-5197

Abstract

This paper is a written version of the Conférence Lucien Le Cam delivered at the XXXVIIèmes Journées de Statistique in Pau, 2005. It presents an overview of some recent results on methods of aggregation of estimators. Given a collection of M estimators, aggregation procedures consist in constructing a convex or linear combination of these estimators with optimally chosen random weights. We mainly focus on the link between aggregation and stochastic optimization, which leads us to the construction of new, highly efficient recursive aggregation procedures.
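As a purely illustrative sketch (not the recursive mirror-descent procedure constructed in the paper), the following Python code shows the basic convex-aggregation idea described above: three regression estimators are combined with exponential weights computed from their empirical squared-error risks on a held-out sample. The function name, the temperature parameter beta, the sample split and the toy estimators are all assumptions made for this example.

import numpy as np

def exponential_weight_aggregate(preds, y, beta=1.0):
    # preds: array of shape (M, n), predictions of M estimators on a validation sample
    # y:     array of shape (n,), observed responses on that sample
    # beta:  illustrative temperature; larger values concentrate the weights
    risks = np.mean((preds - y) ** 2, axis=1)            # empirical risk of each estimator
    w = np.exp(-beta * len(y) * (risks - risks.min()))   # shift by the minimum for numerical stability
    w /= w.sum()                                         # weights in the simplex (convex combination)
    return w, w @ preds                                  # weights and aggregated prediction

# Toy usage: three crude regression estimators built on a training half,
# aggregated on a validation half (the split itself is an illustrative choice).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(200)
x_tr, y_tr, x_val, y_val = x[:100], y[:100], x[100:], y[100:]
estimators = np.vstack([
    np.full_like(x_val, y_tr.mean()),                    # constant estimator
    np.polyval(np.polyfit(x_tr, y_tr, 1), x_val),        # linear fit
    np.polyval(np.polyfit(x_tr, y_tr, 5), x_val),        # degree-5 polynomial fit
])
weights, aggregate = exponential_weight_aggregate(estimators, y_val, beta=0.5)
print("convex weights:", np.round(weights, 3))
print("validation risk of the aggregate:", float(np.mean((aggregate - y_val) ** 2)))

Because the weights lie in the simplex, the aggregate is a convex combination of the given estimators; letting beta grow concentrates the weights on the estimator with the smallest empirical risk, so this weighting rule interpolates between simple averaging and selection of the empirically best estimator.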

How to cite


Tsybakov, Alexandre B. "Agrégation d’estimateurs et optimisation stochastique." Journal de la société française de statistique 149.1 (2008): 3-26. <http://eudml.org/doc/93474>.

@article{Tsybakov2008,
abstract = {Cet article fait suite à la Conférence Lucien Le Cam que j’ai eu l’honneur de donner lors des XXXVIIèmes Journées de Statistique à Pau, en 2005. Il présente un aperçu de quelques résultats récents sur les méthodes d’agrégation d’estimateurs. Ces méthodes consistent à construire, à partir d’un ensemble de $M$ estimateurs donnés, une combinaison linéaire ou convexe de ces estimateurs avec des poids aléatoires choisis de façon optimale. Nous mettons l’accent sur le lien entre agrégation et optimisation stochastique, ce qui nous permet d’aboutir à de nouvelles procédures récursives d’agrégation très performantes.},
author = {Tsybakov, Alexandre B.},
journal = {Journal de la société française de statistique},
keywords = {aggregation; stochastic optimization; mirror descent; adaptive estimation; optimal rate of aggregation},
language = {fre},
number = {1},
pages = {3-26},
publisher = {Société française de statistique},
title = {Agrégation d’estimateurs et optimisation stochastique},
url = {http://eudml.org/doc/93474},
volume = {149},
year = {2008},
}

TY - JOUR
AU - Tsybakov, Alexandre B.
TI - Agrégation d’estimateurs et optimisation stochastique
JO - Journal de la société française de statistique
PY - 2008
PB - Société française de statistique
VL - 149
IS - 1
SP - 3
EP - 26
AB - Cet article fait suite à la Conférence Lucien Le Cam que j’ai eu l’honneur de donner lors des XXXVIIèmes Journées de Statistique à Pau, en 2005. Il présente un aperçu de quelques résultats récents sur les méthodes d’agrégation d’estimateurs. Ces méthodes consistent à construire, à partir d’un ensemble de $M$ estimateurs donnés, une combinaison linéaire ou convexe de ces estimateurs avec des poids aléatoires choisis de façon optimale. Nous mettons l’accent sur le lien entre agrégation et optimisation stochastique, ce qui nous permet d’aboutir à de nouvelles procédures récursives d’agrégation très performantes.
LA - fre
KW - aggregation; stochastic optimization; mirror descent; adaptive estimation; optimal rate of aggregation
UR - http://eudml.org/doc/93474
ER -

References

  1. [1] Audibert J.-Y. (2004). Aggregated estimators and empirical complexity for least square regression. Annales de l’Institut Henri Poincaré (B) Probabilités et Statistiques 40 : 685-736. Zbl1052.62037MR2096215
  2. [2] Barron A., Cohen A., Dahmen W. et DeVore R. (2005). Approximation and learning by greedy algorithms. Annals of Statistics, à paraître. Zbl1138.62019
  3. [3] Ben-Tal A. et Nemirovski A.S. (1999). The conjugate barrier mirror descent method for non-smooth convex optimization. MINERVA Optim. Center Report., Haifa : Faculty of Industrial Engineering and Management, Technion – Israel Institute of Technology. http://iew3.technion.ac.il/Labs/Opt/opt/Pap/CP_MD.pdf 
  4. [4] Birgé L. (2006). Model selection via testing : an alternative to (penalized) maximum likelihood estimators. Annales de l’Institut Henri Poincaré (B) Probabilités et Statistiques 42 : 273 – 325. Zbl1333.62094MR2219712
  5. [5] Bickel P.J., Ritov Y. et Zakai A. (2006). Some theory for generalized boosting algorithms. J. Machine Learning Research 7 : 705–732. Zbl1222.68148MR2274384
  6. [6] Bühlmann P. et Yu B. (2005). Sparse boosting. J. Machine Learning Research 7 : 1001–1024. MR2274395
  7. [7] Bunea F., Tsybakov A.B. et Wegkamp M.H. (2004). Aggregation for regression learning. Preprint LPMA, Universités Paris 6 – Paris 7, n. 948, arXiv:math.ST/0410214 et https://hal.ccsd.cnrs.fr/ccsd-00003205 Zbl1209.62065
  8. [8] Bunea F., Tsybakov A.B. et Wegkamp M.H. (2006). Aggregation and sparsity via ℓ1 penalized least squares. Proceedings of 19th Annual Conference on Learning Theory (COLT 2006), Lecture Notes in Artificial Intelligence v. 4005 (Lugosi, G. and Simon, H.U., eds.), Springer-Verlag, Berlin-Heidelberg, 379–391. Zbl1143.62319MR2280619
  9. [9] Bunea F., Tsybakov A.B. et Wegkamp M.H. (2007a). Aggregation for Gaussian regression. Annals of Statistics 35 : 1674-1697. Zbl1209.62065MR2351101
  10. [10] Bunea F., Tsybakov A.B. et Wegkamp M.H. (2007b). Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics 1 : 169-194. Zbl1146.62028MR2312149
  11. [11] Bunea F., Tsybakov A.B. et Wegkamp M.H. (2007c). Sparse density estimation with ℓ1 penalties. Proceedings of 20th Annual Conference on Learning Theory (COLT 2007), Lecture Notes in Artificial Intelligence, v. 4539 (N.H. Bshouty and C. Gentile, eds.), Springer-Verlag, Berlin-Heidelberg, 530-543. Zbl1203.62053MR2397610
  12. [12] Catoni O. (1999). “Universal” aggregation rules with exact bias bounds. Preprint LPMA, Universités Paris 6 – Paris 7, n.510. 
  13. [13] Catoni O. (2004). Statistical Learning Theory and Stochastic Optimization. Ecole d’Eté de Probabilités de Saint-Flour XXXI - 2001. Lecture Notes in Mathematics, vol. 1851, Springer, New York. Zbl1076.93002MR2163920
  14. [14] Cesa-Bianchi N. et Lugosi G. (2006). Prediction, Learning, and Games. Cambridge Univ. Press. Zbl1114.91001MR2409394
  15. [15] Dalalyan A., Juditsky A. et Spokoiny V. (2007). A new algorithm for estimating the effective dimension-reduction subspace. arXiv:math/0701887v1. Zbl1225.62091
  16. [16] Dalalyan A. et Tsybakov A.B. (2007). Aggregation by exponential weighting and sharp oracle inequalities. Proceedings of the 20th Annual Conference on Learning Theory (COLT-2007), Lecture Notes in Artificial Intelligence, v. 4539 (N.H. Bshouty and C. Gentile, eds.), Springer-Verlag, Berlin-Heidelberg, 97-111. Zbl1203.62063MR2397581
  17. [17] Freund Y. (1995). Boosting a weak learning algorithm by majority. Information and Computation 121 : 256–285. Zbl0833.68109MR1348530
  18. [18] Hristache M., Juditsky A. et Spokoiny V. (2001). Direct estimation of the index coefficient in a single-index model. Annals of Statistics 29 : 593–623. Zbl1012.62043MR1865333
  19. [19] Huber P.J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1 : 221–234. Zbl0212.21504MR216620
  20. [20] Juditsky A.B., Nazin A.V., Tsybakov A.B. et Vayatis N. (2005). Recursive aggregation of estimators by the mirror descent algorithm with averaging. Problems of Information Transmission 41 : 368–384. Zbl1123.62044MR2198228
  21. [21] Juditsky A. et Nemirovski A. (2000). Functional aggregation for nonparametric estimation. Annals of Statistics 28 : 681–712. Zbl1105.62338MR1792783
  22. [22] Juditsky A., Rigollet Ph. et Tsybakov A.B. (2005). Learning by mirror averaging. Annals of Statistics, à paraître. Zbl1274.62288
  23. [23] Klemelä J. (2006). Density estimation with stagewise optimization of the empirical risk. Machine Learning 67 : 169–195. 
  24. [24] Koltchinskii V. (2006a). Local Rademacher complexities and oracle inequalities in empirical risk minimization (with discussion). Annals of Statistics 34 : 2697–2706. Zbl1118.62065MR2329464
  25. [25] Koltchinskii V. (2006b). Sparsity in penalized empirical risk minimization. Annales de l’Institut Henri Poincaré (B) Probabilités et Statistiques, à paraître. Zbl1168.62044
  26. [26] Le Cam L. (1953). On some asymptotic properties of maximum likelihood estimates and related Bayes estimates. University of California Publications in Statistics 1 : 277–329. Zbl0052.15404MR54913
  27. [27] Lecué G. (2005). Lower bounds and aggregation in density estimation. J. Machine Learning Research 7 : 971–981. Zbl1222.62052MR2274393
  28. [28] Lecué G. (2007a). Méthodes d’agrégation : optimalité et vitesses rapides. Thèse de doctorat, Université Paris 6. http://tel.archives-ouvertes.fr/tel-00150402 
  29. [29] Lecué G. (2007b). Suboptimality of penalized empirical risk minimization in classification. Proceedings of 20th Annual Conference on Learning Theory (COLT 2007), Lecture Notes in Artificial Intelligence, v. 4539 (N.H. Bshouty and C.Gentile, eds.), Springer-Verlag, Berlin-Heidelberg, 142–156. Zbl1203.68159MR2397584
  30. [30] Lugosi G. (2006). Prédiction randomisée de suites individuelles. J. Société Française de Statistique 147 : 5–37. 
  31. [31] Lounici K. (2007). Generalized mirror averaging and D-convex aggregation. Mathematical Methods of Statistics 16 : 246–259. Zbl1231.62046MR2356820
  32. [32] Lugosi G. et Vayatis N. (2004). On the Bayes-risk consistency of regularized boosting methods (with discussion). Annals of Statistics 32 : 30–55. Zbl1105.62319MR2051000
  33. [33] Mannor S., Meir R. et Zhang T. (2003). Greedy algorithms for classification – consistency, convergence rates, and adaptivity. Journal of Machine Learning Research 4 : 713–742. Zbl1105.68388MR2072266
  34. [34] Nemirovski A. (2000). Topics in Non-parametric Statistics. Ecole d’Eté de Probabilités de Saint-Flour XXVIII - 1998, Lecture Notes in Mathematics, v. 1738, Springer, New York. Zbl0998.62033MR1775640
  35. [35] Nemirovski A.S. et Yudin D.B. (1983). Problem Complexity and Method Efficiency in Optimization, Wiley, Chichester. Zbl0501.90062MR702836
  36. [36] Nesterov Yu. (2005). Primal-dual subgradient methods for convex problems. CORE discussion paper 2005/67. Center for Operations Research and Econometrics, Louvain-la-Neuve, Belgique. 
  37. [37] Nesterov Yu. (2007). Primal-dual subgradient methods for convex problems. Mathematical Programming, publié en ligne, DOI : 10.1007/s10107-007-0149-x. 
  38. [38] Rigollet Ph. (2006). Inégalités d’oracle, agrégation et adaptation. Thèse de doctorat, Université Paris 6. http://tel.archives-ouvertes.fr/tel-00115494 
  39. [39] Rigollet Ph. et Tsybakov A. (2007). Linear and convex aggregation of density estimators. Mathematical Methods of Statistics 16 : 260–280. Zbl1231.62057MR2356821
  40. [40] Robbins H. et Monro S. (1951). A stochastic approximation method. Annals of Mathematical Statistics 22 : 400–407. Zbl0054.05901MR42668
  41. [41] Rockafellar R.T. et Wets R.J.B. (1998). Variational Analysis. Springer, N.Y. Zbl0888.49001MR1491362
  42. [42] Samarov A. et Tsybakov A. (2007). Aggregation of density estimators and dimension reduction. Advances in Statistical Modeling and Inference. Essays in Honor of Kjell A. Doksum (V. Nair, ed.), World Scientific, Singapore e.a., 233–251. MR2416118
  43. [43] Schapire R.E. (1990). The strength of weak learnability. Machine Learning 5 : 197–227. Zbl0747.68058
  44. [44] Tsybakov A. (2003). Optimal rates of aggregation. Computational Learning Theory and Kernel Machines, (B. Schölkopf and M. Warmuth, eds.), Lecture Notes in Artificial Intelligence, v. 2777. Springer, Heidelberg, 303-313. 
  45. [45] Vajda I. (1986). Theory of Statistical Inference and Information. Kluwer, Dordrecht. Zbl0711.62002
  46. [46] Van de Geer S.A. (2006). High dimensional generalized linear models and the Lasso. Annals of Statistics, à paraître. Zbl1138.62323
  47. [47] Vapnik V. et Chervonenkis A. (1974). Theory of Pattern Recognition. Nauka, Moscow (in Russian). Traduction allemande : Wapnik W. und Tscherwonenkis A., Theorie der Zeichenerkennung, Berlin : Akademie-Verlag, 1979. MR474638
  48. [48] Wand M.P. et Jones M.C. (1995). Kernel Smoothing. Chapman and Hall, London. Zbl0854.62043MR1319818
  49. [49] Wegkamp M.H. (2003). Model selection in nonparametric regression. Annals of Statistics 31 : 252–273. Zbl1019.62037MR1962506
  50. [50] Yang Y. (2000a). Mixing strategies for density estimation. Annals of Statistics 28 : 75–87. Zbl1106.62322MR1762904
  51. [51] Yang Y. (2000b). Combining different procedures for adaptive regression. Journal of Multivariate Analysis 74 : 135–161. Zbl0964.62032MR1790617
  52. [52] Yang Y. (2001). Adaptive regression by mixing. Journal of the American Statistical Association 96 : 574–588. Zbl1018.62033MR1946426
  53. [53] Yang Y. (2004). Aggregating regression procedures for a better performance. Bernoulli 10 : 25–47. Zbl1040.62030MR2044592
  54. [54] Zhang T. et Yu B. (2005). Boosting with early stopping : convergence and consistency. Annals of Statistics 33 : 1538–1579. Zbl1078.62038MR2166555
