Model selection via testing : an alternative to (penalized) maximum likelihood estimators

Lucien Birgé

Annales de l'I.H.P. Probabilités et statistiques (2006)

  • Volume: 42, Issue: 3, pages 273-325
  • ISSN: 0246-0203

How to cite

Birgé, Lucien. "Model selection via testing : an alternative to (penalized) maximum likelihood estimators." Annales de l'I.H.P. Probabilités et statistiques 42.3 (2006): 273-325. <http://eudml.org/doc/77897>.

@article{Birgé2006,
author = {Birgé, Lucien},
journal = {Annales de l'I.H.P. Probabilités et statistiques},
keywords = {maximum likelihood; robustness; robust tests; metric dimension; minimax risk; model selection; aggregation of estimators},
language = {eng},
number = {3},
pages = {273-325},
publisher = {Elsevier},
title = {Model selection via testing : an alternative to (penalized) maximum likelihood estimators},
url = {http://eudml.org/doc/77897},
volume = {42},
year = {2006},
}

TY - JOUR
AU - Birgé, Lucien
TI - Model selection via testing : an alternative to (penalized) maximum likelihood estimators
JO - Annales de l'I.H.P. Probabilités et statistiques
PY - 2006
PB - Elsevier
VL - 42
IS - 3
SP - 273
EP - 325
LA - eng
KW - maximum likelihood; robustness; robust tests; metric dimension; minimax risk; model selection; aggregation of estimators
UR - http://eudml.org/doc/77897
ER -

References

  1. [1] P. Assouad, Deux remarques sur l'estimation, C. R. Acad. Sci. Paris, Sér. I Math. 296 (1983) 1021-1024. Zbl 0568.62003, MR 777600
  2. [2] J.-Y. Audibert, Théorie statistique de l'apprentissage : une approche PAC-bayésienne, Thèse de doctorat, Laboratoire de Probabilités et Modèles Aléatoires, Université Paris VI, Paris, 2004.
  3. [3] Y. Baraud, Model selection for regression on a random design, ESAIM Probab. Statist. 6 (2002) 127-146. Zbl 1059.62038, MR 1918295
  4. [4] A.R. Barron, Complexity regularization with applications to artificial neural networks, in: Roussas G. (Ed.), Nonparametric Functional Estimation, Kluwer, Dordrecht, 1991, pp. 561-576. Zbl 0739.62001, MR 1154352
  5. [5] A.R. Barron, L. Birgé, P. Massart, Risk bounds for model selection via penalization, Probab. Theory Related Fields 113 (1999) 301-415. Zbl 0946.62036, MR 1679028
  6. [6] A.R. Barron, T.M. Cover, Minimum complexity density estimation, IEEE Trans. Inform. Theory 37 (1991) 1034-1054. Zbl 0743.62003, MR 1111806
  7. [7] J. Beirlant, L. Györfi, On the asymptotic normality of the L_2-error in partitioning regression estimation, J. Statist. Plann. Inference 71 (1998) 93-107. Zbl 0961.62030, MR 1651863
  8. [8] L. Birgé, Approximation dans les espaces métriques et théorie de l'estimation, Z. Wahrscheinlichkeitstheorie Verw. Gebiete 65 (1983) 181-237. Zbl 0506.62026, MR 722129
  9. [9] L. Birgé, Sur un théorème de minimax et son application aux tests, Probab. Math. Statist. 3 (1984) 259-282. Zbl 0571.62036, MR 764150
  10. [10] L. Birgé, Stabilité et instabilité du risque minimax pour des variables indépendantes équidistribuées, Ann. Inst. H. Poincaré Sect. B 20 (1984) 201-223. Zbl 0542.62018, MR 762855
  11. [11] L. Birgé, On estimating a density using Hellinger distance and some other strange facts, Probab. Theory Related Fields 71 (1986) 271-291. Zbl 0561.62029, MR 816706
  12. [12] L. Birgé, Model selection for Gaussian regression with random design, Bernoulli 10 (2004) 1039-1051. Zbl 1064.62030, MR 2108042
  13. [13] L. Birgé, P. Massart, Rates of convergence for minimum contrast estimators, Probab. Theory Related Fields 97 (1993) 113-150. Zbl 0805.62037, MR 1240719
  14. [14] L. Birgé, P. Massart, From model selection to adaptive estimation, in: Pollard D., Torgersen E., Yang G. (Eds.), Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, Springer-Verlag, New York, 1997, pp. 55-87. Zbl 0920.62042, MR 1462939
  15. [15] L. Birgé, P. Massart, Minimum contrast estimators on sieves: exponential bounds and rates of convergence, Bernoulli 4 (1998) 329-375. Zbl 0954.62033, MR 1653272
  16. [16] L. Birgé, P. Massart, An adaptive compression algorithm in Besov spaces, Constr. Approx. 16 (2000) 1-36. Zbl 1004.41006, MR 1848840
  17. [17] L. Birgé, P. Massart, Gaussian model selection, J. Eur. Math. Soc. 3 (2001) 203-268. Zbl 1037.62001, MR 1848946
  18. [18] M.S. Birman, M.Z. Solomjak, Piecewise-polynomial approximation of functions of the classes W_p^α, Mat. Sb. 73 (1967) 295-317. Zbl 0173.16001, MR 217487
  19. [19] L.D. Brown, M.G. Low, Asymptotic equivalence of nonparametric regression and white noise, Ann. Statist. 24 (1996) 2384-2398. Zbl 0867.62022, MR 1425958
  20. [20] F. Bunea, A.B. Tsybakov, M.H. Wegkamp, Aggregation for regression learning, Technical report 948, Laboratoire de Probabilités, Université Paris VI, 2004, http://www.proba.jussieu.fr/mathdoc/preprints/index.html#2004. Zbl 1209.62065
  21. [21] G. Castellan, Modified Akaike's criterion for histogram density estimation, Technical report 99.61, Université Paris-Sud, Orsay, 1999, http://www.math.u-psud.fr/~biblio/pub/1999/.
  22. [22] G. Castellan, Sélection d'histogrammes à l'aide d'un critère de type Akaike, C. R. Acad. Sci. Paris 330 (2000) 729-732. Zbl 0969.62023, MR 1763919
  23. [23] O. Catoni, The mixture approach to universal model selection, Technical report LMENS-97-22, Ecole Normale Supérieure, Paris, 1997, http://www.dma.ens.fr/edition/publis/1997/titre97.html. Zbl 0928.62033
  24. [24] O. Catoni, Statistical learning theory and stochastic optimization, in: Picard J. (Ed.), Lectures on Probability Theory and Statistics, Ecole d'Eté de Probabilités de Saint-Flour XXXI – 2001, Lecture Notes in Math., vol. 1851, Springer-Verlag, Berlin, 2004. Zbl 1076.93002, MR 2163920
  25. [25] H. Chernoff, A measure of asymptotic efficiency of tests of a hypothesis based on a sum of observations, Ann. Math. Statist. 23 (1952) 493-507. Zbl 0048.11804, MR 57518
  26. [26] R.A. DeVore, G. Kerkyacharian, D. Picard, V. Temlyakov, Mathematical methods for supervised learning, Technical report 0422, IMI, University of South Carolina, Columbia, 2004, http://www.math.sc.edu/imip/preprints/04.html. Zbl 1146.62322
  27. [27] R.A. DeVore, G.G. Lorentz, Constructive Approximation, Springer-Verlag, Berlin, 1993. Zbl 0797.41016, MR 1261635
  28. [28] L. Devroye, G. Lugosi, Combinatorial Methods in Density Estimation, Springer-Verlag, New York, 2001. Zbl 0964.62025, MR 1843146
  29. [29] D.L. Donoho, I.M. Johnstone, G. Kerkyacharian, D. Picard, Density estimation by wavelet thresholding, Ann. Statist. 24 (1996) 508-539. Zbl 0860.62032, MR 1394974
  30. [30] D.L. Donoho, R.C. Liu, B. MacGibbon, Minimax risk over hyperrectangles, and implications, Ann. Statist. 18 (1990) 1416-1437. Zbl 0705.62018, MR 1062717
  31. [31] P.P.B. Eggermont, V.N. LaRiccia, Maximum Penalized Likelihood Estimation, vol. I: Density Estimation, Springer, New York, 2001. Zbl 0984.62026, MR 1837879
  32. [32] P. Groeneboom, Some current developments in density estimation, in: Bakker J.W. de, Hazewinkel M., Lenstra J.K. (Eds.), Mathematics and Computer Science, CWI Monograph, vol. 1, Elsevier, Amsterdam, 1986, pp. 163-192. Zbl 0593.62030, MR 873578
  33. [33] L. Györfi, M. Kohler, A. Krzyżak, H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer, New York, 2002. Zbl 1021.62024
  34. [34] P.J. Huber, A robust version of the probability ratio test, Ann. Math. Statist. 36 (1965) 1753-1758. Zbl 0137.12702, MR 185747
  35. [35] P.J. Huber, Robust Statistics, John Wiley, New York, 1981. Zbl 0536.62025, MR 606374
  36. [36] I.M. Johnstone, Chi-square oracle inequalities, in: Gunst M.C.M. de, Klaassen C.A.J., Vaart A.W. van der (Eds.), State of the Art in Probability and Statistics, Festschrift for Willem R. van Zwet, Lecture Notes Monograph Ser., vol. 36, Institute of Mathematical Statistics, 2001, pp. 399-418. MR 1836572
  37. [37] A. Juditsky, A.S. Nemirovski, Functional aggregation for nonparametric estimation, Ann. Statist. 28 (2000) 681-712. Zbl 1105.62338, MR 1792783
  38. [38] G. Kerkyacharian, D. Picard, Thresholding algorithms, maxisets and well-concentrated bases, Test 9 (2000) 283-344. Zbl 1107.62323, MR 1821645
  39. [39] A.N. Kolmogorov, V.M. Tikhomirov, ε-entropy and ε-capacity of sets in function spaces, Amer. Math. Soc. Transl. (2) 17 (1961) 277-364. Zbl 0133.06703
  40. [40] B. Laurent, P. Massart, Adaptive estimation of a quadratic functional by model selection, Ann. Statist. 28 (2000) 1302-1338. Zbl 1105.62328, MR 1805785
  41. [41] L.M. Le Cam, On the assumptions used to prove asymptotic normality of maximum likelihood estimates, Ann. Math. Statist. 41 (1970) 802-828. Zbl 0246.62039, MR 267676
  42. [42] L.M. Le Cam, Limits of experiments, in: Proc. 6th Berkeley Symp. on Math. Stat. and Prob. I, 1972, pp. 245-261. Zbl 0271.62004, MR 415819
  43. [43] L.M. Le Cam, Convergence of estimates under dimensionality restrictions, Ann. Statist. 1 (1973) 38-53. Zbl 0255.62006, MR 334381
  44. [44] L.M. Le Cam, On local and global properties in the theory of asymptotic normality of experiments, in: Puri M. (Ed.), Stochastic Processes and Related Topics, vol. 1, Academic Press, New York, 1975, pp. 13-54. Zbl 0389.62011, MR 395005
  45. [45] L.M. Le Cam, Asymptotic Methods in Statistical Decision Theory, Springer-Verlag, New York, 1986. Zbl 0605.62002, MR 856411
  46. [46] L.M. Le Cam, Maximum likelihood: an introduction, Inter. Statist. Rev. 58 (1990) 153-171. Zbl 0715.62045
  47. [47] L.M. Le Cam, Metric dimension and statistical estimation, CRM Proc. and Lecture Notes 11 (1997) 303-311. Zbl 0942.62035, MR 1479680
  48. [48] G.G. Lorentz, Approximation of Functions, Holt, Rinehart, Winston, New York, 1966. Zbl 0153.38901, MR 213785
  49. [49] G.G. Lorentz, M. von Golitschek, Y. Makovoz, Constructive Approximation, Advanced Problems, Springer, Berlin, 1996. Zbl 0910.41001, MR 1393437
  50. [50] A.S. Nemirovski, Topics in non-parametric statistics, in: Bernard P. (Ed.), Lectures on Probability Theory and Statistics, Ecole d'Eté de Probabilités de Saint-Flour XXVIII – 1998, Lecture Notes in Math., vol. 1738, Springer-Verlag, Berlin, 2000, pp. 85-297. Zbl 0998.62033, MR 1775640
  51. [51] M. Nussbaum, Asymptotic equivalence of density estimation and Gaussian white noise, Ann. Statist. 24 (1996) 2399-2430. Zbl 0867.62035, MR 1425959
  52. [52] A. Pinkus, n-Widths in Approximation Theory, Springer-Verlag, Berlin, 1985. Zbl 0551.41001, MR 774404
  53. [53] M.S. Pinsker, Optimal filtration of square-integrable signals in Gaussian noise, Problems Inform. Transmission 16 (1980) 120-133. Zbl 0452.94003, MR 624591
  54. [54] X. Shen, W.H. Wong, Convergence rates of sieve estimates, Ann. Statist. 22 (1994) 580-615. Zbl 0805.62008, MR 1292531
  55. [55] B.W. Silverman, On the estimation of a probability density function by the maximum penalized likelihood method, Ann. Statist. 10 (1982) 795-810. Zbl 0492.62034, MR 663433
  56. [56] A.B. Tsybakov, Optimal rates of aggregation, in: Proceedings of 16th Annual Conference on Learning Theory (COLT) and 7th Annual Workshop on Kernel Machines, Lecture Notes in Artificial Intelligence, vol. 2777, Springer-Verlag, Berlin, 2003, pp. 303-313. Zbl 1208.62073
  57. [57] S. van de Geer, Estimating a regression function, Ann. Statist. 18 (1990) 907-924. Zbl 0709.62040, MR 1056343
  58. [58] S. van de Geer, Hellinger-consistency of certain nonparametric maximum likelihood estimates, Ann. Statist. 21 (1993) 14-44. Zbl 0779.62033, MR 1212164
  59. [59] S. van de Geer, Empirical Processes in M-Estimation, Cambridge University Press, Cambridge, 2000. Zbl 1179.62073, MR 1739079
  60. [60] A.W. van der Vaart, Asymptotic Statistics, Cambridge University Press, Cambridge, 1998. Zbl 0910.62001, MR 1652247
  61. [61] G. Wahba, Spline Models for Observational Data, SIAM, Philadelphia, PA, 1990. Zbl 0813.62001, MR 1045442
  62. [62] A. Wald, Note on the consistency of the maximum likelihood estimate, Ann. Math. Statist. 20 (1949) 595-601. Zbl 0034.22902, MR 32169
  63. [63] M.H. Wegkamp, Model selection in nonparametric regression, Ann. Statist. 31 (2003) 252-273. Zbl 1019.62037, MR 1962506
  64. [64] W.H. Wong, X. Shen, Probability inequalities for likelihood ratios and convergence rates of sieve MLEs, Ann. Statist. 23 (1995) 339-362. Zbl 0829.62002, MR 1332570
  65. [65] Y. Yang, Minimax optimal density estimation, Ph.D. dissertation, Dept. of Statistics, Yale University, New Haven, 1996.
  66. [66] Y. Yang, Mixing strategies for density estimation, Ann. Statist. 28 (2000) 75-87. Zbl 1106.62322, MR 1762904
  67. [67] Y. Yang, Combining different procedures for adaptive regression, J. Multivariate Anal. 74 (2000) 135-161. Zbl 0964.62032, MR 1790617
  68. [68] Y. Yang, Adaptive regression by mixing, J. Amer. Statist. Assoc. 96 (2001) 574-588. Zbl 1018.62033, MR 1946426
  69. [69] Y. Yang, How accurate can any regression procedure be?, Technical report, Iowa State University, Ames, 2001, http://www.public.iastate.edu/yyang/papers/index.html.
  70. [70] Y. Yang, Aggregating regression procedures to improve performance, Bernoulli 10 (2004) 25-47. Zbl 1040.62030, MR 2044592
  71. [71] Y. Yang, A.R. Barron, An asymptotic property of model selection criteria, IEEE Trans. Inform. Theory 44 (1998) 95-116. Zbl 0949.62041, MR 1486651
  72. [72] Y. Yang, A.R. Barron, Information-theoretic determination of minimax rates of convergence, Ann. Statist. 27 (1999) 1564-1599. Zbl 0978.62008, MR 1742500
  73. [73] Y.G. Yatracos, Rates of convergence of minimum distance estimates and Kolmogorov's entropy, Ann. Statist. 13 (1985) 768-774. Zbl 0576.62057, MR 790571
  74. [74] B. Yu, Assouad, Fano and Le Cam, in: Pollard D., Torgersen E., Yang G. (Eds.), Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics, Springer-Verlag, New York, 1997, pp. 423-435. Zbl 0896.62032, MR 1462963

Citations in EuDML Documents

  1. Nathalie Akakpo, Estimating a discrete distribution via histogram selection
  2. Guillaume Lecué, Shahar Mendelson, On the optimality of the empirical risk minimization procedure for the convex aggregation problem
  3. Yannick Baraud, Christophe Giraud, Sylvie Huet, Estimator selection in the Gaussian setting
  4. Yannick Baraud, Estimation of the density of a determinantal process
  5. Alexandre B. Tsybakov, Agrégation d’estimateurs et optimisation stochastique
  6. Yannick Baraud, Lucien Birgé, Estimating composite functions by model selection
  7. Mathieu Sart, Estimation of the transition density of a Markov chain
