Estimator selection in the Gaussian setting

Yannick Baraud; Christophe Giraud; Sylvie Huet

Annales de l'I.H.P. Probabilités et statistiques (2014)

  • Volume: 50, Issue: 3, Pages: 1092–1119
  • ISSN: 0246-0203

Abstract

We consider the problem of estimating the mean f of a Gaussian vector Y with independent components of common unknown variance σ². Our estimation procedure is based on estimator selection. More precisely, we start with an arbitrary and possibly infinite collection 𝔽 of estimators of f based on Y and, with the same data Y, aim at selecting an estimator among 𝔽 with the smallest Euclidean risk. No assumptions on the estimators are made and their dependencies with respect to Y may be unknown. We establish a non-asymptotic risk bound for the selected estimator and derive oracle-type inequalities when 𝔽 consists of linear estimators. As particular cases, our approach allows us to handle the problems of aggregation, model selection, as well as those of choosing a window and a kernel for estimating a regression function, or tuning the parameter involved in a penalized criterion. In all these cases but aggregation, the method can be easily implemented. For illustration, we carry out two simulation studies. One aims at comparing our procedure to cross-validation for choosing a tuning parameter. The other shows how to implement our approach to solve the problem of variable selection in practice.
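
A minimal sketch of the setting described in the abstract; the notation below (in particular the criterion crit, the constant C and the remainder r_n) is illustrative and not quoted from the paper:

\[
Y = f + \sigma\varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I_n), \qquad \sigma^2 \ \text{unknown},
\]
\[
\widehat{f} \in \operatorname*{argmin}_{\hat f \in \mathbb{F}} \operatorname{crit}(\hat f)
\quad \text{and, typically,} \quad
\mathbb{E}\,\|\widehat{f} - f\|^2 \;\le\; C \inf_{\hat f \in \mathbb{F}} \mathbb{E}\,\|\hat f - f\|^2 + r_n .
\]

Here \(\operatorname{crit}\) stands for a data-driven selection criterion, and the displayed inequality only indicates the generic shape of an oracle-type bound; the paper itself states explicit non-asymptotic versions.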

How to cite


Baraud, Yannick, Christophe Giraud, and Sylvie Huet. "Estimator selection in the Gaussian setting." Annales de l'I.H.P. Probabilités et statistiques 50.3 (2014): 1092-1119. <http://eudml.org/doc/271982>.

@article{Baraud2014,
abstract = {We consider the problem of estimating the mean $f$ of a Gaussian vector $Y$ with independent components of common unknown variance $\sigma^{2}$. Our estimation procedure is based on estimator selection. More precisely, we start with an arbitrary and possibly infinite collection $\mathbb{F}$ of estimators of $f$ based on $Y$ and, with the same data $Y$, aim at selecting an estimator among $\mathbb{F}$ with the smallest Euclidean risk. No assumptions on the estimators are made and their dependencies with respect to $Y$ may be unknown. We establish a non-asymptotic risk bound for the selected estimator and derive oracle-type inequalities when $\mathbb{F}$ consists of linear estimators. As particular cases, our approach allows us to handle the problems of aggregation, model selection, as well as those of choosing a window and a kernel for estimating a regression function, or tuning the parameter involved in a penalized criterion. In all these cases but aggregation, the method can be easily implemented. For illustration, we carry out two simulation studies. One aims at comparing our procedure to cross-validation for choosing a tuning parameter. The other shows how to implement our approach to solve the problem of variable selection in practice.},
author = {Baraud, Yannick and Giraud, Christophe and Huet, Sylvie},
journal = {Annales de l'I.H.P. Probabilités et statistiques},
keywords = {estimator selection; model selection; variable selection; linear estimator; kernel estimator; ridge regression; Lasso; elastic net; random forest; PLS1 regression},
language = {eng},
number = {3},
pages = {1092-1119},
publisher = {Gauthier-Villars},
title = {Estimator selection in the Gaussian setting},
url = {http://eudml.org/doc/271982},
volume = {50},
year = {2014},
}

TY - JOUR
AU - Baraud, Yannick
AU - Giraud, Christophe
AU - Huet, Sylvie
TI - Estimator selection in the Gaussian setting
JO - Annales de l'I.H.P. Probabilités et statistiques
PY - 2014
PB - Gauthier-Villars
VL - 50
IS - 3
SP - 1092
EP - 1119
AB - We consider the problem of estimating the mean $f$ of a Gaussian vector $Y$ with independent components of common unknown variance $\sigma^{2}$. Our estimation procedure is based on estimator selection. More precisely, we start with an arbitrary and possibly infinite collection $\mathbb{F}$ of estimators of $f$ based on $Y$ and, with the same data $Y$, aim at selecting an estimator among $\mathbb{F}$ with the smallest Euclidean risk. No assumptions on the estimators are made and their dependencies with respect to $Y$ may be unknown. We establish a non-asymptotic risk bound for the selected estimator and derive oracle-type inequalities when $\mathbb{F}$ consists of linear estimators. As particular cases, our approach allows us to handle the problems of aggregation, model selection, as well as those of choosing a window and a kernel for estimating a regression function, or tuning the parameter involved in a penalized criterion. In all these cases but aggregation, the method can be easily implemented. For illustration, we carry out two simulation studies. One aims at comparing our procedure to cross-validation for choosing a tuning parameter. The other shows how to implement our approach to solve the problem of variable selection in practice.
LA - eng
KW - estimator selection; model selection; variable selection; linear estimator; kernel estimator; ridge regression; Lasso; elastic net; random forest; PLS1 regression
UR - http://eudml.org/doc/271982
ER -

References

  [1] S. Arlot. Rééchantillonnage et Sélection de modèles. Ph.D. thesis, Univ. Paris XI, 2007.
  [2] S. Arlot. Model selection by resampling penalization. Electron. J. Stat. 3 (2009) 557–624. Zbl 1326.62097, MR 2519533.
  [3] S. Arlot and F. Bach. Data-driven calibration of linear estimators with minimal penalties, 2011. Available at arXiv:0909.1884v2.
  [4] S. Arlot and A. Celisse. A survey of cross-validation procedures for model selection. Stat. Surv. 4 (2010) 40–79. Zbl 1190.62080, MR 2602303.
  [5] Y. Baraud. Model selection for regression on a fixed design. Probab. Theory Related Fields 117 (2000) 467–493. Zbl 0997.62027, MR 1777129.
  [6] Y. Baraud. Estimator selection with respect to Hellinger-type risks. Probab. Theory Related Fields 151 (2011) 353–401. Zbl 05968717, MR 2834722.
  [7] Y. Baraud, C. Giraud and S. Huet. Gaussian model selection with an unknown variance. Ann. Statist. 37 (2009) 630–672. Zbl 1162.62051, MR 2502646.
  [8] Y. Baraud, C. Giraud and S. Huet. Estimator selection in the Gaussian setting, 2010. Available at arXiv:1007.2096v1. Zbl 1298.62113, MR 3224300.
  [9] L. Birgé. Model selection via testing: An alternative to (penalized) maximum likelihood estimators. Ann. Inst. Henri Poincaré Probab. Stat. 42 (2006) 273–325. Zbl 1333.62094, MR 2219712.
  [10] L. Birgé and P. Massart. Gaussian model selection. J. Eur. Math. Soc. (JEMS) 3 (2001) 203–268. Zbl 1037.62001, MR 1848946.
  [11] A. Boulesteix and K. Strimmer. Partial least squares: A versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics 8 (2006) 32–44.
  [12] L. Breiman. Random forests. Mach. Learn. 45 (2001) 5–32. Zbl 1007.68152.
  [13] F. Bunea, A. B. Tsybakov and M. H. Wegkamp. Aggregation for Gaussian regression. Ann. Statist. 35 (2007) 1674–1697. Zbl 1209.62065, MR 2351101.
  [14] E. Candès and T. Tao. The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 (2007) 2313–2351. Zbl 1139.62019, MR 2382644.
  [15] Y. Cao and Y. Golubev. On oracle inequalities related to smoothing splines. Math. Methods Statist. 15 (2006) 398–414. MR 2301659.
  [16] O. Catoni. Mixture approach to universal model selection. Technical report, École Normale Supérieure, France, 1997. Zbl 0928.62033.
  [17] O. Catoni. Statistical learning theory and stochastic optimization. In Lecture Notes from the 31st Summer School on Probability Theory Held in Saint-Flour, July 8–25, 2001. Springer, Berlin, 2004. Zbl 1076.93002, MR 2163920.
  [18] A. Celisse. Model selection via cross-validation in density estimation, regression, and change-points detection. Ph.D. thesis, Univ. Paris XI, 2008.
  [19] S. S. Chen, D. L. Donoho and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 (1998) 33–61 (electronic). Zbl 0919.94002, MR 1639094.
  [20] R. Díaz-Uriarte and S. Alvarez de Andrés. Gene selection and classification of microarray data using random forest. BMC Bioinformatics 7 (2006) 3.
  [21] B. Efron, T. Hastie, I. Johnstone and R. Tibshirani. Least angle regression. Ann. Statist. 32 (2004) 407–499. With discussion, and a rejoinder by the authors. Zbl 1091.62054, MR 2060166.
  [22] R. Genuer, J.-M. Poggi and C. Tuleau-Malot. Variable selection using random forests. Pattern Recognition Lett. 31 (2010) 2225–2236.
  [23] C. Giraud. Mixing least-squares estimators when the variance is unknown. Bernoulli 14 (2008) 1089–1107. Zbl 1168.62327, MR 2543587.
  [24] C. Giraud, S. Huet and N. Verzelen. High-dimensional regression with unknown variance. Statist. Sci. 27 (2013) 500–518. Zbl 1331.62346, MR 3025131.
  [25] A. Goldenshluger. A universal procedure for aggregating estimators. Ann. Statist. 37 (2009) 542–568. Zbl 1155.62018, MR 2488362.
  [26] A. Goldenshluger and O. Lepski. Structural adaptation via 𝕃p-norm oracle inequalities. Probab. Theory Related Fields 143 (2009) 41–71. Zbl 1149.62020, MR 2449122.
  [27] I. Helland. Partial least squares regression. In Encyclopedia of Statistical Sciences, 2nd edition 9 5957–5962. S. Kotz, N. Balakrishnan, C. Read, B. Vidakovic and N. Johnston (Eds.). Wiley, New York, 2006. Zbl 0713.62062.
  [28] I. Helland. Some theoretical aspects of partial least squares regression. Chemometrics and Intelligent Laboratory Systems 58 (2001) 97–107.
  [29] A. Hoerl and R. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 (1970) 55–67. Zbl 0202.17205.
  [30] A. Hoerl and R. Kennard. Ridge regression. In Encyclopedia of Statistical Sciences, 2nd edition 11 7273–7280. S. Kotz, N. Balakrishnan, C. Read, B. Vidakovic and N. Johnston (Eds.). Wiley, New York, 2006. Zbl 0727.62001.
  [31] J. Huang, S. Ma and C.-H. Zhang. Adaptive Lasso for sparse high-dimensional regression models. Statist. Sinica 4 (2008) 1603–1618. Zbl 1255.62198, MR 2469326.
  [32] A. Juditsky and A. Nemirovski. Functional aggregation for nonparametric regression. Ann. Statist. 28 (2000) 681–712. Zbl 1105.62338, MR 1792783.
  [33] O. V. Lepskiĭ. A problem of adaptive estimation in Gaussian white noise. Teor. Veroyatnost. i Primenen. 35 (1990) 459–470. Zbl 0725.62075, MR 1091202.
  [34] O. V. Lepskiĭ. Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Teor. Veroyatnost. i Primenen. 36 (1991) 645–659. Zbl 0738.62045, MR 1147167.
  [35] O. V. Lepskiĭ. Asymptotically minimax adaptive estimation. II. Schemes without optimal adaptation. Adaptive estimates. Teor. Veroyatnost. i Primenen. 37 (1992) 468–481. Zbl 0761.62115, MR 1214353.
  [36] O. V. Lepskiĭ. On problems of adaptive estimation in white Gaussian noise. In Topics in Nonparametric Estimation 87–106. Adv. Soviet Math. 12. Amer. Math. Soc., Providence, RI, 1992. Zbl 0783.62061, MR 1191692.
  [37] G. Leung and A. R. Barron. Information theory and mixing least-squares regressions. IEEE Trans. Inform. Theory 52 (2006) 3396–3410. Zbl 1309.94051, MR 2242356.
  [38] Y. Makovoz. Random approximants and neural networks. J. Approx. Theory 85 (1996) 98–109. Zbl 0857.41024, MR 1382053.
  [39] E. A. Nadaraya. On estimating regression. Theory Probab. Appl. 9 (1964) 141–142. Zbl 0136.40902.
  [40] A. Nemirovski. Topics in non-parametric statistics. In Lectures on Probability Theory and Statistics (Saint-Flour, 1998) 85–277. Lecture Notes in Math. 1738. Springer, Berlin, 2000. Zbl 0998.62033, MR 1775640.
  [41] P. Rigollet and A. B. Tsybakov. Linear and convex aggregation of density estimators. Math. Methods Statist. 16 (2007) 260–280. Zbl 1231.62057, MR 2356821.
  [42] J. Salmon and A. Dalalyan. Optimal aggregation of affine estimators. J. Mach. Learn. Res. 19 (2011) 635–660.
  [43] C. Strobl, A.-L. Boulesteix, T. Kneib, T. Augustin and A. Zeileis. Conditional variable importance for random forests. BMC Bioinformatics 9 (2008) 307.
  [44] C. Strobl, A.-L. Boulesteix, A. Zeileis and T. Hothorn. Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8 (2007) 25.
  [45] M. Tenenhaus. La régression PLS: Théorie et pratique [Theory and application]. Éditions Technip, Paris, 1998. Zbl 0923.62058, MR 1645125.
  [46] R. Tibshirani. Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 (1996) 267–288. Zbl 0850.62538, MR 1379242.
  [47] A. B. Tsybakov. Optimal rates of aggregation. In Proceedings of the 16th Annual Conference on Learning Theory (COLT) and 7th Annual Workshop on Kernel Machines 303–313. Lecture Notes in Artificial Intelligence 2777. Springer, Berlin, 2003. Zbl 1208.62073.
  [48] G. S. Watson. Smooth regression analysis. Sankhyā Ser. A 26 (1964) 359–372. Zbl 0137.13002, MR 185765.
  [49] M. Wegkamp. Model selection in nonparametric regression. Ann. Statist. 31 (2003) 252–273. Zbl 1019.62037, MR 1962506.
  [50] Y. Yang. Model selection for nonparametric regression. Statist. Sinica 9 (1999) 475–499. Zbl 0921.62051, MR 1707850.
  [51] Y. Yang. Combining different procedures for adaptive regression. J. Multivariate Anal. 74 (2000) 135–161. Zbl 0964.62032, MR 1790617.
  [52] Y. Yang. Mixing strategies for density estimation. Ann. Statist. 28 (2000) 75–87. Zbl 1106.62322, MR 1762904.
  [53] Y. Yang. Adaptive regression by mixing. J. Amer. Statist. Assoc. 96 (2001) 574–588. Zbl 1018.62033, MR 1946426.
  [54] T. Zhang. Learning bounds for kernel regression using effective data dimensionality. Neural Comput. 17 (2005) 2077–2098. Zbl 1080.68044, MR 2175849.
  [55] T. Zhang. Adaptive forward-backward greedy algorithm for learning sparse representations. Technical report, Rutgers Univ., NJ, 2008.
  [56] P. Zhao and B. Yu. On model selection consistency of Lasso. J. Mach. Learn. Res. 7 (2006) 2541–2563. Zbl 1222.62008, MR 2274449.
  [57] H. Zou. The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 (2006) 1418–1429. Zbl 1171.62326, MR 2279469.
  [58] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301–320. Zbl 1069.62054, MR 2137327.
  [59] H. Zou, T. Hastie and R. Tibshirani. On the “degrees of freedom” of the Lasso. Ann. Statist. 35 (2007) 2173–2192. Zbl 1126.62061, MR 2363967.
