Sparsity in penalized empirical risk minimization
Annales de l'I.H.P. Probabilités et statistiques (2009)
- Volume: 45, Issue: 1, page 7-57
- ISSN: 0246-0203
Abstract
How to cite
Koltchinskii, Vladimir. "Sparsity in penalized empirical risk minimization." Annales de l'I.H.P. Probabilités et statistiques 45.1 (2009): 7-57. <http://eudml.org/doc/78023>.
@article{Koltchinskii2009,
abstract = {Let (X, Y) be a random couple in S×T with unknown distribution P. Let (X1, Y1), …, (Xn, Yn) be i.i.d. copies of (X, Y), Pn being their empirical distribution. Let h1, …, hN:S↦[−1, 1] be a dictionary consisting of N functions. For λ∈ℝN, denote fλ:=∑j=1Nλjhj. Let ℓ:T×ℝ↦ℝ be a given loss function, which is convex with respect to the second variable. Denote (ℓ•f)(x, y):=ℓ(y; f(x)). We study the following penalized empirical risk minimization problem \[\hat{\lambda }^{\varepsilon }:=\mathop {\operatorname{argmin}}_{\lambda \in {\mathbb {R}}^{N}}\bigl [P_{n}(\ell \bullet f_{\lambda })+\varepsilon \Vert \lambda \Vert _{\ell _{p}}^{p}\bigr ],\]
which is an empirical version of the problem \[\lambda ^{\varepsilon }:=\mathop {\operatorname{argmin}}_{\lambda \in {\mathbb {R}}^{N}}\bigl [P(\ell \bullet f_{\lambda })+\varepsilon \Vert \lambda \Vert _{\ell _{p}}^{p}\bigr ]\]
(here ɛ≥0 is a regularization parameter; λ0 corresponds to ɛ=0). A number of regression and classification problems fit this general framework. We are interested in the case when p≥1, but it is close enough to 1 (so that p−1 is of the order $\frac{1}{\log N}$, or smaller). We show that the “sparsity” of λɛ implies the “sparsity” of λ̂ɛ and study the impact of “sparsity” on bounding the excess risk P(ℓ•fλ̂ɛ)−P(ℓ•fλ0) of solutions of empirical risk minimization problems.},
author = {Koltchinskii, Vladimir},
journal = {Annales de l'I.H.P. Probabilités et statistiques},
keywords = {empirical risk; penalized empirical risk; ℓ\_p-penalty; sparsity; oracle inequalities; Rademacher processes},
language = {eng},
number = {1},
pages = {7-57},
publisher = {Gauthier-Villars},
title = {Sparsity in penalized empirical risk minimization},
url = {http://eudml.org/doc/78023},
volume = {45},
year = {2009},
}
TY - JOUR
AU - Koltchinskii, Vladimir
TI - Sparsity in penalized empirical risk minimization
JO - Annales de l'I.H.P. Probabilités et statistiques
PY - 2009
PB - Gauthier-Villars
VL - 45
IS - 1
SP - 7
EP - 57
AB - Let (X, Y) be a random couple in S×T with unknown distribution P. Let (X1, Y1), …, (Xn, Yn) be i.i.d. copies of (X, Y), Pn being their empirical distribution. Let h1, …, hN:S↦[−1, 1] be a dictionary consisting of N functions. For λ∈ℝN, denote fλ:=∑j=1Nλjhj. Let ℓ:T×ℝ↦ℝ be a given loss function, which is convex with respect to the second variable. Denote (ℓ•f)(x, y):=ℓ(y; f(x)). We study the following penalized empirical risk minimization problem \[\hat{\lambda }^{\varepsilon }:=\mathop {\operatorname{argmin}}_{\lambda \in {\mathbb {R}}^{N}}\bigl [P_{n}(\ell \bullet f_{\lambda })+\varepsilon \Vert \lambda \Vert _{\ell _{p}}^{p}\bigr ],\]
which is an empirical version of the problem \[\lambda ^{\varepsilon }:=\mathop {\operatorname{argmin}}_{\lambda \in {\mathbb {R}}^{N}}\bigl [P(\ell \bullet f_{\lambda })+\varepsilon \Vert \lambda \Vert _{\ell _{p}}^{p}\bigr ]\]
(here ɛ≥0 is a regularization parameter; λ0 corresponds to ɛ=0). A number of regression and classification problems fit this general framework. We are interested in the case when p≥1, but it is close enough to 1 (so that p−1 is of the order $\frac{1}{\log N}$, or smaller). We show that the “sparsity” of λɛ implies the “sparsity” of λ̂ɛ and study the impact of “sparsity” on bounding the excess risk P(ℓ•fλ̂ɛ)−P(ℓ•fλ0) of solutions of empirical risk minimization problems.
LA - eng
KW - empirical risk; penalized empirical risk; ℓ_p-penalty; sparsity; oracle inequalities; Rademacher processes
UR - http://eudml.org/doc/78023
ER -
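The penalized minimization problem in the abstract can be made concrete with a small numerical sketch. The following is a hypothetical illustration, not the paper's method: it picks the squared loss ℓ(y, u) = (y − u)² (one admissible convex loss), a tiny dictionary, and plain gradient descent on the objective P_n(ℓ•f_λ) + ε‖λ‖_p^p with p slightly above 1; for p > 1 the penalty is differentiable, so no special subgradient handling is needed.

```python
import math


def penalized_erm(X, Y, dictionary, eps, p, lr=0.05, iters=3000):
    """Gradient-descent sketch of
        argmin_lambda  P_n(ell . f_lambda) + eps * ||lambda||_p^p
    with squared loss ell(y, u) = (y - u)^2 (illustrative choice only).
    For p > 1 the penalty term eps * |l_j|^p has derivative
    eps * p * sign(l_j) * |l_j|^(p - 1), which vanishes at l_j = 0."""
    N, n = len(dictionary), len(X)
    lam = [0.0] * N
    for _ in range(iters):
        # f_lambda(X_i) = sum_j lambda_j * h_j(X_i)
        preds = [sum(lam[j] * dictionary[j](x) for j in range(N)) for x in X]
        new_lam = []
        for j in range(N):
            # gradient of the empirical risk P_n(ell . f_lambda) in coord j
            g = (2.0 / n) * sum((preds[i] - Y[i]) * dictionary[j](X[i])
                                for i in range(n))
            # gradient of the penalty eps * |lam_j|^p (zero at lam_j = 0)
            if lam[j] != 0.0:
                g += eps * p * math.copysign(abs(lam[j]) ** (p - 1), lam[j])
            new_lam.append(lam[j] - lr * g)
        lam = new_lam
    return lam
```

With a dictionary {1, x, x²}, data from y = 2x, and p = 1 + 1/log N ≈ 1.91 (the regime p − 1 ≈ 1/log N discussed in the abstract), the minimizer concentrates on the single relevant coordinate, which is the kind of sparsity the paper's oracle inequalities quantify.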
References
- [1] A. Barron, L. Birgé and P. Massart. Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 (1999) 301–413. Zbl0946.62036MR1679028
- [2] P. Bartlett, O. Bousquet and S. Mendelson. Local Rademacher complexities. Ann. Statist. 33 (2005) 1497–1537. Zbl1083.62034MR2166554
- [3] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization. Analysis, Algorithms and Engineering Applications. MPS/SIAM, Series on Optimization, Philadelphia, 2001. Zbl0986.90032MR1857264
- [4] F. Bunea, A. Tsybakov and M. Wegkamp. Aggregation for Gaussian regression. Ann. Statist. 35 (2007) 1674–1697. Zbl1209.62065MR2351101
- [5] F. Bunea, A. Tsybakov and M. Wegkamp. Sparsity oracle inequalities for the LASSO. Electron. J. Statist. 1 (2007) 169–194. Zbl1146.62028MR2312149
- [6] E. Candes and T. Tao. The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist. 35 (2007) 2313–2351. Zbl1139.62019MR2382644
- [7] E. Candes, M. Rudelson, T. Tao and R. Vershynin. Error correction via linear programming. In Proc. 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS05) 295–308. IEEE, Pittsburgh, PA, 2005.
- [8] E. Candes, J. Romberg and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math. 59 (2006) 1207–1223. Zbl1098.94009MR2230846
- [9] O. Catoni. Statistical Learning Theory and Stochastic Optimization. Springer, New York, 2004. Zbl1076.93002MR2163920
- [10] D. L. Donoho. For most large underdetermined systems of equations the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Preprint, 2004. Zbl1105.90068
- [11] D. L. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59 (2006) 797–829. Zbl1113.15004MR2217606
- [12] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory 52 (2006) 1289–1306. Zbl1288.94016MR2241189
- [13] D. L. Donoho, M. Elad and V. Temlyakov. Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 (2006) 6–18. Zbl1288.94017MR2237332
- [14] S. van de Geer. High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 (2008) 614–645. Zbl1138.62323MR2396809
- [15] V. Koltchinskii. Model selection and aggregation in sparse classification problems. Oberwolfach Reports Meeting on Statistical and Probabilistic Methods of Model Selection, October, 2005.
- [16] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34 (2006) 2593–2656. Zbl1118.62065MR2329442
- [17] V. Koltchinskii and D. Panchenko. Complexities of convex combinations and bounding the generalization error in classification. Ann. Statist. 33 (2005) 1455–1496. Zbl1080.62045MR2166553
- [18] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer, New York, 1991. Zbl0748.60004MR1102015
- [19] P. Massart. Some applications of concentration inequalities to statistics. Ann. Fac. Sci. Toulouse (IX) 9 (2000) 245–303. Zbl0986.62002MR1813803
- [20] P. Massart. Concentration Inequalities and Model Selection. Springer, Berlin, 2007. Zbl1170.60006MR2319879
- [21] S. Mendelson, A. Pajor and N. Tomczak-Jaegermann. Reconstruction and subgaussian operators in Asymptotic Geometric Analysis. Geom. Funct. Anal. 17 (2007) 1248–1282. Zbl1163.46008MR2373017
- [22] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the LASSO. Ann. Statist. 34 (2006) 1436–1462. Zbl1113.62082MR2278363
- [23] A. Nemirovski. Topics in non-parametric statistics. In École d'Été de Probabilités de Saint-Flour XXVIII, 1998, 85–277. P. Bernard (Ed.). Springer, New York, 2000. Zbl0998.62033MR1775640
- [24] M. Rudelson and R. Vershynin. Geometric approach to error correcting codes and reconstruction of signals. Int. Math. Res. Not. 64 (2005) 4019–4041. Zbl1103.94014MR2206919
- [25] R. Tibshirani. Regression shrinkage and selection via the Lasso. J. Royal Statist. Soc. Ser. B 58 (1996) 267–288. Zbl0850.62538MR1379242
- [26] A. Tsybakov. Optimal rates of aggregation. In Proc. 16th Annual Conference on Learning Theory (COLT) and 7th Annual Workshop on Kernel Machines, 303–313. Lecture Notes in Artificial Intelligence 2777. Springer, New York, 2003. Zbl1208.62073
- [27] A. van der Vaart and J. Wellner. Weak Convergence and Empirical Processes. Springer, New York, 1996. Zbl0862.60002MR1385671
- [28] Y. Yang. Mixing strategies for density estimation. Ann. Statist. 28 (2000) 75–87. Zbl1106.62322MR1762904
- [29] Y. Yang. Aggregating regression procedures for a better performance. Bernoulli 10 (2004) 25–47. Zbl1040.62030MR2044592
- [30] P. Zhao and B. Yu. On model selection consistency of LASSO. J. Mach. Learn. Res. 7 (2006) 2541–2563. Zbl1222.62008MR2274449