Iterative feature selection in least square regression estimation

Pierre Alquier

Iterative feature selection in least square regression estimation

Pierre Alquier

Annales de l'I.H.P. Probabilités et statistiques (2008)

Volume: 44, Issue: 1, page 47-88
ISSN: 0246-0203

Access Full Article

top

Access to full text

Full (PDF)

Abstract

top

This paper presents a new algorithm to perform regression estimation, in both the inductive and transductive setting. The estimator is defined as a linear combination of functions in a given dictionary. Coefficients of the combinations are computed sequentially using projection on some simple sets. These sets are defined as confidence regions provided by a deviation (PAC) inequality on an estimator in one-dimensional models. We prove that every projection the algorithm actually improves the performance of the estimator. We give all the estimators and results at first in the inductive case, where the algorithm requires the knowledge of the distribution of the design, and then in the transductive case, which seems a more natural application for this algorithm as we do not need particular information on the distribution of the design in this case. We finally show a connection with oracle inequalities, making us able to prove that the estimator reaches minimax rates of convergence in Sobolev and Besov spaces.

How to cite

top

MLA
BibTeX
RIS

Alquier, Pierre. "Iterative feature selection in least square regression estimation." Annales de l'I.H.P. Probabilités et statistiques 44.1 (2008): 47-88. <http://eudml.org/doc/77964>.

@article{Alquier2008,
abstract = {This paper presents a new algorithm to perform regression estimation, in both the inductive and transductive setting. The estimator is defined as a linear combination of functions in a given dictionary. Coefficients of the combinations are computed sequentially using projection on some simple sets. These sets are defined as confidence regions provided by a deviation (PAC) inequality on an estimator in one-dimensional models. We prove that every projection the algorithm actually improves the performance of the estimator. We give all the estimators and results at first in the inductive case, where the algorithm requires the knowledge of the distribution of the design, and then in the transductive case, which seems a more natural application for this algorithm as we do not need particular information on the distribution of the design in this case. We finally show a connection with oracle inequalities, making us able to prove that the estimator reaches minimax rates of convergence in Sobolev and Besov spaces.},
author = {Alquier, Pierre},
journal = {Annales de l'I.H.P. Probabilités et statistiques},
keywords = {regression estimation; statistical learning; confidence regions; thresholding methods; support vector machines},
language = {eng},
number = {1},
pages = {47-88},
publisher = {Gauthier-Villars},
title = {Iterative feature selection in least square regression estimation},
url = {http://eudml.org/doc/77964},
volume = {44},
year = {2008},
}

TY - JOUR
AU - Alquier, Pierre
TI - Iterative feature selection in least square regression estimation
JO - Annales de l'I.H.P. Probabilités et statistiques
PY - 2008
PB - Gauthier-Villars
VL - 44
IS - 1
SP - 47
EP - 88
AB - This paper presents a new algorithm to perform regression estimation, in both the inductive and transductive setting. The estimator is defined as a linear combination of functions in a given dictionary. Coefficients of the combinations are computed sequentially using projection on some simple sets. These sets are defined as confidence regions provided by a deviation (PAC) inequality on an estimator in one-dimensional models. We prove that every projection the algorithm actually improves the performance of the estimator. We give all the estimators and results at first in the inductive case, where the algorithm requires the knowledge of the distribution of the design, and then in the transductive case, which seems a more natural application for this algorithm as we do not need particular information on the distribution of the design in this case. We finally show a connection with oracle inequalities, making us able to prove that the estimator reaches minimax rates of convergence in Sobolev and Besov spaces.
LA - eng
KW - regression estimation; statistical learning; confidence regions; thresholding methods; support vector machines
UR - http://eudml.org/doc/77964
ER -

References

top

P. Alquier. Transductive and inductive adaptative inference for regression and density estimation. PhD thesis, University Paris 6, 2006.
J.-Y. Audibert. Aggregated estimators and empirical complexity for least square regression. Ann. Inst. H. Poincaré. Probab. Statist. 40 (2004) 685–736. Zbl1052.62037 MR2096215
A. Barron, A. Cohen, W. Dahmen and R. DeVore. Adaptative approximation and learning by greedy algorithms. Preprint, 2006. Zbl1138.62019 MR2387964
G. Blanchard, P. Massart, R. Vert and L. Zwald. Kernel projection machine: a new tool for pattern recognition. In Advances in Neural Inf. Proc. Systems (NIPS, 2004) 1649–1656, Mit Press, 2005.
B. E. Boser, I. M. Guyon and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 144–152, ACM, 1992.
O. Catoni. A pac-Bayesian approach to adaptative classification. Preprint Laboratoire de Probabilités et Modèles Aléatoires, 2003.
O. Catoni. Statistical learning theory and stochastic optimization. Saint-Flour Summer School on Probability Theory. Lecture Notes in Math. 1851. Springer, Berlin, 2004. Zbl1076.93002 MR2163920
O. Catoni. Improved Vapnik–Cervonenkis bounds. Preprint Laboratoire de Probabilités et Modèles Aléatoires, 2005.
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel Based Learning Methods. Cambridge University Press, 2000. Zbl0994.68074
D. L. Donoho and I. M. Johnstone. Ideal spatial adaptation by wavelets. Biometrika 81 (1994) 425–455. Zbl0815.62019 MR1311089
D. L. Donoho, I. M. Johnstone, G. Kerkyacharian and D. Picard. Density estimation by wavelet thresholding. Ann. Statist. 24 (1996) 508–539. Zbl0860.62032 MR1394974
W. Härdle, G. Kerkyacharian, D. Picard and A. B. Tsybakov. Wavelets, Approximations and Statistical Applications 129. Springer, New York, 1998. Zbl0899.62002 MR1618204
A. Juditsky, A. Nazin, A. Tsybakov and N. Vayatis. Recursive aggregation of estimators via the mirror descent algorithm with averaging. Probl. Inf. Transm. 41 (2005) 368–384. Zbl1123.62044 MR2198228
A. Juditsky, P. Rigollet and A. Tsybakov. Mirror averaging, aggregation and model selection. In Meeting on Statistical and Probabilistic Methods of Model Selection, pp. 2688–2691. Oberwolfach reports, 2005.
G. Kerkyacharian and D. Picard. Regression in random design and warped wavelets. Bernoulli 10 (2004) 1053–1105. Zbl1067.62039 MR2108043
A. Nemirovski. Topics in non-parametric statistics. Saint-Flour Summer School on Probability Theory 85–277. Lecture Notes in Math. 1738. Springer, Berlin, 2000. Zbl0998.62033 MR1775640
D. Panchenko. Symmetrization approach to concentration inequalities for empirical processes. Ann. Probab. 31 (2003) 2068–2081. Zbl1042.60008 MR2016612
R. Schapire, Y. Freund, P. Bartlett and W. S. Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. Ann. Statist. 26 (1998) 1651–1686. Zbl0929.62069 MR1673273
B. Schölkopf, A. J. Smola and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10 (1998) 1299–1319.
M. Seeger. Pac-Bayesian generalization error bounds for Gaussian process classification. J. Mach. Learn. Res. 3 (2002) 233–269. Zbl1088.68745 MR1971338
A. Tsybakov. Optimal aggregation of classifiers instatistical learning. Ann. Statist. 32 (2004) 135–156. Zbl1105.62353 MR2051002
V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1998. Zbl0934.62009 MR1719582
B. Widrow and M. Hoff. Adaptative switching circuits. In IRE WESCON Convention Record, Part 4, Computers: Man–Machine Systems, 96–104, 2005.
Y. Yang. Aggregating regression procedures to improve performances. Bernoulli 10 (2004) 25–47. Zbl1040.62030 MR2044592

Citations in EuDML Documents

top

Pierre Alquier, Density estimation with quadratic loss: a confidence intervals method

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Language to use for this widget.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Number of notes per page

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.