Density estimation with quadratic loss: a confidence intervals method

Pierre Alquier

ESAIM: Probability and Statistics (2008)

  • Volume: 12, pages 438–463
  • ISSN: 1292-8100

Abstract

We propose a feature selection method for density estimation with quadratic loss. This method relies on the study of unidimensional approximation models and on the definition of confidence regions for the density based on these models. It is quite general and covers cases of interest such as the detection of relevant wavelet coefficients or the selection of support vectors in SVMs. In the general case, we prove that every selected feature actually improves the performance of the estimator. In the case where features are defined by wavelets, we prove that this method is adaptive and near minimax (up to a log term) over some Besov spaces. We end the paper with simulations indicating that it should be possible to extend the adaptation result to other features.
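To make the wavelet case concrete, the sketch below is a minimal illustration of the idea described in the abstract, not the paper's exact procedure or notation: it estimates Haar wavelet coefficients of a density on [0, 1] from an i.i.d. sample, keeps only the coefficients whose normal-approximation confidence interval excludes zero, and rebuilds the estimate from the retained coefficients. The function names (haar_psi, ci_selected_density), the Haar basis, the confidence level z and the maximum resolution level are all illustrative assumptions.

import numpy as np

def haar_psi(x, j, k):
    # Haar wavelet psi_{j,k}(x) = 2^{j/2} * psi(2^j x - k),
    # with psi = 1 on [0, 1/2) and -1 on [1/2, 1).
    y = (2.0 ** j) * np.asarray(x, dtype=float) - k
    return (2.0 ** (j / 2.0)) * (((0.0 <= y) & (y < 0.5)).astype(float)
                                 - ((0.5 <= y) & (y < 1.0)).astype(float))

def ci_selected_density(sample, max_level=5, z=2.576):
    # Keep a coefficient only when the interval beta_hat +/- z * sd / sqrt(n)
    # does not contain zero (an illustrative confidence region, not the paper's).
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    kept = []
    for j in range(max_level + 1):
        for k in range(2 ** j):
            values = haar_psi(sample, j, k)           # psi_{j,k}(X_i), i = 1..n
            beta_hat = values.mean()                  # empirical wavelet coefficient
            half_width = z * values.std(ddof=1) / np.sqrt(n)
            if abs(beta_hat) > half_width:            # confidence interval excludes 0
                kept.append((j, k, beta_hat))

    def f_hat(x):
        # Father term has coefficient 1 for a density supported on [0, 1];
        # add the selected wavelet terms on top of it.
        f = np.ones_like(np.asarray(x, dtype=float))
        for j, k, b in kept:
            f = f + b * haar_psi(x, j, k)
        return f

    return f_hat, kept

# Toy usage: sample from a piecewise-constant density (1.4 on [0, 0.5), 0.6 on [0.5, 1]).
rng = np.random.default_rng(0)
u = rng.random(2000)
data = np.where(rng.random(2000) < 0.7, 0.5 * u, 0.5 + 0.5 * u)
f_hat, selected = ci_selected_density(data)
print(len(selected), "coefficients selected; f_hat(0.25) =", float(f_hat([0.25])[0]))

On such data the selection typically retains only the coarse coefficients that capture the jump at 1/2 and discards the fine-scale ones, which is the thresholding behaviour the abstract refers to.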


How to cite


Alquier, Pierre. "Density estimation with quadratic loss: a confidence intervals method." ESAIM: Probability and Statistics 12 (2008): 438-463. <http://eudml.org/doc/250397>.

@article{Alquier2008,
abstract = {We propose a feature selection method for density estimation with quadratic loss. This method relies on the study of unidimensional approximation models and on the definition of confidence regions for the density based on these models. It is quite general and covers cases of interest such as the detection of relevant wavelet coefficients or the selection of support vectors in SVMs. In the general case, we prove that every selected feature actually improves the performance of the estimator. In the case where features are defined by wavelets, we prove that this method is adaptive and near minimax (up to a log term) over some Besov spaces. We end the paper with simulations indicating that it should be possible to extend the adaptation result to other features.},
author = {Alquier, Pierre},
journal = {ESAIM: Probability and Statistics},
keywords = {density estimation; support vector machines; kernel algorithms; thresholding methods; wavelets},
language = {eng},
month = {7},
pages = {438-463},
publisher = {EDP Sciences},
title = {Density estimation with quadratic loss: a confidence intervals method},
url = {http://eudml.org/doc/250397},
volume = {12},
year = {2008},
}

TY - JOUR
AU - Alquier, Pierre
TI - Density estimation with quadratic loss: a confidence intervals method
JO - ESAIM: Probability and Statistics
DA - 2008/7//
PB - EDP Sciences
VL - 12
SP - 438
EP - 463
AB - We propose a feature selection method for density estimation with quadratic loss. This method relies on the study of unidimensional approximation models and on the definition of confidence regions for the density based on these models. It is quite general and covers cases of interest such as the detection of relevant wavelet coefficients or the selection of support vectors in SVMs. In the general case, we prove that every selected feature actually improves the performance of the estimator. In the case where features are defined by wavelets, we prove that this method is adaptive and near minimax (up to a log term) over some Besov spaces. We end the paper with simulations indicating that it should be possible to extend the adaptation result to other features.

LA - eng
KW - density estimation; support vector machines; kernel algorithms; thresholding methods; wavelets
UR - http://eudml.org/doc/250397
ER -

References

  1. H. Akaike, A new look at the statistical model identification. IEEE Trans. Autom. Control 19 (1974) 716–723.  Zbl0314.62039
  2. P. Alquier, Iterative Feature Selection in Least Square Regression Estimation. Ann. Inst. H. Poincaré B: Probab. Statist. 44 (2008) 47–88.  Zbl1206.62067
  3. A. Barron, A. Cohen, W. Dahmen and R. DeVore, Adaptative Approximation and Learning by Greedy Algorithms, preprint (2006).  Zbl1138.62019
  4. G. Blanchard, P. Massart, R. Vert and L. Zwald, Kernel Projection Machine: A New Tool for Pattern Recognition, in Proceedings of NIPS (2004).
  5. B.E. Boser, I.M. Guyon and V.N. Vapnik, A training algorithm for optimal margin classifiers, in Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, D. Haussler (ed.), ACM Press (1992) 144–152.
  6. T.T. Cai and L.D. Brown, Wavelet Estimation for Samples with Random Uniform Design. Stat. Probab. Lett. 42 (1999) 313–321.  Zbl0940.62037
  7. O. Catoni, Statistical learning theory and stochastic optimization, Lecture Notes, Saint-Flour Summer School on Probability Theory (2001), Springer.
  8. O. Catoni, PAC-Bayesian Inductive and Transductive Learning, manuscript (2006).
  9. O. Catoni, A PAC-Bayesian approach to adaptative classification, preprint, Laboratoire de Probabilités et Modèles Aléatoires (2003).
  10. A. Cohen, Wavelet methods in numerical analysis, in Handbook of numerical analysis, Vol. VII, North-Holland, Amsterdam (2000) 417–711.  Zbl0976.65124
  11. I. Daubechies, Ten Lectures on Wavelets. SIAM, Philadelphia (1992).  Zbl0776.42018
  12. D.L. Donoho and I.M. Johnstone, Ideal Spatial Adaptation by Wavelets. Biometrika 81 (1994) 425–455.  Zbl0815.62019
  13. D.L. Donoho, I.M. Johnstone, G. Kerkyacharian and D. Picard, Density Estimation by Wavelet Thresholding. Ann. Statist. 24 (1996) 508–539.  Zbl0860.62032
  14. I.J. Good and R.A. Gaskins, Nonparametric roughness penalties for probability densities. Biometrika 58 (1971) 255–277.  Zbl0221.62012
  15. W. Härdle, G. Kerkyacharian, D. Picard and A.B. Tsybakov, Wavelets, Approximations and Statistical Applications. Lecture Notes in Statistics, Springer (1998).
  16. J.S. Marron and M.P. Wand, Exact Mean Integrated Squared Error. Ann. Statist. 20 (1992) 712–736.  Zbl0746.62040
  17. D. Panchenko, Symmetrization Approach to Concentration Inequalities for Empirical Processes. Ann. Probab. 31 (2003) 2068–2081.  Zbl1042.60008
  18. R Development Core Team, R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2004). URL http://www.R-project.org
  19. G. Rätsch, C. Schäfer, B. Schölkopf and S. Sonnenburg, Large Scale Multiple Kernel Learning. J. Machine Learning Res. 7 (2006) 1531–1565.  Zbl1222.90072
  20. J. Rissanen, Modeling by shortest data description. Automatica 14 (1978) 465–471.  Zbl0418.93079
  21. M. Seeger, PAC-Bayesian Generalization Error Bounds for Gaussian Process Classification. J. Machine Learning Res. 3 (2002) 233–269.  Zbl1088.68745
  22. M. Tipping, The Relevance Vector Machine, in Advances in Neural Information Processing Systems, Morgan Kaufmann, San Mateo, CA (2000).  Zbl0997.68109
  23. A.B. Tsybakov, Introduction à l'estimation non-paramétrique. Mathématiques et Applications, Springer (2004).
  24. V.N. Vapnik, The nature of statistical learning theory. Springer Verlag (1998).  Zbl0934.62009
  25. Z. Zhang, S. Zhang, C.-X. Zhang and Y.-Z. Chen, SVM for density estimation and application to medical image segmentation. J. Zhejiang Univ. Sci. B 7 (2006).
