Probabilities of discrepancy between minima of cross-validation, Vapnik bounds and true risks

Przemysław Klęsk

International Journal of Applied Mathematics and Computer Science (2010)

  • Volume: 20, Issue: 3, Pages: 525-544
  • ISSN: 1641-876X

Abstract

Two known approaches to complexity selection are considered: n-fold cross-validation and structural risk minimization. Obviously, in either approach, a discrepancy between the indicated optimal complexity (the minimum of a generalization error estimate or a bound) and the genuine minimum of unknown true risks is possible. In the paper, this problem is posed in a novel quantitative way. We state and prove theorems demonstrating how one can calculate pessimistic probabilities of discrepancy between these minima for given conditions of an experiment. The probabilities are calculated in terms of all relevant constants: the sample size, the number of cross-validation folds, the capacity of the set of approximating functions and bounds on this set. We report experiments carried out to validate the results.
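
For readers who want to see the two selectors side by side, below is a minimal, self-contained sketch (not taken from the paper) of complexity selection on a synthetic one-dimensional regression task. It compares the polynomial degree picked by n-fold cross-validation, by a Vapnik-style bound on the true risk, and by the true risk itself, which is computable here because the target function is known. The bound's constants a1, a2, c, the confidence level eta, and the VC-dimension stand-in h = degree + 1 are illustrative assumptions, not values from the paper; any mismatch among the three selected degrees is an instance of the discrepancy whose probability the paper quantifies.

import numpy as np

rng = np.random.default_rng(0)

def target(x):
    return np.sin(2 * np.pi * x)

def fit_poly(x, y, degree):
    # least-squares polynomial fit; degree + 1 free parameters
    return np.polyfit(x, y, degree)

def emp_risk(coeffs, x, y):
    # empirical risk: mean squared error on the given sample
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

def true_risk(coeffs, noise_std=0.1, n_grid=10_000):
    # "true" risk: squared distance to the known target over a dense grid,
    # plus the irreducible noise variance
    xg = np.linspace(0.0, 1.0, n_grid)
    return np.mean((np.polyval(coeffs, xg) - target(xg)) ** 2) + noise_std ** 2

def vapnik_bound(r_emp, n, h, eta=0.05, a1=1.0, a2=1.0, c=1.0):
    # Vapnik-style pessimistic bound R <= R_emp / (1 - c*sqrt(eps))_+ with
    # eps = a1*(h*(ln(a2*n/h) + 1) - ln(eta/4))/n; the constants a1, a2, c
    # and the use of h = degree + 1 are illustrative assumptions
    eps = a1 * (h * (np.log(a2 * n / h) + 1.0) - np.log(eta / 4.0)) / n
    return r_emp / max(1.0 - c * np.sqrt(eps), 1e-12)

def kfold_cv(x, y, degree, k=10):
    # n-fold (here: 10-fold) cross-validation estimate of the risk
    folds = np.array_split(rng.permutation(len(x)), k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(np.arange(len(x)), fold)
        coeffs = fit_poly(x[train], y[train], degree)
        errs.append(emp_risk(coeffs, x[fold], y[fold]))
    return float(np.mean(errs))

n = 50
x = rng.uniform(0.0, 1.0, n)
y = target(x) + rng.normal(0.0, 0.1, n)

degrees = list(range(1, 11))
cv = [kfold_cv(x, y, d) for d in degrees]
bound = [vapnik_bound(emp_risk(fit_poly(x, y, d), x, y), n, h=d + 1)
         for d in degrees]
true = [true_risk(fit_poly(x, y, d)) for d in degrees]

pick = lambda scores: degrees[int(np.argmin(scores))]
print("degree chosen by cross-validation: ", pick(cv))
print("degree chosen by Vapnik-style bound:", pick(bound))
print("degree of the true-risk minimum:    ", pick(true))
# Any mismatch among the three printed degrees is an instance of the
# discrepancy between minima whose probability the paper quantifies.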

How to cite


Przemysław Klęsk. "Probabilities of discrepancy between minima of cross-validation, Vapnik bounds and true risks." International Journal of Applied Mathematics and Computer Science 20.3 (2010): 525-544. <http://eudml.org/doc/208005>.

@article{PrzemysławKlęsk2010,
abstract = {Two known approaches to complexity selection are considered: n-fold cross-validation and structural risk minimization. Obviously, in either approach, a discrepancy between the indicated optimal complexity (the minimum of a generalization error estimate or a bound) and the genuine minimum of unknown true risks is possible. In the paper, this problem is posed in a novel quantitative way. We state and prove theorems demonstrating how one can calculate pessimistic probabilities of discrepancy between these minima for given conditions of an experiment. The probabilities are calculated in terms of all relevant constants: the sample size, the number of cross-validation folds, the capacity of the set of approximating functions and bounds on this set. We report experiments carried out to validate the results.},
author = {Przemysław Klęsk},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {regression estimation; model comparison; complexity selection; cross-validation; generalization; statistical learning theory; generalization bounds; structural risk minimization},
language = {eng},
number = {3},
pages = {525-544},
title = {Probabilities of discrepancy between minima of cross-validation, Vapnik bounds and true risks},
url = {http://eudml.org/doc/208005},
volume = {20},
year = {2010},
}

TY - JOUR
AU - Przemysław Klęsk
TI - Probabilities of discrepancy between minima of cross-validation, Vapnik bounds and true risks
JO - International Journal of Applied Mathematics and Computer Science
PY - 2010
VL - 20
IS - 3
SP - 525
EP - 544
AB - Two known approaches to complexity selection are considered: n-fold cross-validation and structural risk minimization. Obviously, in either approach, a discrepancy between the indicated optimal complexity (the minimum of a generalization error estimate or a bound) and the genuine minimum of unknown true risks is possible. In the paper, this problem is posed in a novel quantitative way. We state and prove theorems demonstrating how one can calculate pessimistic probabilities of discrepancy between these minima for given conditions of an experiment. The probabilities are calculated in terms of all relevant constants: the sample size, the number of cross-validation folds, the capacity of the set of approximating functions and bounds on this set. We report experiments carried out to validate the results.
LA - eng
KW - regression estimation; model comparison; complexity selection; cross-validation; generalization; statistical learning theory; generalization bounds; structural risk minimization
UR - http://eudml.org/doc/208005
ER -

References

  1. Anthony, M. and Shawe-Taylor, J. (1993). A result of Vapnik with applications, Discrete Applied Mathematics 48(3): 207-217. Zbl0801.68147
  2. Bartlett, P. (1998). The sample complexity of pattern classification with neural networks: The size of the weights is more important than the size of the network, IEEE Transactions on Information Theory 44(2): 525-536. Zbl0901.68177
  3. Bartlett, P., Kulkarni, S. and Posner, S. (1997). Covering numbers for real-valued function classes, IEEE Transactions on Information Theory 43(5): 1721-1724. Zbl0947.26008
  4. Bartlett, P. and Tewari, A. (2007). Sample complexity of policy search with known dynamics, Advances in Neural Information Processing Systems 19: 97-104. 
  5. Berry, A. (1941). The accuracy of the Gaussian approximation to the sum of independent variates, Transactions of the American Mathematical Society 49(1): 122-136. Zbl0025.34603
  6. Bousquet, O., Boucheron, S. and Lugosi, G. (2004). Introduction to Statistical Learning Theory, Advanced Lectures on Machine Learning, Springer, Heidelberg, pp. 169-207. Zbl1120.68428
  7. Cherkassky, V. and Mulier, F. (1998). Learning from Data, John Wiley & Sons, Hoboken, NJ. Zbl0960.62002
  8. DasGupta, A. (2008). Asymptotic Theory of Statistics and Probability, Springer, New York, NY. Zbl1154.62001
  9. Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition, Springer-Verlag, New York, NY. Zbl0853.68150
  10. Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap, Chapman & Hall, London. Zbl0835.62038
  11. Esséen, C. (1942). On the Liapounoff limit of error in the theory of probability, Arkiv för Matematik, Astronomi och Fysik 28A(9): 1-19. Zbl0027.33902
  12. Esséen, C. (1956). A moment inequality with an application to the central limit theorem, Skandinavisk Aktuarietidskrift 39: 160-170. Zbl0078.31502
  13. Fu, W., Carroll, R. and Wang, S. (2005). Estimating misclassification error with small samples via bootstrap cross-validation, Bioinformatics 21(9): 1979-1986. 
  14. Graham, R., Knuth, D. and Patashnik, O. (2002). Matematyka konkretna (Concrete Mathematics. A Foundation for Computer Science), PWN, Warsaw. 
  15. Hesterberg, T., Choi, N. H., Meier, L. and Fraley, C. (2008). Least angle and l₁ penalized regression: A review, Statistics Surveys 2: 61-93. Zbl1189.62070
  16. Hellman, M. and Raviv, J. (1970). Probability of error, equivocation and the Chernoff bound, IEEE Transactions on Information Theory 16(4): 368-372. Zbl0218.62005
  17. Hjorth, J. (1994). Computer Intensive Statistical Methods: Validation, Model Selection, and Bootstrap, Chapman & Hall, London. Zbl0829.62001
  18. Knuth, D. (1997). The Art of Computer Programming, Addison-Wesley, Reading, MA. Zbl0895.68055
  19. Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection, International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Quebec, Canada, pp. 1137-1143. 
  20. Korzeń, M. and Klęsk, P. (2008). Maximal margin estimation with perceptron-like algorithm, in L. Rutkowski, R. Scherer, R. Tadeusiewicz, L.A. Zadeh and J. Zurada (Eds.), Artificial Intelligence and Soft Computing - ICAISC 2008, Lecture Notes in Artificial Intelligence, Vol. 5097, Springer, Berlin, Heidelberg, pp. 597-608. 
  21. Krzyżak, A., Kohler, M. and Schäfer, D. (2000). Application of structural risk minimization to multivariate smoothing spline regression estimates, Bernoulli 8(4): 475-489. Zbl1003.62035
  22. Ng, A. (2004). Feature selection, l₁ vs. l₂ regularization, and rotational invariance, ACM International Conference on Machine Learning, Banff, Alberta, Canada, Vol. 69, pp. 78-85. 
  23. Schmidt, J., Siegel, A. and Srinivasan, A. (1995). Chernoff-Hoeffding bounds for applications with limited independence, SIAM Journal on Discrete Mathematics 8(2): 223-250. Zbl0819.60032
  24. Shawe-Taylor, J., Bartlett, P., Williamson, R. and Anthony, M. (1996). A framework for structural risk minimization, COLT, ACM Press, New York, NY, pp. 68-76. 
  25. Shevtsova, I. (2007). Sharpening of the upper bound of the absolute constant in the Berry-Esséen inequality, Theory of Probability and its Applications 51(3): 549-553. Zbl1125.60021
  26. van Beek, P. (1972). An application of Fourier methods to the problem of sharpening the Berry-Esséen inequality, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 23: 187-196. Zbl0238.60020
  27. Vapnik, V. (1995). The Nature of Statistical Learning Theory, Springer, New York, NY. Zbl0833.62008
  28. Vapnik, V. (1998). Statistical Learning Theory: Inference from Small Samples, Wiley, New York, NY. 
  29. Vapnik, V. (2006). Estimation of Dependencies Based on Empirical Data, Information Science & Statistics, Springer, New York, NY. Zbl1118.62002
  30. Vapnik, V. and Chervonenkis, A. (1968). On the uniform convergence of relative frequencies of events to their probabilities, Doklady Akademii Nauk 9(4): 915-918. Zbl0247.60004
  31. Vapnik, V. and Chervonenkis, A. (1989). The necessary and sufficient conditions for the consistency of the method of empirical risk minimization, Yearbook of the Academy of Sciences of the USSR on Recognition, Classification and Forecasting, Vol. 2, pp. 217-249. 
  32. Weiss, S. and Kulikowski, C. (1991). Computer Systems That Learn, Morgan Kaufmann Publishers, San Francisco, CA. 
  33. Zhang, T. (2002). Covering number bounds of certain regularized linear function classes, Journal of Machine Learning Research 2: 527-550. Zbl1007.68157
