Improving feature selection process resistance to failures caused by curse-of-dimensionality effects

Petr Somol; Jiří Grim; Jana Novovičová; Pavel Pudil

Kybernetika (2011)

  • Volume: 47, Issue: 3, pages 401-425
  • ISSN: 0023-5954

Abstract

The purpose of feature selection in machine learning is at least two-fold: saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality, with the aim of improving the accuracy of models and the classification rate of classifiers on previously unknown data. Yet it has been shown recently that the process of feature selection itself can be negatively affected by the very same curse of dimensionality: feature selection methods may easily over-fit or perform unstably. Such an outcome is unlikely to generalize well, and the resulting recognition system may fail to deliver the expected performance. In many tasks it is therefore crucial to employ additional mechanisms that make the feature selection process more stable and more resistant to curse-of-dimensionality effects. In this paper we discuss three different approaches to reducing this problem. We present an algorithmic extension applicable to various feature selection methods that is capable of reducing excessive feature subset dependency not only on the specific training data, but also on specific properties of the criterion function. Further, we discuss the concept of criteria ensembles, in which various criteria vote on feature inclusion/removal, and we provide a general definition of feature selection hybridization aimed at combining the advantages of dependent and independent criteria. The presented ideas are illustrated through examples and summarizing recommendations are given.
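
To make the criteria-ensemble idea mentioned above more concrete, the following minimal Python sketch runs sequential forward selection in which several criterion functions rank each candidate feature and the next feature is chosen by summed-rank (Borda-style) voting. It assumes numpy and scikit-learn are available; the data set, the particular criteria (two cross-validation wrapper criteria and one simple Fisher-like filter score) and the voting rule are illustrative assumptions only, not the specific criteria, voting scheme or algorithmic extension developed in the paper.

# Minimal illustrative sketch of the criteria-ensemble idea: several criterion
# functions score candidate features during sequential forward selection and
# the next feature is chosen by rank voting. This is NOT the authors'
# implementation; data set, criteria and voting rule are illustrative only.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


def wrapper_criterion(estimator):
    """Dependent (wrapper) criterion: cross-validated accuracy of a classifier."""
    def criterion(X, y, subset):
        return cross_val_score(estimator, X[:, subset], y, cv=3).mean()
    return criterion


def fisher_criterion(X, y, subset):
    """Independent (filter) criterion: summed univariate Fisher-like score."""
    score = 0.0
    for j in subset:
        xj = X[:, j]
        classes = np.unique(y)
        means = np.array([xj[y == c].mean() for c in classes])
        vars_ = np.array([xj[y == c].var() + 1e-12 for c in classes])
        score += means.var() / vars_.mean()
    return score


def ensemble_forward_selection(X, y, criteria, k):
    """Add k features; at each step every criterion ranks the candidates and
    the feature with the best summed rank (Borda count) is included."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        votes = np.zeros(len(remaining))
        for crit in criteria:
            scores = [crit(X, y, selected + [f]) for f in remaining]
            votes += np.argsort(np.argsort(scores))  # rank 0 = worst candidate
        best = remaining[int(np.argmax(votes))]
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    X, y = load_wine(return_X_y=True)
    criteria = [wrapper_criterion(GaussianNB()),
                wrapper_criterion(KNeighborsClassifier(3)),
                fisher_criterion]
    print("voted subset:", ensemble_forward_selection(X, y, criteria, k=5))

In the same illustrative spirit, hybridization could be sketched by letting a cheap independent (filter) criterion pre-select a small pool of candidate features before the expensive dependent (wrapper) criterion is evaluated on them, thereby combining the advantages of both criterion types.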

How to cite


Somol, Petr, et al. "Improving feature selection process resistance to failures caused by curse-of-dimensionality effects." Kybernetika 47.3 (2011): 401-425. <http://eudml.org/doc/196450>.

@article{Somol2011,
abstract = {The purpose of feature selection in machine learning is at least two-fold: saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality, with the aim of improving the accuracy of models and the classification rate of classifiers on previously unknown data. Yet it has been shown recently that the process of feature selection itself can be negatively affected by the very same curse of dimensionality: feature selection methods may easily over-fit or perform unstably. Such an outcome is unlikely to generalize well, and the resulting recognition system may fail to deliver the expected performance. In many tasks it is therefore crucial to employ additional mechanisms that make the feature selection process more stable and more resistant to curse-of-dimensionality effects. In this paper we discuss three different approaches to reducing this problem. We present an algorithmic extension applicable to various feature selection methods that is capable of reducing excessive feature subset dependency not only on the specific training data, but also on specific properties of the criterion function. Further, we discuss the concept of criteria ensembles, in which various criteria vote on feature inclusion/removal, and we provide a general definition of feature selection hybridization aimed at combining the advantages of dependent and independent criteria. The presented ideas are illustrated through examples and summarizing recommendations are given.},
author = {Somol, Petr and Grim, Jiří and Novovičová, Jana and Pudil, Pavel},
journal = {Kybernetika},
keywords = {feature selection; curse of dimensionality; over-fitting; stability; machine learning; dimensionality reduction},
language = {eng},
number = {3},
pages = {401-425},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Improving feature selection process resistance to failures caused by curse-of-dimensionality effects},
url = {http://eudml.org/doc/196450},
volume = {47},
year = {2011},
}

TY - JOUR
AU - Somol, Petr
AU - Grim, Jiří
AU - Novovičová, Jana
AU - Pudil, Pavel
TI - Improving feature selection process resistance to failures caused by curse-of-dimensionality effects
JO - Kybernetika
PY - 2011
PB - Institute of Information Theory and Automation AS CR
VL - 47
IS - 3
SP - 401
EP - 425
AB - The purpose of feature selection in machine learning is at least two-fold: saving measurement acquisition costs and reducing the negative effects of the curse of dimensionality, with the aim of improving the accuracy of models and the classification rate of classifiers on previously unknown data. Yet it has been shown recently that the process of feature selection itself can be negatively affected by the very same curse of dimensionality: feature selection methods may easily over-fit or perform unstably. Such an outcome is unlikely to generalize well, and the resulting recognition system may fail to deliver the expected performance. In many tasks it is therefore crucial to employ additional mechanisms that make the feature selection process more stable and more resistant to curse-of-dimensionality effects. In this paper we discuss three different approaches to reducing this problem. We present an algorithmic extension applicable to various feature selection methods that is capable of reducing excessive feature subset dependency not only on the specific training data, but also on specific properties of the criterion function. Further, we discuss the concept of criteria ensembles, in which various criteria vote on feature inclusion/removal, and we provide a general definition of feature selection hybridization aimed at combining the advantages of dependent and independent criteria. The presented ideas are illustrated through examples and summarizing recommendations are given.
LA - eng
KW - feature selection; curse of dimensionality; over-fitting; stability; machine learning; dimensionality reduction
UR - http://eudml.org/doc/196450
ER -

