Correlation-based feature selection strategy in classification problems

Krzysztof Michalak; Halina Kwaśnicka

International Journal of Applied Mathematics and Computer Science (2006)

  • Volume: 16, Issue: 4, pages 503-511
  • ISSN: 1641-876X

Abstract

In classification problems, the high dimensionality of data is often considered an important issue. To lower data dimensionality, feature selection methods are often employed. To select a set of features that spans a representation space as good as possible for the classification task, one must take into consideration possible interdependencies between the features. As a trade-off between the complexity of the selection process and the quality of the selected feature set, a pairwise selection strategy has recently been suggested. In this paper, a modified pairwise selection strategy is proposed. Our research suggests that computation time can be significantly lowered, while maintaining the quality of the selected feature sets, by using mixed univariate and bivariate feature evaluation based on the correlation between the features. This paper compares the performance of our method with that of the unmodified pairwise selection strategy on several well-known benchmark sets. Experimental results show that, in most cases, it is possible to lower computation time and that, with high statistical significance, the quality of the selected feature sets is not lower than that of the sets selected by the unmodified pairwise selection process.
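The abstract leaves the exact decision rule to the full paper, so the following Python sketch is only an illustration under stated assumptions, not the authors' algorithm. It assumes (i) that the baseline pairwise strategy can be approximated by greedy forward selection in which each step may add either a single feature or a jointly evaluated pair, and (ii) that the modification evaluates a pair jointly only when the absolute Pearson correlation between its two features reaches a hypothetical `threshold`, while every other feature is scored on its own. The evaluation criterion `centroid_accuracy` (training accuracy of a nearest-class-centroid classifier) and the function name `mixed_pairwise_select` are hypothetical stand-ins.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / sqrt(va * vb)


def centroid_accuracy(X, y, feats):
    """Hypothetical criterion: training accuracy of a nearest-class-centroid
    classifier restricted to the feature indices in `feats`."""
    classes = sorted(set(y))
    cents = {}
    for c in classes:
        rows = [row for row, lab in zip(X, y) if lab == c]
        cents[c] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    hits = 0
    for row, lab in zip(X, y):
        # Ties between equally distant centroids go to the first class in sorted order.
        pred = min(classes,
                   key=lambda c: sum((row[f] - m) ** 2 for f, m in zip(feats, cents[c])))
        hits += pred == lab
    return hits / len(y)


def mixed_pairwise_select(X, y, k, threshold=0.5, score=centroid_accuracy):
    """Hypothetical greedy forward selection with mixed evaluation:
    every remaining feature is a univariate candidate, and a pair is a
    bivariate candidate only if |r| between its features >= threshold.
    Returns (selected feature indices, number of criterion evaluations)."""
    n_feat = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_feat)]
    selected, evaluations = [], 0
    while len(selected) < min(k, n_feat):
        remaining = [j for j in range(n_feat) if j not in selected]
        candidates = [(j,) for j in remaining]           # univariate candidates
        if k - len(selected) >= 2:                        # room left for a pair
            candidates += [(i, j) for i, j in combinations(remaining, 2)
                           if abs(pearson(cols[i], cols[j])) >= threshold]
        best, best_score = None, -1.0
        for cand in candidates:
            s = score(X, y, selected + list(cand))
            evaluations += 1
            if s > best_score:                            # first best candidate wins ties
                best, best_score = cand, s
        selected += list(best)
    return selected, evaluations
```

Calling `mixed_pairwise_select(X, y, k=2, threshold=1.1)` disables all pair evaluations (|r| never exceeds 1), whereas `threshold=0.0` evaluates every pair; comparing the returned evaluation counts shows where the assumed correlation rule saves criterion evaluations.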

How to cite


Michalak, Krzysztof, and Kwaśnicka, Halina. "Correlation-based feature selection strategy in classification problems." International Journal of Applied Mathematics and Computer Science 16.4 (2006): 503-511. <http://eudml.org/doc/207809>.

@article{Michalak2006,
author = {Michalak, Krzysztof and Kwaśnicka, Halina},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {feature selection; pattern classification; feature correlation; pairwise feature evaluation},
language = {eng},
number = {4},
pages = {503--511},
title = {Correlation-based feature selection strategy in classification problems},
url = {http://eudml.org/doc/207809},
volume = {16},
year = {2006},
}

TY  - JOUR
AU  - Michalak, Krzysztof
AU  - Kwaśnicka, Halina
TI  - Correlation-based feature selection strategy in classification problems
JO  - International Journal of Applied Mathematics and Computer Science
PY  - 2006
VL  - 16
IS  - 4
SP  - 503
EP  - 511
LA  - eng
KW  - feature selection; pattern classification; feature correlation; pairwise feature evaluation
UR  - http://eudml.org/doc/207809
ER  - 

References

  1. Blake C. and Merz C. (2006): UCI Repository of Machine Learning Databases. - Available at: http://www.ics.uci.edu/~mlearn/MLRepository.html.
  2. Cover T.M. and van Campenhout J.M. (1977): On the possible ordering in the measurement selection problem. - IEEE Trans. Syst. Man Cybern., Vol. SMC-7, No. 9, pp. 657-661. Zbl 0371.62036
  3. Das S. (2001): Filters, wrappers and a boosting-based hybrid for feature selection. - Proc. Int. Conf. Machine Learning, San Francisco, CA, USA, pp. 74-81.
  4. Duda R., Hart P. and Stork D. (2001): Pattern Classification. - New York: Wiley.
  5. Kittler J. (1978): Pattern Recognition and Signal Processing. - The Netherlands: Sijthoff and Noordhoff, pp. 41-60. Zbl 0396.62043
  6. Kohavi R. and John G.H. (1997): Wrappers for feature subset selection. - Artif. Intell., Vol. 97, Nos. 1-2, pp. 273-324. Zbl 0904.68143
  7. Kwaśnicka H. and Orski P. (2004): Genetic algorithm as an attribute selection tool for learning algorithms. - In: Intelligent Information Systems 2004, New Trends in Intelligent Information Processing and Web Mining, Proc. Int. IIS: IIPWM'04 Conf. - Berlin: Springer, pp. 449-453.
  8. Pekalska E., Harol A., Lai C. and Duin R.P.W. (2005): Pairwise selection of features and prototypes. - In: Computer Recognition Systems (Kurzyński M., Puchała E., Woźniak M. and Żołnierek A., Eds.). - Proc. 4th Int. Conf. Computer Recognition Systems, CORES'05, Advances in Soft Computing, Berlin: Springer, pp. 271-278.
  9. Pudil P., Novovicova J. and Kittler J. (1994): Floating search methods in feature selection. - Pattern Recogn. Lett., Vol. 15, No. 11, pp. 1119-1125.
  10. Xing E., Jordan M. and Karp R. (2001): Feature selection for high-dimensional genomic microarray data. - Proc. Int. Conf. Machine Learning, San Francisco, CA, USA, pp. 601-608.
