Data mining methods for gene selection on the basis of gene expression arrays

Michał Muszyński; Stanisław Osowski

International Journal of Applied Mathematics and Computer Science (2014)

  • Volume: 24, Issue: 3, pages 657-668
  • ISSN: 1641-876X

Abstract

The paper presents data mining methods for gene selection, applied to the recognition of a particular type of prostate cancer on the basis of gene expression arrays. Several gene selection methods, including the Fisher method, correlation of a gene with a class, application of the support vector machine and statistical hypothesis tests, are compared on the basis of clustering measures. The results of the individual selection methods are combined to identify the most frequently selected genes, which form the pattern best associated with the cancerous cases. This pattern of selected genes is the input to the classifier that performs the final recognition. Numerical results for distinguishing prostate cancer from normal (reference) cases using the selected genes and the support vector machine confirm the good performance of the proposed gene selection approach.
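
The selection-and-fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the Fisher score used here (|mean difference| divided by the sum of class standard deviations), the Pearson correlation with the class label, the `top_k` cut-off, the `min_votes` threshold and the toy data are all assumptions made for illustration. Only two of the selection methods named in the abstract are shown.

```python
from collections import Counter
from statistics import mean, pstdev


def fisher_scores(samples, labels):
    """Score each gene by |mean_1 - mean_0| / (std_1 + std_0) (assumed form)."""
    n_genes = len(samples[0])
    scores = []
    for g in range(n_genes):
        pos = [row[g] for row, y in zip(samples, labels) if y == 1]
        neg = [row[g] for row, y in zip(samples, labels) if y == 0]
        spread = pstdev(pos) + pstdev(neg)
        # Small constant avoids division by zero for constant genes.
        scores.append(abs(mean(pos) - mean(neg)) / (spread + 1e-12))
    return scores


def correlation_scores(samples, labels):
    """Score each gene by its absolute Pearson correlation with the class label."""
    n_genes = len(samples[0])
    y_mean = mean(labels)
    y_dev = [y - y_mean for y in labels]
    y_norm = sum(d * d for d in y_dev) ** 0.5
    scores = []
    for g in range(n_genes):
        x = [row[g] for row in samples]
        x_mean = mean(x)
        x_dev = [v - x_mean for v in x]
        x_norm = sum(d * d for d in x_dev) ** 0.5
        cov = sum(a * b for a, b in zip(x_dev, y_dev))
        scores.append(abs(cov) / (x_norm * y_norm + 1e-12))
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring genes."""
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


def fuse_by_votes(selections, min_votes):
    """Keep genes chosen by at least min_votes of the individual methods."""
    votes = Counter(g for selected in selections for g in selected)
    return sorted(g for g, count in votes.items() if count >= min_votes)


# Hypothetical toy data: 6 samples x 4 genes; label 1 = cancer, 0 = reference.
X = [
    [5.0, 1.0, 3.0, 2.0],
    [5.2, 0.9, 2.0, 2.1],
    [4.9, 1.1, 4.0, 1.9],
    [1.0, 1.0, 3.1, 4.0],
    [1.1, 1.2, 2.2, 4.2],
    [0.9, 0.8, 3.9, 3.9],
]
y = [1, 1, 1, 0, 0, 0]

selections = [top_k(fisher_scores(X, y), 2), top_k(correlation_scores(X, y), 2)]
selected_genes = fuse_by_votes(selections, min_votes=2)
print(selected_genes)
```

In this sketch the fused gene list would then supply the input features for a classifier such as a support vector machine; that step is omitted here to keep the example free of external libraries.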

How to cite

Michał Muszyński and Stanisław Osowski. "Data mining methods for gene selection on the basis of gene expression arrays." International Journal of Applied Mathematics and Computer Science 24.3 (2014): 657-668. <http://eudml.org/doc/271873>.

@article{MichałMuszyński2014,
author = {Michał Muszyński and Stanisław Osowski},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {gene expression array; gene ranking; feature selection; clusterization measures; fusion; SVM classification},
language = {eng},
number = {3},
pages = {657-668},
title = {Data mining methods for gene selection on the basis of gene expression arrays},
url = {http://eudml.org/doc/271873},
volume = {24},
year = {2014},
}

TY - JOUR
AU - Michał Muszyński
AU - Stanisław Osowski
TI - Data mining methods for gene selection on the basis of gene expression arrays
JO - International Journal of Applied Mathematics and Computer Science
PY - 2014
VL - 24
IS - 3
SP - 657
EP - 668
LA - eng
KW - gene expression array; gene ranking; feature selection; clusterization measures; fusion; SVM classification
UR - http://eudml.org/doc/271873
ER -

