Data mining methods for gene selection on the basis of gene expression arrays
Michał Muszyński; Stanisław Osowski
International Journal of Applied Mathematics and Computer Science (2014)
- Volume: 24, Issue: 3, page 657-668
- ISSN: 1641-876X
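Abstract
The paper presents data mining methods applied to gene selection for recognition of a particular type of prostate cancer on the basis of gene expression arrays. Several chosen methods of gene selection, including the Fisher method, correlation of gene with a class, application of the support vector machine and statistical hypotheses, are compared on the basis of clustering measures. The results of applying these individual selection methods are combined together to identify the most often selected genes forming the required pattern, best associated with the cancerous cases. This resulting pattern of selected gene lists is treated as the input data to the classifier, performing the task of the final recognition of the patterns. The numerical results of the recognition of prostate cancer from normal (reference) cases using the selected genes and the support vector machine confirm the good performance of the proposed gene selection approach.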
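The selection-and-fusion idea summarized in the abstract (rank genes with several criteria, then keep the genes chosen most often) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the Fisher criterion is taken in one common form, (m1 - m0)^2 / (v1 + v0), only two of the criteria are shown, the helper names `top_k` and `fuse` are hypothetical, and the toy expression values are invented. The SVM-based ranking, the hypothesis tests, and the final SVM classifier are left out.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pvariance


def fisher_score(values, labels):
    """One common form of the Fisher criterion: (m1 - m0)^2 / (v1 + v0)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    denom = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / denom if denom > 0 else 0.0


def abs_correlation(values, labels):
    """Absolute Pearson correlation between one gene and the class label."""
    mv, ml = mean(values), mean(labels)
    cov = sum((v - mv) * (y - ml) for v, y in zip(values, labels))
    sv = sqrt(sum((v - mv) ** 2 for v in values))
    sl = sqrt(sum((y - ml) ** 2 for y in labels))
    return abs(cov / (sv * sl)) if sv > 0 and sl > 0 else 0.0


def top_k(expr, labels, score, k):
    """Indices of the k best-scoring genes (expr is a genes x samples list)."""
    ranked = sorted(range(len(expr)),
                    key=lambda g: score(expr[g], labels),
                    reverse=True)
    return ranked[:k]


def fuse(selections, min_votes):
    """Keep genes chosen by at least min_votes of the individual methods."""
    votes = Counter(g for sel in selections for g in sel)
    return sorted(g for g, n in votes.items() if n >= min_votes)


# Toy data (invented): 4 genes x 6 samples; label 1 = cancer, 0 = reference.
labels = [1, 1, 1, 0, 0, 0]
expr = [
    [5.1, 4.9, 5.3, 1.0, 1.2, 0.9],  # gene 0: separates the classes
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # gene 1: same mean in both classes
    [0.5, 0.7, 0.4, 3.0, 3.3, 2.9],  # gene 2: separates the classes
    [1.0, 3.0, 2.0, 2.5, 1.5, 2.0],  # gene 3: noisy, same mean in both classes
]
selections = [top_k(expr, labels, s, k=2) for s in (fisher_score, abs_correlation)]
selected = fuse(selections, min_votes=2)
print(selected)
```

In this toy case both criteria agree on genes 0 and 2, so the fused list contains exactly those two. With more criteria, `min_votes` controls how strict the agreement has to be before a gene is passed on to a classifier.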
How to cite
Michał Muszyński and Stanisław Osowski. "Data mining methods for gene selection on the basis of gene expression arrays." International Journal of Applied Mathematics and Computer Science 24.3 (2014): 657-668. <http://eudml.org/doc/271873>.
@article{MichałMuszyński2014,
abstract = {The paper presents data mining methods applied to gene selection for recognition of a particular type of prostate cancer on the basis of gene expression arrays. Several chosen methods of gene selection, including the Fisher method, correlation of gene with a class, application of the support vector machine and statistical hypotheses, are compared on the basis of clustering measures. The results of applying these individual selection methods are combined together to identify the most often selected genes forming the required pattern, best associated with the cancerous cases. This resulting pattern of selected gene lists is treated as the input data to the classifier, performing the task of the final recognition of the patterns. The numerical results of the recognition of prostate cancer from normal (reference) cases using the selected genes and the support vector machine confirm the good performance of the proposed gene selection approach.},
author = {Michał Muszyński and Stanisław Osowski},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {gene expression array; gene ranking; feature selection; clusterization measures; fusion; SVM classification},
language = {eng},
number = {3},
pages = {657-668},
title = {Data mining methods for gene selection on the basis of gene expression arrays},
url = {http://eudml.org/doc/271873},
volume = {24},
year = {2014},
}
TY - JOUR
AU - Michał Muszyński
AU - Stanisław Osowski
TI - Data mining methods for gene selection on the basis of gene expression arrays
JO - International Journal of Applied Mathematics and Computer Science
PY - 2014
VL - 24
IS - 3
SP - 657
EP - 668
AB - The paper presents data mining methods applied to gene selection for recognition of a particular type of prostate cancer on the basis of gene expression arrays. Several chosen methods of gene selection, including the Fisher method, correlation of gene with a class, application of the support vector machine and statistical hypotheses, are compared on the basis of clustering measures. The results of applying these individual selection methods are combined together to identify the most often selected genes forming the required pattern, best associated with the cancerous cases. This resulting pattern of selected gene lists is treated as the input data to the classifier, performing the task of the final recognition of the patterns. The numerical results of the recognition of prostate cancer from normal (reference) cases using the selected genes and the support vector machine confirm the good performance of the proposed gene selection approach.
LA - eng
KW - gene expression array; gene ranking; feature selection; clusterization measures; fusion; SVM classification
UR - http://eudml.org/doc/271873
ER -