Selecting differentially expressed genes for colon tumor classification

Krzysztof Fujarewicz; Małgorzata Wiench

International Journal of Applied Mathematics and Computer Science (2003)

  • Volume: 13, Issue: 3, pages 327-335
  • ISSN: 1641-876X

Abstract

DNA microarrays provide a new technique for measuring gene expression that has attracted considerable research interest in recent years. It has been suggested that gene expression data from microarrays (biochips) can be employed in many biomedical areas, e.g., in cancer classification. Although several classification methods, both new and existing, have been tested, selecting a proper (optimal) set of genes whose expression levels can be used for classification remains an open problem. Recently we proposed a new recursive feature replacement (RFR) algorithm for choosing a suboptimal set of genes. The algorithm uses the support vector machine (SVM) technique. In this paper we use the RFR method to find suboptimal gene subsets for tumor-normal colon tissue classification. The results are compared with those of other methods recently proposed in the literature. The comparison shows that the RFR method finds the smallest gene subset (only six genes) that gives no misclassifications in leave-one-out cross-validation on a tumor-normal colon data set. In this sense the RFR algorithm outperforms all the other investigated methods.
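
The evaluation criterion named in the abstract, the number of misclassifications in leave-one-out cross-validation for a given gene subset, can be illustrated with a minimal sketch. This is not the authors' RFR algorithm and it does not use an SVM: a simple nearest-centroid classifier stands in so that the example needs only the Python standard library. The function names, the toy expression values, and the gene indices are all hypothetical.

```python
from math import dist


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to `sample`.

    Stand-in classifier for illustration only; the paper uses an SVM.
    """
    labels = sorted(set(train_y))
    centroids = {
        label: centroid([x for x, y in zip(train_x, train_y) if y == label])
        for label in labels
    }
    return min(labels, key=lambda label: dist(centroids[label], sample))


def loo_errors(x, y, gene_subset):
    """Count leave-one-out misclassifications using only `gene_subset` columns."""
    reduced = [[row[g] for g in gene_subset] for row in x]
    errors = 0
    for i in range(len(reduced)):
        # Hold out sample i, train on the rest, then test on the held-out sample.
        train_x = reduced[:i] + reduced[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_x, train_y, reduced[i]) != y[i]:
            errors += 1
    return errors


# Hypothetical toy data: 6 samples x 3 "genes". Gene 0 separates the two
# classes clearly; genes 1 and 2 carry little or no class information.
X = [
    [5.0, 1.0, 0.3],
    [5.2, 0.9, 0.7],
    [4.8, 1.1, 0.5],
    [1.0, 1.0, 0.6],
    [1.2, 0.8, 0.2],
    [0.9, 1.2, 0.4],
]
Y = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

errors_informative = loo_errors(X, Y, [0])  # subset containing only gene 0
errors_noise = loo_errors(X, Y, [2])        # subset containing only gene 2
```

A gene-selection method in the spirit of the abstract would search over candidate subsets and prefer the smallest one for which a count such as `loo_errors` reaches zero; how RFR performs that search is described in the paper itself, not here.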

How to cite

Fujarewicz, Krzysztof, and Wiench, Małgorzata. "Selecting differentially expressed genes for colon tumor classification." International Journal of Applied Mathematics and Computer Science 13.3 (2003): 327-335. <http://eudml.org/doc/207647>.

@article{Fujarewicz2003,
abstract = {DNA microarrays provide a new technique for measuring gene expression that has attracted considerable research interest in recent years. It has been suggested that gene expression data from microarrays (biochips) can be employed in many biomedical areas, e.g., in cancer classification. Although several classification methods, both new and existing, have been tested, selecting a proper (optimal) set of genes whose expression levels can be used for classification remains an open problem. Recently we proposed a new recursive feature replacement (RFR) algorithm for choosing a suboptimal set of genes. The algorithm uses the support vector machine (SVM) technique. In this paper we use the RFR method to find suboptimal gene subsets for tumor-normal colon tissue classification. The results are compared with those of other methods recently proposed in the literature. The comparison shows that the RFR method finds the smallest gene subset (only six genes) that gives no misclassifications in leave-one-out cross-validation on a tumor-normal colon data set. In this sense the RFR algorithm outperforms all the other investigated methods.},
author = {Fujarewicz, Krzysztof and Wiench, Małgorzata},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {feature selection; support vector machines; colon tumor; gene expression data; microarrays; classification},
language = {eng},
number = {3},
pages = {327-335},
title = {Selecting differentially expressed genes for colon tumor classification},
url = {http://eudml.org/doc/207647},
volume = {13},
year = {2003},
}

TY - JOUR
AU - Fujarewicz, Krzysztof
AU - Wiench, Małgorzata
TI - Selecting differentially expressed genes for colon tumor classification
JO - International Journal of Applied Mathematics and Computer Science
PY - 2003
VL - 13
IS - 3
SP - 327
EP - 335
AB - DNA microarrays provide a new technique for measuring gene expression that has attracted considerable research interest in recent years. It has been suggested that gene expression data from microarrays (biochips) can be employed in many biomedical areas, e.g., in cancer classification. Although several classification methods, both new and existing, have been tested, selecting a proper (optimal) set of genes whose expression levels can be used for classification remains an open problem. Recently we proposed a new recursive feature replacement (RFR) algorithm for choosing a suboptimal set of genes. The algorithm uses the support vector machine (SVM) technique. In this paper we use the RFR method to find suboptimal gene subsets for tumor-normal colon tissue classification. The results are compared with those of other methods recently proposed in the literature. The comparison shows that the RFR method finds the smallest gene subset (only six genes) that gives no misclassifications in leave-one-out cross-validation on a tumor-normal colon data set. In this sense the RFR algorithm outperforms all the other investigated methods.
LA - eng
KW - feature selection; support vector machines; colon tumor; gene expression data; microarrays; classification
UR - http://eudml.org/doc/207647
ER -

