Characterization of lung tumor subtypes through gene expression cluster validity assessment

Giorgio Valentini; Francesca Ruffino

RAIRO - Theoretical Informatics and Applications (2006)

  • Volume: 40, Issue: 2, page 163-176
  • ISSN: 0988-3754

Abstract

Assessing the reliability of the patient clusters identified by clustering algorithms is crucial for estimating the significance of disease subclasses detectable at the bio-molecular level and, more generally, for supporting the bio-medical discovery of patterns in gene expression data. In this paper we present an experimental analysis of the reliability of clusters discovered in lung tumor patients using DNA microarray data. In particular, we investigate whether subclasses of lung adenocarcinoma can be detected with high reliability at the bio-molecular level. To this end, we apply the cluster validity measures based on random projections recently proposed by Bertoni and coworkers. The results show that at least two subclasses of lung adenocarcinoma can be detected with relatively high reliability, confirming and extending previous findings reported in the literature.

How to cite


Valentini, Giorgio, and Ruffino, Francesca. "Characterization of lung tumor subtypes through gene expression cluster validity assessment." RAIRO - Theoretical Informatics and Applications 40.2 (2006): 163-176. <http://eudml.org/doc/249627>.

@article{Valentini2006,
abstract = {Assessing the reliability of the patient clusters identified by clustering algorithms is crucial for estimating the significance of disease subclasses detectable at the bio-molecular level and, more generally, for supporting the bio-medical discovery of patterns in gene expression data. In this paper we present an experimental analysis of the reliability of clusters discovered in lung tumor patients using DNA microarray data. In particular, we investigate whether subclasses of lung adenocarcinoma can be detected with high reliability at the bio-molecular level. To this end, we apply the cluster validity measures based on random projections recently proposed by Bertoni and coworkers. The results show that at least two subclasses of lung adenocarcinoma can be detected with relatively high reliability, confirming and extending previous findings reported in the literature.},
author = {Valentini, Giorgio and Ruffino, Francesca},
journal = {RAIRO - Theoretical Informatics and Applications},
keywords = {Cluster validity; clustering algorithms; bio-molecular taxonomy of tumors; DNA microarray data analysis},
language = {eng},
month = {7},
number = {2},
pages = {163--176},
publisher = {EDP Sciences},
title = {Characterization of lung tumor subtypes through gene expression cluster validity assessment},
url = {http://eudml.org/doc/249627},
volume = {40},
year = {2006},
}

TY - JOUR
AU - Valentini, Giorgio
AU - Ruffino, Francesca
TI - Characterization of lung tumor subtypes through gene expression cluster validity assessment
JO - RAIRO - Theoretical Informatics and Applications
DA - 2006/7//
PB - EDP Sciences
VL - 40
IS - 2
SP - 163
EP - 176
AB - Assessing the reliability of the patient clusters identified by clustering algorithms is crucial for estimating the significance of disease subclasses detectable at the bio-molecular level and, more generally, for supporting the bio-medical discovery of patterns in gene expression data. In this paper we present an experimental analysis of the reliability of clusters discovered in lung tumor patients using DNA microarray data. In particular, we investigate whether subclasses of lung adenocarcinoma can be detected with high reliability at the bio-molecular level. To this end, we apply the cluster validity measures based on random projections recently proposed by Bertoni and coworkers. The results show that at least two subclasses of lung adenocarcinoma can be detected with relatively high reliability, confirming and extending previous findings reported in the literature.
LA - eng
KW - Cluster validity
KW - clustering algorithms
KW - bio-molecular taxonomy of tumors
KW - DNA microarray data analysis
UR - http://eudml.org/doc/249627
ER -

References

  1. A. Alizadeh, D.T. Ross, C.M. Perou and M. van de Rijn, Towards a novel classification of human malignancies based on gene expression. J. Pathol. 195 (2001) 41–52.
  2. R. Anbazhagan et al., Classification of small cell lung cancer and pulmonary carcinoid by gene expression profiles. Cancer Research 59 (1999) 5119–5122.
  3. F. Azuaje, A cluster validity framework for genome expression data. Bioinformatics 18 (2002) 319–320.
  4. A. Bertoni, R. Folgieri, F. Ruffino and G. Valentini, Assessment of clusters reliability for high dimensional genomic data, in BITS 2005, Bioinformatics Italian Society Meeting, Milano Italy (2005).
  5. A. Bertoni and G. Valentini, Random projections for assessing gene expression cluster stability, in IJCNN 2005, The IEEE-INNS International Joint Conference on Neural Networks, Montreal (2005).
  6. A. Bertoni and G. Valentini, Randomized maps for assessing the reliability of patients clusters in DNA microarray data analyses. Artif. Intell. Med. (in press).
  7. J.C. Bezdek and N.R. Pal, Some new indexes of cluster validity. IEEE Trans. Systems, Man and Cybernetics Part B 28 (1998) 301–315.
  8. A. Bhattacharjee, W.G. Richards, J. Staunton, C. Li, S. Monti, P. Vasa, C. Ladd, J. Beheshti, R. Bueno, M. Gillette, M. Loda, G. Weber, E.J. Mark, E.S. Lander, W. Wong, B.E. Johnson, T.R. Golub, D.J. Sugarbaker and M. Meyerson, Classification of human lung carcinoma by mRNA expression profiling reveals distinct adenocarcinoma subclasses. PNAS 98 (2001) 13790–13795.
  9. N. Bolshakova, F. Azuaje and P. Cunningham, An integrated tool for microarray data clustering and cluster validity assessment. Bioinformatics 21 (2005) 451–455.
  10. O.S. Breathnach et al., Clinical features of patients with stage IIIB and IV bronchioloalveolar carcinoma of the lung. Cancer 86 (1999) 1165–1173.
  11. P. Cheeseman and J. Stutz, Bayesian classification (autoclass): Theory and results, in Advances in Knowledge Discovery and Data Mining, edited by U. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurasamy, MIT Press, Cambridge, MA 2 (1996) 153–180.
  12. J.J. Chen, R. Delongchamp, C. Tsai, H. Hsueh, F. Sistare, K. Thompson, V. Desai and J. Fuscoe, Analysis of variance components in gene expression data. Bioinformatics 20 (2004) 1436–1446.
  13. D.L. Davies and D.W. Bouldin, A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence 1 (1979) 224–227.
  14. S. Dudoit and J. Fridlyand, A prediction-based method for estimating the number of clusters in a dataset. Genome Biology 3 (2002) 1–21.
  15. S. Dudoit and J. Fridlyand, Bagging to improve the accuracy of a clustering procedure. Bioinformatics 19 (2003) 1090–1099.
  16. J. Dunn, Well separated clusters and optimal fuzzy partitions. J. Cybernetics 4 (1974) 95–104.
  17. M.E. Garber et al., Diversity of gene expression in adenocarcinoma of the lung. PNAS 98 (2001) 13784–13789.
  18. J.A. Hartigan and M.A. Wong, A k-means clustering algorithm. Appl. Stat. 28 (1979) 100–108.
  19. T.K. Ho, The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 832–844.
  20. A.K. Jain, M.N. Murty and P.J. Flynn, Data Clustering: a Review. ACM Computing Surveys 31 (1999) 264–323.
  21. W.B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mapping into Hilbert space, in Conference in modern analysis and probability, Contemporary Mathematics. Amer. Math. Soc. 26 (1984) 189–206.
  22. L. Kaufman and P.J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990).
  23. M.K. Kerr and G.A. Churchill, Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. PNAS 98 (2001) 8961–8965.
  24. B. King, Step-wise clustering procedures. J. Am. Stat. Assoc. 69 (1967) 86–101.
  25. L.M. McShane, D. Radmacher, B. Freidlin, R. Yu, M.C. Li and R. Simon, Method for assessing reproducibility of clustering patterns observed in analyses of microarray data. Bioinformatics 18 (2002) 1462–1469.
  26. S. Monti, P. Tamayo, J. Mesirov and T. Golub, Consensus Clustering: A Resampling-based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning 52 (2003) 91–118.
  27. P.J. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comp. App. Math. 20 (1987) 53–65.
  28. M. Smolkin and D. Ghosh, Cluster stability scores for microarray data in cancer studies. BMC Bioinformatics 36 (2003).
  29. J.B. Sorensen, F.R. Hirsch, A. Gazdar and J.E. Olsen, Interobserver variability in histopathologic subtyping and grading of pulmonary adenocarcinoma. Cancer 71 (1993) 2971–2976.
  30. G. Valentini, Clusterv: a tool for assessing the reliability of clusters discovered in DNA microarray data. Bioinformatics 22 (2006) 369–370.
  31. J.H. Ward, Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58 (1963) 236–244.
