On a matching distance between rooted phylogenetic trees

Damian Bogdanowicz; Krzysztof Giaro

International Journal of Applied Mathematics and Computer Science (2013)

  • Volume: 23, Issue: 3, page 669-684
  • ISSN: 1641-876X

Abstract

top
The Robinson-Foulds (RF) distance is the most popular method of evaluating the dissimilarity between phylogenetic trees. In this paper, we define and explore in detail properties of the Matching Cluster (MC) distance, which can be regarded as a refinement of the RF metric for rooted trees. Similarly to RF, MC operates on clusters of compared trees, but the distance evaluation is more complex. Using the graph theoretic approach based on a minimum-weight perfect matching in bipartite graphs, the values of similarity between clusters are transformed to the final MC-score of the dissimilarity of trees. The analyzed properties give insight into the structure of the metric space generated by MC, its relations with the Matching Split (MS) distance of unrooted trees and asymptotic behavior of the expected distance between binary n-leaf trees selected uniformly in both MC and MS (Θ(n^{3/2})).

How to cite

top

Damian Bogdanowicz, and Krzysztof Giaro. "On a matching distance between rooted phylogenetic trees." International Journal of Applied Mathematics and Computer Science 23.3 (2013): 669-684. <http://eudml.org/doc/262378>.

@article{DamianBogdanowicz2013,
abstract = {The Robinson-Foulds (RF) distance is the most popular method of evaluating the dissimilarity between phylogenetic trees. In this paper, we define and explore in detail properties of the Matching Cluster (MC) distance, which can be regarded as a refinement of the RF metric for rooted trees. Similarly to RF, MC operates on clusters of compared trees, but the distance evaluation is more complex. Using the graph theoretic approach based on a minimum-weight perfect matching in bipartite graphs, the values of similarity between clusters are transformed to the final MC-score of the dissimilarity of trees. The analyzed properties give insight into the structure of the metric space generated by MC, its relations with the Matching Split (MS) distance of unrooted trees and asymptotic behavior of the expected distance between binary n-leaf trees selected uniformly in both MC and MS (Θ(n^\{3/2\})).},
author = {Damian Bogdanowicz, Krzysztof Giaro},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {phylogenetic tree; phylogenetic tree metric; phylogenetic tree comparison; matching cluster distance; matching split distance},
language = {eng},
number = {3},
pages = {669-684},
title = {On a matching distance between rooted phylogenetic trees},
url = {http://eudml.org/doc/262378},
volume = {23},
year = {2013},
}

TY - JOUR
AU - Damian Bogdanowicz
AU - Krzysztof Giaro
TI - On a matching distance between rooted phylogenetic trees
JO - International Journal of Applied Mathematics and Computer Science
PY - 2013
VL - 23
IS - 3
SP - 669
EP - 684
AB - The Robinson-Foulds (RF) distance is the most popular method of evaluating the dissimilarity between phylogenetic trees. In this paper, we define and explore in detail properties of the Matching Cluster (MC) distance, which can be regarded as a refinement of the RF metric for rooted trees. Similarly to RF, MC operates on clusters of compared trees, but the distance evaluation is more complex. Using the graph theoretic approach based on a minimum-weight perfect matching in bipartite graphs, the values of similarity between clusters are transformed to the final MC-score of the dissimilarity of trees. The analyzed properties give insight into the structure of the metric space generated by MC, its relations with the Matching Split (MS) distance of unrooted trees and asymptotic behavior of the expected distance between binary n-leaf trees selected uniformly in both MC and MS (Θ(n^{3/2})).
LA - eng
KW - phylogenetic tree; phylogenetic tree metric; phylogenetic tree comparison; matching cluster distance; matching split distance
UR - http://eudml.org/doc/262378
ER -

References

top
  1. Alberich, R., Cardona, G., Rosselló, F. and Valiente, G. (2009). An algebraic metric for phylogenetic trees, Applied Mathematics Letters 22(9): 1320-1324. Zbl1173.05314
  2. Aldous, D.J. (1991). The continuum random tree II: An overview, in M.T. Barlow and N. H. Bingham (Eds.), Stochastic Analysis, Cambridge University Press, Cambridge, pp. 23-70. Zbl0791.60008
  3. Bansal, M., Burleigh, J.G., Eulenstein, O. and Fernández-Baca, D. (2010). Robinson-Foulds supertrees, Algorithms for Molecular Biology 5(1): 18. 
  4. Bansal, M.S., Dong, J. and Fernández-Baca, D. (2011). Comparing and aggregating partially resolved trees, Theoretical Computer Science 412(48): 6634-6652. Zbl1227.92040
  5. Biedrzycki, R. and Arabas, J. (2012). KIS: An automated attribute induction method for classification of DNA sequences, International Journal of Applied Mathematics and Computer Science 22(3): 711-721, DOI: 10.2478/v10006-012-0053-2. Zbl1330.92092
  6. Bininda-Emonds, O.R.P., Cardillo, M., Jones, K.E., MacPhee, R. D.E., Beck, R.M.D., Grenyer, R., Price, S.A., Vos, R.A., Gittleman, J.L. and Purvis, A. (2007). The delayed rise of present-day mammals, Nature 446(7135): 507-512. 
  7. Blum, M.G.B., François, O. and Janson, S. (2006). The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance, The Annals of Applied Probability 16(4): 2195-2214. Zbl1124.05025
  8. Boc, A., Philippe, H. and Makarenkov, V. (2010). Inferring and validating horizontal gene transfer events using bipartition dissimilarity, Systematic Biology 59(2): 195-211. 
  9. Bogdanowicz, D. and Giaro, K. (2012). Matching split distance for unrooted binary phylogenetic trees, IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(1): 150-160. 
  10. Bogdanowicz, D., Giaro, K. and Wróbel, B. (2012). TreeCmp: Comparison of trees in polynomial time, Evolutionary Bioinformatics 8: 475-487. 
  11. Bolikowski, L. and Gambin, A. (2007). New metrics for phylogenies, Fundamenta Informaticae 78(2): 199-216. Zbl1116.92048
  12. Boorman, S.A. and Olivier, D.C. (1973). Metrics on spaces of finite trees, Journal of Mathematical Psychology 10(1): 26-59. Zbl0271.92011
  13. Bordewich, M. and Semple, C. (2005). On the computational complexity of the rooted subtree prune and regraft distance, Annals of Combinatorics 8(4): 409-423. Zbl1059.05035
  14. Brinkmeyer, M., Griebel, T. and Böcker, S. (2011). Polynomial supertree methods revisited, Advances in Bioinformatics 2011: 524182. Zbl1311.92132
  15. Bryant, D. (1997). Building Trees, Hunting for Trees, and Comparing Trees-Theory and Methods in Phylogenetic Analysis, Ph.D. thesis, University of Canterbury, Christchurch. 
  16. Cardona, G., Llabrés, M., Rosselló, F. and Valiente, G. (2009a). Metrics for phylogenetic networks I: Generalizations of the Robinson-Foulds metric, IEEE/ACM Transactions on Computational Biology and Bioinformatics 6(1): 46-61. 
  17. Cardona, G., Llabrés, M., Rosselló, F. and Valiente, G. (2009b). Metrics for phylogenetic networks II: Nodal and triplets metrics, IEEE/ACM Transactions on Computational Biology and Bioinformatics 6(3): 454-469. 
  18. Cardona, G., Rossello, F. and Valiente, G. (2009). Comparison of tree-child phylogenetic networks, IEEE/ACM Transactions on Computational Biology and Bioinformatics 6(4): 552-569. 
  19. Cardona, G., Llabrés, M., Rosselló, F. and Valiente, G. (2010). Nodal distances for rooted phylogenetic trees, Journal of Mathematical Biology 61(2): 253-276. Zbl1202.92060
  20. Critchlow, D.E., Pearl, D.K. and Qian, C. (1996). The triples distance for rooted bifurcating phylogenetic trees, Systematic Biology 45(3): 323-334. 
  21. Darlu, P. and Guénoche, A. (2011). TreeOfTrees method to evaluate the congruence between gene trees, Journal of Classification 28(3): 390-403. Zbl06562761
  22. Davies, T.J., Barraclough, T.G., Chase, M.W., Soltis, P.S., Soltis, D.E. and Savolainen, V. (2004). Darwin's abominable mystery: Insights from a supertree of the angiosperms, Proceedings of the National Academy of Sciences of the United States of America 101(7): 1904-1909. 
  23. Felsenstein, J. (2003). Inferring Phylogenies, Sinauer Associates, Sunderland, MA. 
  24. Frąckiewicz, M. and Palus, H. (2011). KHM clustering technique as a segmentation method for endoscopic colour images, International Journal of Applied Mathematics and Computer Science 21(1): 203-209, DOI: 10.2478/v10006-011-0015-0. 
  25. Gabow, H.N. and Tarjan, R.E. (1989). Faster scaling algorithms for network problems, SIAM Journal on Computing 18(5): 1013-1036. Zbl0679.68079
  26. Górecki, P. and Eulenstein, O. (2012). A Robinson-Foulds measure to compare unrooted trees with rooted trees, in L. Bleris, I. Mandoiu, R. Schwartz and J. Wang (Eds.), Bioinformatics Research and Applications, Lecture Notes in Computer Science, Vol. 7292, Springer, Berlin/Heidelberg, pp. 115-126. 
  27. Gusfield, D. (1991). Efficient algorithms for inferring evolutionary trees, Networks 21(1): 19-28. Zbl0719.92015
  28. Hayes, M., Walenstein, A. and Lakhotia, A. (2009). Evaluation of malware phylogeny modelling systems using automated variant generation, Journal in Computer Virology 5(4): 335-343. 
  29. Hillis, D.M., Heath, T.A. and John, K.S. (2005). Analysis and visualization of tree space, Systematic Biology 54(3): 471-482. 
  30. Kennedy, M., Page, R. D.M. and Prum, R. (2002). Seabird supertrees: Combining partial estimates of procellariiform phylogeny, The Auk 119(1): 88-108. 
  31. Lin, Y., Rajan, V. and Moret, B.M.E. (2012). A metric for phylogenetic trees based on matching, IEEE/ACM Transactions on Computational Biology and Bioinformatics 9(4): 1014-1022. 
  32. Ma, B., Li, M. and Zhang, L. (1998). On reconstructing species trees from gene trees in term of duplications and losses, Proceedings of the 2nd Annual International Conference on Computational Molecular Biology, RECOMB'98, New York, NY, USA, pp. 182-191. 
  33. McKenzie, A. and Steel, M. (2000). Distributions of cherries for two models of trees, Mathematical Biosciences 164(1): 81-92. Zbl0947.92021
  34. Munzner, T., Guimbretière, F., Tasiran, S., Zhang, L. and Zhou, Y. (2003). TreeJuxtaposer: Scalable tree comparison using focus+context with guaranteed visibility, ACM Transactions on Graphics 22(3): 453-462. 
  35. Nguyen, N., Mirarab, S. and Warnow, T. (2012). MRL and SuperFine+MRL: New supertree methods, Algorithms for Molecular Biology 7(1): 3. 
  36. Nye, T.M., Liò, P. and Gilks, W.R. (2006). A novel algorithm and web-based tool for comparing two alternative phylogenetic trees, Bioinformatics 22(1): 117-119. 
  37. Orlin, J.B. and Ahuja, R.K. (1992). New scaling algorithms for the assignment and minimum mean cycle problems, Mathematical Programming 54(1-3): 41-56. Zbl0764.90059
  38. Penny, D., Watson, E.E. and Steel, M.A. (1993). Trees from languages and genes are very similar, Systematic Biology 42(3): 382-384. 
  39. Pompei, S., Loreto, V. and Tria, F. (2011). On the accuracy of language trees, PLoS ONE 6(6): e20109. 
  40. Restrepo, G., Héber, M. and Llanos, E.J. (2007). Three dissimilarity measures to contrast dendrograms, Journal of Chemical Information and Modeling 47(3): 761-770. 
  41. Robinson, D.F. and Foulds, L.R. (1981). Comparison of phylogenetic trees, Mathematical Biosciences 53(1-2): 131-147. Zbl0451.92006
  42. Sackin, M.J. (1972). “Good” and “bad” phenograms, Systematic Zoology 21(2): 225-226. 
  43. Semple, C. and Steel, M. (2003). Phylogenetics, Oxford University Press, Oxford. 
  44. Shao, K.-T. and Sokal, R.R. (1990). Tree balance, Systematic Zoology 39(3): 266-276. 
  45. Steel, M. A. and Penny, D. (1993). Distributions of tree comparison metrics-some new results, Systematic Biology 42(2): 126-141. 
  46. Stockham, C., Wang, L.-S. and Warnow, T. (2002). Statistically based postprocessing of phylogenetic analysis by clustering, Bioinformatics 18(suppl 1): S285-S293. 
  47. Suri, R. and Warnow, T. (2010). Spruce, Website, http://www.cs.utexas.edu/~phylo/software/spruce/. 
  48. Swenson, M.S., Suri, R., Linder, C.R. and Warnow, T. (2011). An experimental study of Quartets MaxCut and other supertree methods, Algorithms for Molecular Biology 6(1): 7. 
  49. Swofford, D. (2002). PAUP*. Phylogenetic Analysis Using Parsimony (*and other methods). Version 4, Sinauer Associates, Sunderland, MA. 
  50. Wang, J.T., Shan, H., Shasha, D. and Piel, W.H. (2005). Fast structural search in phylogenetic databases, Evolutionary Bioinformatics Online 1: 37-46. 
  51. Williams, W.T. and Clifford, H.T. (1971). On the comparison of two classifications of the same set of elements, Taxon 20(4): 519-522. 

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.