Algebraic Methods for Studying Interactions Between Epidemiological Variables

F. Ricceri; C. Fassino; G. Matullo; M. Roggero; M.-L. Torrente; P. Vineis; L. Terracini

Mathematical Modelling of Natural Phenomena (2012)

  • Volume: 7, Issue: 3, pages 227-252
  • ISSN: 0973-5348

Abstract

Background: Independence models among variables are among the most relevant topics in epidemiology, particularly in molecular epidemiology for the study of gene-gene and gene-environment interactions. They have been studied using three main kinds of analysis: regression analysis, data mining approaches and Bayesian model selection. Recently, methods of algebraic statistics have been extensively used for applications to biology. In this paper we present a concise but complete description of independence models in algebraic statistics and a new method of analyzing interactions that is equivalent to the correction by Markov bases of Fisher’s exact test.

Methods: We identified the suitable algebraic independence model for describing the dependence of two genetic variables on the occurrence of cancer and exploited the theory of toric varieties and Gröbner bases to develop an exact independence test based on the Diaconis-Sturmfels algorithm. We implemented it in a Maple routine and applied it to the study of gene-gene interaction in Gen-Air, a European case-control study. We computed the p-value for each pair of genetic variables interacting with disease status and compared our results with the standard asymptotic chi-square test.

Results: We found an association among COMT Val158Met, APE1 Asp148Glu and bladder cancer (p-value: 0.009). We also found an interaction among TP53 Arg72Pro, GSTP1 Ile105Val and lung cancer (p-value: 0.00035). Leukaemia was observed to interact significantly with the pairs ERCC2 Lys751Gln and RAD51 172 G > T (p-value: 0.0072), ERCC2 Lys751Gln and LIG4 Thr9Ile (p-value: 0.0095), and APE1 Asp148Glu and GSTP1 Ala114Val (p-value: 0.0036).

Conclusion: Taking advantage of results from theoretical and computational algebra, the method we propose was more selective than other methods in detecting new interactions, and nevertheless its results were consistent with previous epidemiological and functional findings. It also helped us in controlling the multiple comparison problem. In light of our results, we believe that the epidemiologic study of interactions can benefit from algebraic methods based on properties of toric varieties and Gröbner bases.
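
To illustrate the kind of exact test the abstract refers to, here is a minimal Python sketch of a Diaconis-Sturmfels style Markov chain for the simplest case, independence in a two-way contingency table. It is not the authors' Maple routine, and it does not reproduce their model for two genetic variables and disease status. The function names `chi_square` and `markov_basis_p_value`, the parameters `steps`, `burn_in` and `seed`, and the choice of the chi-square statistic are all assumptions made for this example. The moves used are the standard basic moves for two-way tables (add +1/-1 on a 2x2 minor), which preserve row and column totals.

```python
import random
from math import exp, lgamma


def chi_square(table):
    """Pearson chi-square statistic of a two-way table under independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def markov_basis_p_value(observed, steps=20000, burn_in=2000, seed=0):
    """Monte Carlo p-value from a Metropolis walk over tables with fixed margins.

    Assumes `observed` is a list of lists of non-negative integers with at
    least two rows and two columns (illustrative sketch only).
    """
    rng = random.Random(seed)
    table = [list(r) for r in observed]
    n_rows, n_cols = len(table), len(table[0])
    observed_stat = chi_square(observed)
    hits = 0
    for step in range(burn_in + steps):
        # Pick a basic move: +sign on (i1, j1) and (i2, j2), -sign on the other two cells.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        sign = rng.choice((1, -1))
        proposal = {
            (i1, j1): table[i1][j1] + sign,
            (i2, j2): table[i2][j2] + sign,
            (i1, j2): table[i1][j2] - sign,
            (i2, j1): table[i2][j1] - sign,
        }
        if min(proposal.values()) >= 0:
            # Target is the hypergeometric law, proportional to 1 / prod(n_ij!).
            log_ratio = sum(
                lgamma(table[i][j] + 1) - lgamma(v + 1)
                for (i, j), v in proposal.items()
            )
            if log_ratio >= 0 or rng.random() < exp(log_ratio):
                for (i, j), v in proposal.items():
                    table[i][j] = v
        # Count visited tables at least as extreme as the observed one.
        if step >= burn_in and chi_square(table) >= observed_stat - 1e-9:
            hits += 1
    return hits / steps


if __name__ == "__main__":
    # Hypothetical counts, not data from the Gen-Air study.
    print(markov_basis_p_value([[12, 3], [4, 11]]))
```

For a 2x2 table this walk targets the same conditional distribution as Fisher's exact test. For larger or higher-dimensional tables the set of moves has to be replaced by a Markov basis of the corresponding model, which is where the Gröbner basis computations mentioned in the abstract come in.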

How to cite


Ricceri, F., et al. "Algebraic Methods for Studying Interactions Between Epidemiological Variables." Mathematical Modelling of Natural Phenomena 7.3 (2012): 227-252. <http://eudml.org/doc/222268>.

@article{Ricceri2012,
abstract = {Background: Independence models among variables are among the most relevant topics in epidemiology, particularly in molecular epidemiology for the study of gene-gene and gene-environment interactions. They have been studied using three main kinds of analysis: regression analysis, data mining approaches and Bayesian model selection. Recently, methods of algebraic statistics have been extensively used for applications to biology. In this paper we present a concise but complete description of independence models in algebraic statistics and a new method of analyzing interactions that is equivalent to the correction by Markov bases of Fisher’s exact test. Methods: We identified the suitable algebraic independence model for describing the dependence of two genetic variables on the occurrence of cancer and exploited the theory of toric varieties and Gröbner bases to develop an exact independence test based on the Diaconis-Sturmfels algorithm. We implemented it in a Maple routine and applied it to the study of gene-gene interaction in Gen-Air, a European case-control study. We computed the p-value for each pair of genetic variables interacting with disease status and compared our results with the standard asymptotic chi-square test. Results: We found an association among COMT Val158Met, APE1 Asp148Glu and bladder cancer (p-value: 0.009). We also found an interaction among TP53 Arg72Pro, GSTP1 Ile105Val and lung cancer (p-value: 0.00035). Leukaemia was observed to interact significantly with the pairs ERCC2 Lys751Gln and RAD51 172 G > T (p-value: 0.0072), ERCC2 Lys751Gln and LIG4 Thr9Ile (p-value: 0.0095), and APE1 Asp148Glu and GSTP1 Ala114Val (p-value: 0.0036). Conclusion: Taking advantage of results from theoretical and computational algebra, the method we propose was more selective than other methods in detecting new interactions, and nevertheless its results were consistent with previous epidemiological and functional findings. It also helped us in controlling the multiple comparison problem. In light of our results, we believe that the epidemiologic study of interactions can benefit from algebraic methods based on properties of toric varieties and Gröbner bases.},
author = {Ricceri, F. and Fassino, C. and Matullo, G. and Roggero, M. and Torrente, M.-L. and Vineis, P. and Terracini, L.},
journal = {Mathematical Modelling of Natural Phenomena},
keywords = {polymorphism; interaction; Markov basis; Diaconis-Sturmfels algorithm; independence model; toric variety},
language = {eng},
month = {6},
number = {3},
pages = {227--252},
publisher = {EDP Sciences},
title = {Algebraic Methods for Studying Interactions Between Epidemiological Variables},
url = {http://eudml.org/doc/222268},
volume = {7},
year = {2012},
}

TY - JOUR
AU - Ricceri, F.
AU - Fassino, C.
AU - Matullo, G.
AU - Roggero, M.
AU - Torrente, M.-L.
AU - Vineis, P.
AU - Terracini, L.
TI - Algebraic Methods for Studying Interactions Between Epidemiological Variables
JO - Mathematical Modelling of Natural Phenomena
DA - 2012/6//
PB - EDP Sciences
VL - 7
IS - 3
SP - 227
EP - 252
AB - Background: Independence models among variables are among the most relevant topics in epidemiology, particularly in molecular epidemiology for the study of gene-gene and gene-environment interactions. They have been studied using three main kinds of analysis: regression analysis, data mining approaches and Bayesian model selection. Recently, methods of algebraic statistics have been extensively used for applications to biology. In this paper we present a concise but complete description of independence models in algebraic statistics and a new method of analyzing interactions that is equivalent to the correction by Markov bases of Fisher’s exact test. Methods: We identified the suitable algebraic independence model for describing the dependence of two genetic variables on the occurrence of cancer and exploited the theory of toric varieties and Gröbner bases to develop an exact independence test based on the Diaconis-Sturmfels algorithm. We implemented it in a Maple routine and applied it to the study of gene-gene interaction in Gen-Air, a European case-control study. We computed the p-value for each pair of genetic variables interacting with disease status and compared our results with the standard asymptotic chi-square test. Results: We found an association among COMT Val158Met, APE1 Asp148Glu and bladder cancer (p-value: 0.009). We also found an interaction among TP53 Arg72Pro, GSTP1 Ile105Val and lung cancer (p-value: 0.00035). Leukaemia was observed to interact significantly with the pairs ERCC2 Lys751Gln and RAD51 172 G > T (p-value: 0.0072), ERCC2 Lys751Gln and LIG4 Thr9Ile (p-value: 0.0095), and APE1 Asp148Glu and GSTP1 Ala114Val (p-value: 0.0036). Conclusion: Taking advantage of results from theoretical and computational algebra, the method we propose was more selective than other methods in detecting new interactions, and nevertheless its results were consistent with previous epidemiological and functional findings. It also helped us in controlling the multiple comparison problem. In light of our results, we believe that the epidemiologic study of interactions can benefit from algebraic methods based on properties of toric varieties and Gröbner bases.
LA - eng
KW - polymorphism; interaction; Markov basis; Diaconis-Sturmfels algorithm; independence model; toric variety
UR - http://eudml.org/doc/222268
ER -

