# Algebraic Methods for Studying Interactions Between Epidemiological Variables

F. Ricceri; C. Fassino; G. Matullo; M. Roggero; M.-L. Torrente; P. Vineis; L. Terracini

Mathematical Modelling of Natural Phenomena (2012)

- Volume: 7, Issue: 3, pages 227-252
- ISSN: 0973-5348

## How to cite

Ricceri, F., et al. "Algebraic Methods for Studying Interactions Between Epidemiological Variables." Mathematical Modelling of Natural Phenomena 7.3 (2012): 227-252. <http://eudml.org/doc/222268>.

@article{Ricceri2012,

abstract = {Background: Independence models among variables are one of the most relevant topics in epidemiology,
particularly in molecular epidemiology for the study of gene-gene and gene-environment
interactions. They have been studied using three main kinds of analysis: regression
analysis, data mining approaches and Bayesian model selection. Recently, methods of
algebraic statistics have been extensively used for applications to biology. In this paper
we present a concise but complete description of independence models in algebraic
statistics and a new method of analyzing interactions that is equivalent to the
correction by Markov bases of Fisher’s exact test. Methods: We identified the suitable algebraic independence model for describing the dependence of
two genetic variables on the occurrence of cancer and exploited the theory of toric
varieties and Gröbner bases to develop an exact independence test based on the
Diaconis-Sturmfels algorithm. We implemented it in a Maple routine and applied it to
the study of gene-gene interaction in Gen-Air, a European case-control study. We computed
the p-value for each pair of genetic variables interacting with disease status and
compared our results with the standard asymptotic chi-square test. Results: We found an association among COMT Val158Met, APE1
Asp148Glu and bladder cancer (p-value: 0.009). We also found an interaction
among TP53 Arg72Pro, GSTP1 Ile105Val and lung cancer
(p-value: 0.00035). Leukaemia was observed to significantly interact with the pairs
ERCC2 Lys751Gln and RAD51 172 G > T (p-value:
0.0072), ERCC2 Lys751Gln and LIG4 Thr9Ile (p-value:
0.0095) and APE1 Asp148Glu and GSTP1 Ala114Val (p-value:
0.0036). Conclusion: Taking advantage of results from theoretical and computational algebra, the method we
propose was more selective than other methods in detecting new interactions, and
its results were nevertheless consistent with previous epidemiological and functional
findings. It also helped us control the multiple comparison problem. In the light
of our results, we believe that the epidemiologic study of interactions can benefit from
algebraic methods based on properties of toric varieties and Gröbner bases.},

author = {Ricceri, F. and Fassino, C. and Matullo, G. and Roggero, M. and Torrente, M.-L. and Vineis, P. and Terracini, L.},

journal = {Mathematical Modelling of Natural Phenomena},

keywords = {polymorphism; interaction; Markov basis; Diaconis-Sturmfels algorithm; independence model; toric variety},

language = {eng},

month = {6},

number = {3},

pages = {227-252},

publisher = {EDP Sciences},

title = {Algebraic Methods for Studying Interactions Between Epidemiological Variables},

url = {http://eudml.org/doc/222268},

volume = {7},

year = {2012},

}

TY - JOUR

AU - Ricceri, F.

AU - Fassino, C.

AU - Matullo, G.

AU - Roggero, M.

AU - Torrente, M.-L.

AU - Vineis, P.

AU - Terracini, L.

TI - Algebraic Methods for Studying Interactions Between Epidemiological Variables

JO - Mathematical Modelling of Natural Phenomena

DA - 2012/6//

PB - EDP Sciences

VL - 7

IS - 3

SP - 227

EP - 252

AB - Background: Independence models among variables are one of the most relevant topics in epidemiology,
particularly in molecular epidemiology for the study of gene-gene and gene-environment
interactions. They have been studied using three main kinds of analysis: regression
analysis, data mining approaches and Bayesian model selection. Recently, methods of
algebraic statistics have been extensively used for applications to biology. In this paper
we present a concise but complete description of independence models in algebraic
statistics and a new method of analyzing interactions that is equivalent to the
correction by Markov bases of Fisher’s exact test. Methods: We identified the suitable algebraic independence model for describing the dependence of
two genetic variables on the occurrence of cancer and exploited the theory of toric
varieties and Gröbner bases to develop an exact independence test based on the
Diaconis-Sturmfels algorithm. We implemented it in a Maple routine and applied it to
the study of gene-gene interaction in Gen-Air, a European case-control study. We computed
the p-value for each pair of genetic variables interacting with disease status and
compared our results with the standard asymptotic chi-square test. Results: We found an association among COMT Val158Met, APE1
Asp148Glu and bladder cancer (p-value: 0.009). We also found an interaction
among TP53 Arg72Pro, GSTP1 Ile105Val and lung cancer
(p-value: 0.00035). Leukaemia was observed to significantly interact with the pairs
ERCC2 Lys751Gln and RAD51 172 G > T (p-value:
0.0072), ERCC2 Lys751Gln and LIG4 Thr9Ile (p-value:
0.0095) and APE1 Asp148Glu and GSTP1 Ala114Val (p-value:
0.0036). Conclusion: Taking advantage of results from theoretical and computational algebra, the method we
propose was more selective than other methods in detecting new interactions, and
its results were nevertheless consistent with previous epidemiological and functional
findings. It also helped us control the multiple comparison problem. In the light
of our results, we believe that the epidemiologic study of interactions can benefit from
algebraic methods based on properties of toric varieties and Gröbner bases.

LA - eng

KW - polymorphism; interaction; Markov basis; Diaconis-Sturmfels algorithm; independence model; toric variety

UR - http://eudml.org/doc/222268

ER -
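
The abstract describes an exact independence test built on the Diaconis-Sturmfels algorithm. As a rough illustration of that idea only, the sketch below runs a Metropolis random walk over two-way contingency tables with fixed margins, using the basic +1/-1 moves that form a Markov basis for the two-way independence model. It is not the authors' Maple routine, and it does not implement the three-variable model (two genetic variables and disease status) used in the paper. The function names, the chi-square test statistic, the default chain lengths and the example counts are all assumptions made for this sketch.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic of a two-way table against independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def markov_basis_p_value(table, steps=20000, burn_in=2000, seed=0):
    """Estimate an exact-test p-value by a Markov basis random walk.

    The chain visits tables with the same row and column sums as `table`
    and targets the hypergeometric distribution (proportional to the
    product of 1/x_ij!). Hypothetical helper, not the paper's routine.
    """
    rng = random.Random(seed)
    current = [list(r) for r in table]
    n_rows, n_cols = len(current), len(current[0])
    observed = chi_square(current)
    hits = 0
    for step in range(steps + burn_in):
        # Pick an ordered pair of rows and of columns; the ordering decides
        # the sign of the basic move, so both directions are proposed.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        a, b = current[i1][j1], current[i2][j2]  # cells that gain 1
        c, d = current[i1][j2], current[i2][j1]  # cells that lose 1
        if c > 0 and d > 0:
            # Metropolis ratio of hypergeometric weights, new over old.
            ratio = (c * d) / ((a + 1) * (b + 1))
            if rng.random() < min(1.0, ratio):
                current[i1][j1] += 1
                current[i2][j2] += 1
                current[i1][j2] -= 1
                current[i2][j1] -= 1
        # Moves that would create a negative cell are rejected and the
        # chain stays put; the current table still counts as a sample.
        if step >= burn_in and chi_square(current) >= observed - 1e-9:
            hits += 1
    return hits / steps


if __name__ == "__main__":
    # Made-up illustrative counts, not data from the Gen-Air study.
    example = [[12, 5, 3], [4, 9, 7]]
    print(markov_basis_p_value(example))
```

Because the walk only ever applies margin-preserving moves, every visited table lies in the same fibre as the observed one, which is why the estimated p-value does not rely on the asymptotic chi-square approximation.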

