Sampling properties of estimators of nucleotide diversity at discovered SNP sites

Alexander Renwick; Penelope Bonnen; Dimitra Trikka; David Nelson; Ranajit Chakraborty; Marek Kimmel

International Journal of Applied Mathematics and Computer Science (2003)

  • Volume: 13, Issue: 3, pages 385-394
  • ISSN: 1641-876X

Abstract

SNP sites are generally discovered by sequencing regions of the human genome in a limited number of individuals. As a result, SNP sites that are present in the region but carry only rare mutant nucleotides may go undetected. Consequently, estimates of nucleotide diversity obtained from assays of detected SNP sites are biased. In this research we present a statistical model of the SNP discovery process, which is used to evaluate the extent of this bias. The model assumes a symmetric Beta distribution of variant frequencies at SNP sites, with an additional probability that there is no SNP at any given site. Under this model of allele frequency distributions at SNP sites, we show that nucleotide diversity is always underestimated. However, the extent of the bias does not seem to exceed 10-15% for the analyzed data. We find that our model of allele frequency distributions at SNP sites is consistent with SNP statistics derived from new SNP data at the ATM, BLM, RQL and WRN gene regions. Applying the theory to these new SNP data, as well as to literature data at the LPL gene region, indicates that, in spite of ascertainment biases, the observed differences in nucleotide diversity across these gene regions are real. This provides interesting evidence concerning the heterogeneity of the rates of nucleotide substitution across the genome.
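
The ascertainment effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator or code. It assumes a simplified reading of the model: each site carries no SNP with probability `p_no_snp`, otherwise its variant frequency is drawn from a symmetric Beta distribution with shape `beta_shape`, and a SNP is discovered only if a panel of `panel_size` sequenced chromosomes contains both alleles. Per-site diversity is taken as the heterozygosity 2x(1-x). All function names, parameter names and parameter values are hypothetical choices made for illustration.

```python
import random


def simulate_diversity(num_sites, p_no_snp, beta_shape, panel_size, seed=0):
    """Return (true, observed) mean per-site diversity under a toy discovery model.

    Assumptions (illustrative, not taken from the paper):
    - a site has no SNP with probability p_no_snp;
    - otherwise its variant frequency x follows Beta(beta_shape, beta_shape);
    - the SNP is discovered if the panel of panel_size chromosomes is not
      monomorphic, which happens with probability 1 - x^m - (1 - x)^m.
    """
    rng = random.Random(seed)
    true_sum = 0.0        # diversity summed over all SNP sites
    discovered_sum = 0.0  # diversity summed over discovered SNP sites only
    for _ in range(num_sites):
        if rng.random() < p_no_snp:
            continue  # monomorphic site contributes zero diversity
        x = rng.betavariate(beta_shape, beta_shape)
        het = 2.0 * x * (1.0 - x)
        true_sum += het
        p_detect = 1.0 - x ** panel_size - (1.0 - x) ** panel_size
        if rng.random() < p_detect:
            discovered_sum += het
    return true_sum / num_sites, discovered_sum / num_sites


# Hypothetical parameter values, chosen only to make the effect visible.
true_pi, observed_pi = simulate_diversity(
    num_sites=200_000, p_no_snp=0.9, beta_shape=0.5, panel_size=20
)
relative_bias = (true_pi - observed_pi) / true_pi
print(f"true={true_pi:.5f} observed={observed_pi:.5f} bias={relative_bias:.1%}")
```

In this toy setting the observed value can never exceed the true one, because undiscovered sites only remove non-negative terms from the sum. The size of the gap depends on the panel size and on how much probability mass the Beta distribution places on rare variants.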

How to cite

Renwick, Alexander, et al. "Sampling properties of estimators of nucleotide diversity at discovered SNP sites." International Journal of Applied Mathematics and Computer Science 13.3 (2003): 385-394. <http://eudml.org/doc/207652>.

@article{Renwick2003,
author = {Renwick, Alexander and Bonnen, Penelope and Trikka, Dimitra and Nelson, David and Chakraborty, Ranajit and Kimmel, Marek},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {nucleotide diversity; molecular evolution; single nucleotide polymorphisms; ascertainment bias},
language = {eng},
number = {3},
pages = {385-394},
title = {Sampling properties of estimators of nucleotide diversity at discovered SNP sites},
url = {http://eudml.org/doc/207652},
volume = {13},
year = {2003},
}

TY - JOUR
AU - Renwick, Alexander
AU - Bonnen, Penelope
AU - Trikka, Dimitra
AU - Nelson, David
AU - Chakraborty, Ranajit
AU - Kimmel, Marek
TI - Sampling properties of estimators of nucleotide diversity at discovered SNP sites
JO - International Journal of Applied Mathematics and Computer Science
PY - 2003
VL - 13
IS - 3
SP - 385
EP - 394
LA - eng
KW - nucleotide diversity; molecular evolution; single nucleotide polymorphisms; ascertainment bias
UR - http://eudml.org/doc/207652
ER -

