# The EM algorithm and its implementation for the estimation of frequencies of SNP-haplotypes

International Journal of Applied Mathematics and Computer Science (2003)

- Volume: 13, Issue: 3, pages 419-429
- ISSN: 1641-876X

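## Abstract

Haplotype analysis is becoming increasingly important in the study of complex genetic diseases. Various algorithms and specialized computer software have been developed to estimate haplotype frequencies statistically from the marker phenotypes of unrelated individuals. However, there are currently very few empirical reports on how well these methods recover haplotype frequencies. One of the most widely used methods of haplotype reconstruction is the maximum likelihood method, employing the Expectation-Maximization (EM) algorithm. The aim of this study is to explore the variability of the EM estimates of haplotype frequencies for real data. We analyzed haplotypes at the BLM, WRN, RECQL and ATM genes with 8-14 biallelic markers per gene in 300 individuals. We also re-analyzed the data presented by Mano et al. (2002). We studied the convergence speed, the shape of the log-likelihood hypersurface and the existence of local maxima, as well as their relations with heterozygosity, linkage disequilibrium and departures from Hardy-Weinberg equilibrium. Our study contributes to determining practical values for algorithm sensitivities.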
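The maximum likelihood estimation via EM that this paper studies can be illustrated with a small sketch. The Python code below is not the author's implementation; it is a minimal gene-counting EM in the spirit of Excoffier and Slatkin (1995), assuming unrelated individuals, Hardy-Weinberg equilibrium, biallelic SNPs coded 0/1/2 (copies of allele 1) and no missing genotypes. The E-step splits each individual over the haplotype pairs compatible with its genotype in proportion to 2·p(h1)·p(h2) (or p(h)² for a homozygous pair), and the M-step sets each frequency to its expected count divided by 2n. The function names `compatible_pairs`, `em_haplotype_freqs` and `em_multistart`, the stopping rule on the log-likelihood and the random-restart scheme are illustrative assumptions.

```python
import itertools
import math
import random
from collections import defaultdict


def compatible_pairs(genotype):
    """Return every unordered haplotype pair (h1, h2) consistent with a genotype.

    genotype: sequence of SNP codes 0, 1 or 2 (number of copies of allele 1).
    Haplotypes are tuples of 0/1 alleles.
    """
    het = [j for j, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Put allele 1 of the first heterozygous SNP on h1 so each unordered
    # pair is listed once: 2**(k-1) pairs for k heterozygous SNPs.
    for bits in itertools.product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        for j, b in zip(het, (1,) + bits):
            h1[j], h2[j] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_freqs(genotypes, init=None, tol=1e-8, max_iter=1000):
    """Gene-counting EM for haplotype frequencies under Hardy-Weinberg equilibrium.

    Returns (frequencies, log-likelihood, iterations). The log-likelihood is the
    one evaluated at the start of the last iteration. init values must be positive.
    """
    pairs = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for ps in pairs for pair in ps for h in pair})
    start = init if init is not None else {h: 1.0 for h in haps}
    total = sum(start[h] for h in haps)
    freqs = {h: start[h] / total for h in haps}
    two_n = 2 * len(genotypes)
    prev_ll = -math.inf
    for it in range(1, max_iter + 1):
        counts = defaultdict(float)
        ll = 0.0
        for ps in pairs:
            # E-step: P(h1, h2) is 2*p1*p2 for distinct haplotypes, p1**2 otherwise.
            probs = [(2.0 if h1 != h2 else 1.0) * freqs[h1] * freqs[h2]
                     for h1, h2 in ps]
            denom = sum(probs)
            ll += math.log(denom)
            for (h1, h2), p in zip(ps, probs):
                counts[h1] += p / denom
                counts[h2] += p / denom
        # M-step: expected haplotype counts over the 2n sampled chromosomes.
        freqs = {h: counts[h] / two_n for h in haps}
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return freqs, ll, it


def em_multistart(genotypes, n_starts=20, seed=0, **kwargs):
    """Run EM from random starting points; return (best run, all runs)."""
    rng = random.Random(seed)
    haps = sorted({h for g in genotypes for pair in compatible_pairs(g) for h in pair})
    runs = []
    for _ in range(n_starts):
        init = {h: rng.uniform(0.01, 1.0) for h in haps}
        runs.append(em_haplotype_freqs(genotypes, init=init, **kwargs))
    best = max(runs, key=lambda run: run[1])
    return best, runs


if __name__ == "__main__":
    # Toy data: two SNPs, two homozygotes of each type and one double heterozygote.
    genotypes = [(0, 0), (0, 0), (2, 2), (2, 2), (1, 1)]
    (freqs, ll, iters), runs = em_multistart(genotypes, n_starts=5)
    for h, p in sorted(freqs.items()):
        print(h, round(p, 4))
    print("best log-likelihood:", round(ll, 4), "iterations:", iters)
```

Comparing final log-likelihoods across random starts is a simple heuristic for spotting local maxima of the likelihood surface; agreement between starts does not prove that the global maximum was reached. Enumerating all 2^(k-1) phase resolutions also grows exponentially with the number k of heterozygous SNPs in an individual.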

## How to cite

Polańska, Joanna. "The EM algorithm and its implementation for the estimation of frequencies of SNP-haplotypes." International Journal of Applied Mathematics and Computer Science 13.3 (2003): 419-429. <http://eudml.org/doc/207655>.

@article{Polańska2003,

abstract = {A haplotype analysis is becoming increasingly important in studying complex genetic diseases. Various algorithms and specialized computer software have been developed to statistically estimate haplotype frequencies from marker phenotypes in unrelated individuals. However, currently there are very few empirical reports on the performance of the methods for the recovery of haplotype frequencies. One of the most widely used methods of haplotype reconstruction is the Maximum Likelihood method, employing the Expectation-Maximization (EM) algorithm. The aim of this study is to explore the variability of the EM estimates of the haplotype frequency for real data. We analyzed haplotypes at the BLM, WRN, RECQL and ATM genes with 8-14 biallelic markers per gene in 300 individuals. We also re-analyzed the data presented by Mano et al. (2002). We studied the convergence speed, the shape of the loglikelihood hypersurface, and the existence of local maxima, as well as their relations with heterozygosity, the linkage disequilibrium and departures from the Hardy-Weinberg equilibrium. Our study contributes to determining practical values for algorithm sensitivities.},

author = {Polańska, Joanna},

journal = {International Journal of Applied Mathematics and Computer Science},

keywords = {gene frequency; algorithms; likelihood functions; haplotypes},

language = {eng},

number = {3},

pages = {419--429},

title = {The EM algorithm and its implementation for the estimation of frequencies of SNP-haplotypes},

url = {http://eudml.org/doc/207655},

volume = {13},

year = {2003},

}

TY  - JOUR

AU  - Polańska, Joanna

TI  - The EM algorithm and its implementation for the estimation of frequencies of SNP-haplotypes

JO  - International Journal of Applied Mathematics and Computer Science

PY  - 2003

VL  - 13

IS  - 3

SP  - 419

EP  - 429

AB  - A haplotype analysis is becoming increasingly important in studying complex genetic diseases. Various algorithms and specialized computer software have been developed to statistically estimate haplotype frequencies from marker phenotypes in unrelated individuals. However, currently there are very few empirical reports on the performance of the methods for the recovery of haplotype frequencies. One of the most widely used methods of haplotype reconstruction is the Maximum Likelihood method, employing the Expectation-Maximization (EM) algorithm. The aim of this study is to explore the variability of the EM estimates of the haplotype frequency for real data. We analyzed haplotypes at the BLM, WRN, RECQL and ATM genes with 8-14 biallelic markers per gene in 300 individuals. We also re-analyzed the data presented by Mano et al. (2002). We studied the convergence speed, the shape of the loglikelihood hypersurface, and the existence of local maxima, as well as their relations with heterozygosity, the linkage disequilibrium and departures from the Hardy-Weinberg equilibrium. Our study contributes to determining practical values for algorithm sensitivities.

LA  - eng

KW  - gene frequency; algorithms; likelihood functions; haplotypes

UR  - http://eudml.org/doc/207655

ER  -

## References

- Bonnen P.E., Story M.D., Ashorn C.L., Buchholz T.A., Weil M.M. and Nelson D.L. (2000): Haplotypes at ATM identify coding-sequence variation and indicate a region of extensive linkage disequilibrium. — Am. J. Hum. Genet., Vol. 67, No. 6, pp. 1437–1451.
- Chiano M.N. and Clayton D.G. (1998): Fine genetic mapping using haplotype analysis and the missing data problem. — Ann. Hum. Genet., Vol. 62, Pt. 1, pp. 55–60.
- Clark A.G. (1990): Inference of haplotypes from PCR-amplified samples of diploid populations. — Mol. Biol. Evol., Vol. 7, No. 2, pp. 111–122.
- Clark V.J., Metheny N., Dean M. and Peterson R.J. (2001): Statistical estimation and pedigree analysis of CCR2-CCR5 haplotypes. — Hum. Genet., Vol. 108, No. 6, pp. 484–493.
- Dempster A.P., Laird N.M. and Rubin D.B. (1977): Maximum likelihood from incomplete data via the EM algorithm. — J. R. Stat. Soc. B, Vol. 39, No. 1, pp. 1–38. Zbl 0364.62022
- Excoffier L. and Slatkin M. (1995): Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. — Mol. Biol. Evol., Vol. 12, No. 5, pp. 921–927.
- Fallin D. and Schork N.J. (2000): Accuracy of haplotype frequency estimation for biallelic loci, via the Expectation-Maximization algorithm for unphased diploid genotype data. — Am. J. Hum. Genet., Vol. 67, No. 4, pp. 947–959.
- Ghosh S. and Majumder P.P. (2000): Mapping a quantitative trait locus via the EM algorithm and Bayesian classification. — Genet. Epidemiol., Vol. 19, No. 2, pp. 97–126.
- Hawley M.E. and Kidd K.K. (1995): HAPLO: A program using the EM algorithm to estimate the frequencies of multi-site haplotypes. — J. Heredity, Vol. 86, No. 5, pp. 409–411.
- Hudson R.R. and Kaplan N.L. (1985): Statistical properties of the number of recombination events in the history of a sample of DNA sequence. — Genetics, Vol. 111, No. 1, pp. 147–164.
- Kalinowski S.T. and Hedrick P.W. (2001): Estimation of linkage disequilibrium for loci with multiple alleles: Basic approach and an application using data from bighorn sheep. — Heredity, Vol. 87, Pt. 6, pp. 698–708.
- Lin S., Cutler D.J., Zwick M.E. and Chakravarti A. (2002): Haplotype inference in random population samples. — Am. J. Hum. Genet., Vol. 71, No. 5, pp. 1129–1137.
- Long J.C., Williams R.C. and Urbanek M. (1995): An E-M algorithm and testing strategy for multiple-locus haplotypes. — Am. J. Hum. Genet., Vol. 56, No. 3, pp. 799–810.
- Mano S., Yasuda N., Tamiya G., Inoko H., Gojobori T. and Imanishi T. (2002): Phase space structure of haplotype frequency estimation by the EM algorithm. — Proc. Waterfront Symp. Human Genome Science (WASH 2002), Tokyo, Japan.
- McKeigue P.M. (2000): Efficiency of estimation of haplotype frequencies: Use of marker phenotypes of unrelated individuals versus counting of phase-known gametes. — Am. J. Hum. Genet., Vol. 67, No. 6, pp. 1626–1627.
- McLachlan G.J. and Krishnan T. (1997): The EM algorithm and extensions. — New York: Wiley. Zbl 0882.62012
- Meng X.-L. and van Dyk D. (1997): The EM algorithm — An old folk-song sung to a fast new tune. — J. R. Stat. Soc. B, Vol. 59, No. 3, pp. 511–567. Zbl 1090.62518
- Niu T., Qin Z.S., Xu X. and Liu J.S. (2002): Bayesian haplotype inference for multiple linked Single-Nucleotide Polymorphisms. — Am. J. Hum. Genet., Vol. 70, No. 1, pp. 157–169.
- Patil N., Berno A.J., Hinds D.A., Barrett W.A., Doshi J.M., Hacker C.R., Kautzer C.R., Lee D.H., Marjoribanks C., McDonough D.P., et al. (2001): Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. — Science, Vol. 294, No. 5547, pp. 1719–1723.
- Qin Z.S., Niu T. and Liu J.S. (2002): Partition-Ligation-Expectation-Maximization algorithm for haplotype inference with Single-Nucleotide Polymorphisms. — Am. J. Hum. Genet., Vol. 71, No. 5, pp. 1242–1247.
- Rohde K. and Fuerst R. (2001): Haplotyping and estimation of haplotype frequencies for closely linked biallelic multilocus genetic phenotypes including nuclear family information. — Hum. Mutat., Vol. 17, No. 4, pp. 289–295.
- Schneider S., Roessli D. and Excoffier L. (2000): Arlequin 2.001: A software for population genetics data analysis. — Genetics and Biometry Laboratory, University of Geneva, Switzerland.
- Single R.M., Meyer D., Hollenbach J.A., Nelson M.P., Noble J.A., Erlich H.A. and Thomson G. (2002): Haplotype frequency estimation in patient populations: the effect of departures from Hardy-Weinberg proportions and collapsing over a locus in the HLA region. — Genet. Epidemiol., Vol. 22, No. 2, pp. 186–195.
- Slatkin M. and Excoffier L. (1996): Testing for linkage disequilibrium in genotypic data using the Expectation-Maximization algorithm. — Heredity, Vol. 76, Pt. 4, pp. 377–383.
- Stephens M., Smith N.J. and Donnelly P. (2001a): A new statistical method for haplotype reconstruction from population data. — Am. J. Hum. Genet., Vol. 68, No. 4, pp. 978–989.
- Stephens M., Smith N.J. and Donnelly P. (2001b): Reply to Zhang et al. — Am. J. Hum. Genet., Vol. 69, No. 4, pp. 912–914.
- Tishkoff S.A., Pakstis A.J., Ruano G. and Kidd K.K. (2000): The accuracy of statistical methods for estimation of haplotype frequencies: An example from the CD4 locus. — Am. J. Hum. Genet., Vol. 67, No. 2, pp. 518–522.
- Trikka D., Fang Z., Renwick A., Jones S.H., Chakraborty R., Kimmel M. and Nelson D.L. (2002): Complex SNP-based haplotypes in three human helicases: implication for cancer association studies. — Genome Res., Vol. 12, No. 4, pp. 627–639.
- Wang N., Akey J.M., Zhang K., Chakraborty R. and Jin L. (2002): Distribution of recombination crossovers and the origin of haplotype blocks: The interplay of population history, recombination, and mutation. — Am. J. Hum. Genet., Vol. 71, No. 5, pp. 1227–1234.
- Wu C.F.J. (1983): On the convergence properties of the EM algorithm. — Ann. Stat., Vol. 11, No. 1, pp. 95–103. Zbl 0517.62035
- Xu C.F., Lewis K., Cantone K.L., Khan P., Donnelly C., White N., Crocker N., Boyd P.R., Zaykin D.V. and Purvis I.J. (2002): Effectiveness of computational methods in haplotype prediction. — Hum. Genet., Vol. 110, No. 2, pp. 148–156.
- Zhang S., Pakstis A.J., Kidd K.K. and Zhao H. (2001): Comparison of two methods for haplotype reconstruction and haplotype frequency estimation from population data. — Am. J. Hum. Genet., Vol. 69, No. 4, pp. 906–912.
