# Compound Poisson approximation of word counts in DNA sequences

ESAIM: Probability and Statistics (2010)

- Volume: 1, pages 1-16
- ISSN: 1292-8100

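## Abstract

Identifying words with unexpected frequencies is an important problem in the analysis of long DNA sequences. To solve it, we need an approximation of the distribution of the number of occurrences N(W) of a word W. Modeling DNA sequences with m-order Markov chains, we use the Chen-Stein method to obtain Poisson approximations for two different counts. We approximate the “declumped” count of W by a Poisson variable and the number of occurrences N(W) by a compound Poisson variable. Combinatorial results are used to solve the general case of overlapping words and to calculate the parameters of these distributions.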
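
The article's abstract distinguishes two counts for a word W: the number of occurrences N(W), where overlapping occurrences all count, and the "declumped" count, where each run of overlapping occurrences counts once. The sketch below only illustrates those two counts and the word periods that make overlaps possible, assuming the usual definitions (a period is a shift at which W overlaps itself; a clump is a maximal run of overlapping occurrences). The function names `occurrences`, `periods` and `clump_starts` are hypothetical, and nothing here reproduces the paper's Chen-Stein bounds or its distribution parameters.

```python
def occurrences(seq, word):
    """Return the 0-based start positions of every occurrence of word in seq.

    Overlapping occurrences are all reported, so len(result) plays the
    role of N(W) for this sequence.
    """
    return [i for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i)]


def periods(word):
    """Return the shifts p (1 <= p < len(word)) at which word overlaps itself.

    Assumed definition: p is a period when the last len(word) - p letters
    of the word equal its first len(word) - p letters.
    """
    n = len(word)
    return [p for p in range(1, n) if word[p:] == word[:n - p]]


def clump_starts(seq, word):
    """Return the occurrences that are not overlapped by an earlier occurrence.

    Each such position opens a new clump, so len(result) is the
    declumped count under the assumed definition.
    """
    starts = []
    prev = None  # start of the most recent occurrence seen so far
    for pos in occurrences(seq, word):
        # Occurrences are visited in increasing order, so checking the
        # immediately preceding one is enough to detect any overlap.
        if prev is None or pos - prev >= len(word):
            starts.append(pos)
        prev = pos
    return starts


if __name__ == "__main__":
    seq = "ACACACAGTACACA"  # toy sequence, not data from the article
    word = "ACA"
    print(periods(word))            # [2]
    print(occurrences(seq, word))   # [0, 2, 4, 9, 11]
    print(clump_starts(seq, word))  # [0, 9]
```

In this toy sequence the word `ACA` has the single period 2, occurs five times, and forms two clumps. A word with no period, such as `ACGT`, cannot overlap itself, so its two counts coincide.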

## How to cite

Schbath, Sophie. "Compound Poisson approximation of word counts in DNA sequences." ESAIM: Probability and Statistics 1 (2010): 1-16. <http://eudml.org/doc/116578>.

@article{Schbath2010,

abstract = {
Identifying words with unexpected frequencies is an important
problem in the analysis of long DNA sequences. To solve it,
we need an approximation of the distribution of the number of
occurrences N(W) of a word W. Modeling DNA sequences with
m-order Markov chains, we use the Chen-Stein method to obtain
Poisson approximations for two different counts. We approximate
the “declumped” count of W by a Poisson variable and the
number of occurrences N(W) by a compound Poisson variable.
Combinatorial results are used to solve the general case of
overlapping words and to calculate the parameters of these
distributions.
},

author = {Schbath, Sophie},

journal = {ESAIM: Probability and Statistics},

keywords = {DNA sequences; word counts; Poisson approximations; compound Poisson distribution; Chen-Stein method; Markov chains; word periods},

language = {eng},

month = {3},

pages = {1-16},

publisher = {EDP Sciences},

title = {Compound Poisson approximation of word counts in DNA sequences},

url = {http://eudml.org/doc/116578},

volume = {1},

year = {2010},

}

TY - JOUR

AU - Schbath, Sophie

TI - Compound Poisson approximation of word counts in DNA sequences

JO - ESAIM: Probability and Statistics

DA - 2010/3//

PB - EDP Sciences

VL - 1

SP - 1

EP - 16

AB -
Identifying words with unexpected frequencies is an important
problem in the analysis of long DNA sequences. To solve it,
we need an approximation of the distribution of the number of
occurrences N(W) of a word W. Modeling DNA sequences with
m-order Markov chains, we use the Chen-Stein method to obtain
Poisson approximations for two different counts. We approximate
the “declumped” count of W by a Poisson variable and the
number of occurrences N(W) by a compound Poisson variable.
Combinatorial results are used to solve the general case of
overlapping words and to calculate the parameters of these
distributions.

LA - eng

KW - DNA sequences; word counts; Poisson approximations; compound Poisson distribution; Chen-Stein method; Markov chains; word periods

UR - http://eudml.org/doc/116578

ER -
