Compound Poisson approximation of word counts in DNA sequences
Identifying words with unexpected frequencies is an important problem in the analysis of long DNA sequences. To solve it, we need an approximation of the distribution of the number of occurrences of a word. Modeling DNA sequences with m-order Markov chains, we use the Chen-Stein method to obtain Poisson approximations for two different counts. We approximate the “declumped” count of the word by a Poisson variable and the number of occurrences by a compound Poisson variable. Combinatorial results...
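To make the two counts concrete, the following minimal Python sketch computes both on a toy sequence. It assumes that the "declumped" count means the number of clumps, where a clump is a maximal run of occurrences in which each occurrence overlaps the previous one. The function names, the toy sequence, and the word are illustrative only and are not taken from the paper.

```python
def occurrence_starts(seq, word):
    """Return the start indices of all occurrences of word in seq, overlaps included."""
    starts = []
    i = seq.find(word)
    while i != -1:
        starts.append(i)
        # Advance by one position only, so overlapping occurrences are found too.
        i = seq.find(word, i + 1)
    return starts


def clump_count(starts, word_len):
    """Count clumps, assuming a clump is a maximal run of overlapping occurrences."""
    clumps = 0
    prev = None
    for s in starts:
        # A new clump begins when this occurrence does not overlap the previous one.
        if prev is None or s - prev >= word_len:
            clumps += 1
        prev = s
    return clumps


# Toy data (hypothetical): "ATAT" can overlap itself with a shift of 2.
seq = "ACATATATGGATATC"
word = "ATAT"

starts = occurrence_starts(seq, word)
n_occurrences = len(starts)
n_clumps = clump_count(starts, len(word))

print("occurrence starts:", starts)
print("number of occurrences:", n_occurrences)
print("declumped count (clumps):", n_clumps)
```

In this toy case the word occurs three times, but the first two occurrences overlap and form a single clump, so the declumped count is two. Under the assumption above, the total count is the sum of the clump sizes, which is the structure a compound Poisson variable describes: a Poisson number of clumps, each contributing a random number of occurrences.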