Exponential inequalities for VLMC empirical trees

Antonio Galves; Véronique Maume-Deschamps; Bernard Schmitt

ESAIM: Probability and Statistics (2008)

  • Volume: 12, page 219-229
  • ISSN: 1292-8100

Abstract

top
A seminal paper by Rissanen, published in 1983, introduced the class of Variable Length Markov Chains and the algorithm Context which estimates the probabilistic tree generating the chain. Even if the subject was recently considered in several papers, the central question of the rate of convergence of the algorithm remained open. This is the question we address here. We provide an exponential upper bound for the probability of incorrect estimation of the probabilistic tree, as a function of the size of the sample. As a consequence we prove the almost sure consistency of the algorithm Context. We also derive exponential upper bounds for type I errors and for the probability of underestimation of the context tree. The constants appearing in the bounds are all explicit and obtained in a constructive way.

How to cite

top

Galves, Antonio, Maume-Deschamps, Véronique, and Schmitt, Bernard. "Exponential inequalities for VLMC empirical trees." ESAIM: Probability and Statistics 12 (2008): 219-229. <http://eudml.org/doc/250385>.

@article{Galves2008,
abstract = { A seminal paper by Rissanen, published in 1983, introduced the class of Variable Length Markov Chains and the algorithm Context which estimates the probabilistic tree generating the chain. Even if the subject was recently considered in several papers, the central question of the rate of convergence of the algorithm remained open. This is the question we address here. We provide an exponential upper bound for the probability of incorrect estimation of the probabilistic tree, as a function of the size of the sample. As a consequence we prove the almost sure consistency of the algorithm Context. We also derive exponential upper bounds for type I errors and for the probability of underestimation of the context tree. The constants appearing in the bounds are all explicit and obtained in a constructive way. },
author = {Galves, Antonio, Maume-Deschamps, Véronique, Schmitt, Bernard},
journal = {ESAIM: Probability and Statistics},
keywords = {Variable Length Markov Chain; context tree; algorithm context; weak dependance; variable length Markov chain},
language = {eng},
month = {1},
pages = {219-229},
publisher = {EDP Sciences},
title = {Exponential inequalities for VLMC empirical trees},
url = {http://eudml.org/doc/250385},
volume = {12},
year = {2008},
}

TY - JOUR
AU - Galves, Antonio
AU - Maume-Deschamps, Véronique
AU - Schmitt, Bernard
TI - Exponential inequalities for VLMC empirical trees
JO - ESAIM: Probability and Statistics
DA - 2008/1//
PB - EDP Sciences
VL - 12
SP - 219
EP - 229
AB - A seminal paper by Rissanen, published in 1983, introduced the class of Variable Length Markov Chains and the algorithm Context which estimates the probabilistic tree generating the chain. Even if the subject was recently considered in several papers, the central question of the rate of convergence of the algorithm remained open. This is the question we address here. We provide an exponential upper bound for the probability of incorrect estimation of the probabilistic tree, as a function of the size of the sample. As a consequence we prove the almost sure consistency of the algorithm Context. We also derive exponential upper bounds for type I errors and for the probability of underestimation of the context tree. The constants appearing in the bounds are all explicit and obtained in a constructive way.
LA - eng
KW - Variable Length Markov Chain; context tree; algorithm context; weak dependance; variable length Markov chain
UR - http://eudml.org/doc/250385
ER -

References

top
  1. G. Bejerano and G. Yona, Variations on probabilistic suffix trees: statistical modeling and prediction of protein families. Bioinformatics17 (2001) 23–43.  
  2. P. Bühlmann and A. Wyner, Variable length Markov chains. Ann. Statist.27 (1999) 480–513.  
  3. I. Csiszár, Large-scale typicality of Markov sample paths and consistency of MDL order estimators. Special issue on Shannon theory: perspective, trends, and applications. IEEE Trans. Inform. Theory48 (2002) 1616–1628.  
  4. I. Csiszár and Z. Talata, Context tree estimation for not necessarily finite memory processes via BIC and MDL, manuscript (2005).  
  5. J. Dedecker and P. Doukhan, A new covariance inequality and applications. Stochastic Process. Appl.106 (2003) 63–80.  
  6. J. Dedecker and C. Prieur, New dependence coefficients. Examples and applications to statistics. Prob. Theory Relat. Fields132 (2005) 203–236.  
  7. P. Ferrari and A. Galves, Coupling and regeneration for stochastic processes. Notes for a minicourse presented in XIII Escuela Venezolana de Matematicas. Can be downloaded from www.ime.usp.br/~pablo/book/abstract.html (2000).  
  8. F. Ferrari and A. Wyner, Estimation of general stationary processes by variable length Markov chains. Scand. J. Statist.30 (2003) 459–480.  
  9. F. Leonardi and A. Galves, Sequence Motif identification and protein classification using probabilistic trees. Lect. Notes Comput. Sci.3594 (2005) 190–193.  
  10. V. Maume-Deschamps, Exponential inequalities and estimation of conditional probabilities in Dependence in probability and statistics, Lect. Notes in Stat., Vol. 187, P. Bertail, P. Doukhan and P. Soulier Eds. Springer (2006).  
  11. J. Rissanen, A universal data compression system. IEEE Trans. Inform. Theory29 (1983) 656–664.  
  12. T.J. Tjalkens and F.M.J.F. Willems, Implementing the context-tree weighting method: arithmetic coding. Recent advances in interdisciplinary mathematics (Portland, ME, 1997). J. Combin. Inform. System Sci.25 (2000) 49-58.  
  13. F.M. Willems, Y.M. Shtarkov and T.J Tjalkens, The context-tree weighting method: basic properties. IEEE Trans. Inform. Theory41 (1995) 653–664.  

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.