Segmentation of the Poisson and negative binomial rate models: a penalized estimator

Alice Cleynen; Emilie Lebarbier

ESAIM: Probability and Statistics (2014)

  • Volume: 18, page 750-769
  • ISSN: 1292-8100

Abstract

We consider the segmentation problem of Poisson and negative binomial (i.e. overdispersed Poisson) rate distributions. In segmentation, an important issue remains the choice of the number of segments. To this end, we propose a penalized log-likelihood estimator where the penalty function is constructed in a non-asymptotic context following the works of L. Birgé and P. Massart. The resulting estimator is proved to satisfy an oracle inequality. The performance of our criterion is assessed using simulated and real datasets in the RNA-seq data analysis context.
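
As a rough illustration of the idea in the abstract, the sketch below segments a Poisson count series by dynamic programming and then picks the number of segments by minimizing a penalized negative log-likelihood. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `beta` constant, and in particular the penalty shape `beta * K * (1 + log(n / K))` are hypothetical placeholders in the general spirit of Birgé and Massart type penalties, and only the Poisson case is covered.

```python
import math


def poisson_segment_cost(prefix, i, j):
    """Negative maximized Poisson log-likelihood of y[i:j].

    The log(y!) terms are dropped because they do not depend on the
    segmentation. With s the segment sum and rate estimate s / (j - i),
    the cost reduces to s - s * log(s / (j - i)).
    """
    s = prefix[j] - prefix[i]
    if s == 0:
        return 0.0
    return s - s * math.log(s / (j - i))


def best_segmentations(y, k_max):
    """Dynamic programming over all segmentations with up to k_max segments."""
    n = len(y)
    prefix = [0]
    for value in y:
        prefix.append(prefix[-1] + value)
    inf = float("inf")
    # cost[k][j]: minimal cost of splitting y[:j] into exactly k segments
    cost = [[inf] * (n + 1) for _ in range(k_max + 1)]
    # back[k][j]: start index of the last segment in that optimal split
    back = [[0] * (n + 1) for _ in range(k_max + 1)]
    cost[0][0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + poisson_segment_cost(prefix, i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    back[k][j] = i
    return cost, back


def penalty(k, n, beta=1.0):
    """Hypothetical penalty shape (an assumption, not the paper's formula)."""
    return beta * k * (1.0 + math.log(n / k))


def select_segmentation(y, k_max, beta=1.0):
    """Return (number of segments, sorted change-point indices)."""
    n = len(y)
    cost, back = best_segmentations(y, k_max)
    best_k = min(
        range(1, k_max + 1),
        key=lambda k: cost[k][n] + penalty(k, n, beta),
    )
    # Walk the back-pointers from the end of the series to recover the
    # start index of every segment in the selected segmentation.
    starts = []
    j = n
    for k in range(best_k, 0, -1):
        j = back[k][j]
        starts.append(j)
    # The first segment always starts at 0, which is not a change point.
    return best_k, sorted(starts)[1:]


if __name__ == "__main__":
    counts = [2] * 20 + [20] * 20
    print(select_segmentation(counts, k_max=5))
```

The cubic-time loop is chosen for readability; in practice `beta` would need to be calibrated from the data, and a faster exact algorithm would be preferable for long series.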

How to cite

Cleynen, Alice, and Lebarbier, Emilie. "Segmentation of the Poisson and negative binomial rate models: a penalized estimator." ESAIM: Probability and Statistics 18 (2014): 750-769. <http://eudml.org/doc/274386>.

@article{Cleynen2014,
abstract = {We consider the segmentation problem of Poisson and negative binomial (i.e. overdispersed Poisson) rate distributions. In segmentation, an important issue remains the choice of the number of segments. To this end, we propose a penalized log-likelihood estimator where the penalty function is constructed in a non-asymptotic context following the works of L. Birgé and P. Massart. The resulting estimator is proved to satisfy an oracle inequality. The performance of our criterion is assessed using simulated and real datasets in the RNA-seq data analysis context.},
author = {Cleynen, Alice and Lebarbier, Emilie},
journal = {ESAIM: Probability and Statistics},
keywords = {distribution estimation; change-point detection; count data (RNA-seq); Poisson and negative binomial distributions; model selection},
language = {eng},
pages = {750-769},
publisher = {EDP-Sciences},
title = {Segmentation of the Poisson and negative binomial rate models: a penalized estimator},
url = {http://eudml.org/doc/274386},
volume = {18},
year = {2014},
}

TY - JOUR
AU - Cleynen, Alice
AU - Lebarbier, Emilie
TI - Segmentation of the Poisson and negative binomial rate models: a penalized estimator
JO - ESAIM: Probability and Statistics
PY - 2014
PB - EDP-Sciences
VL - 18
SP - 750
EP - 769
AB - We consider the segmentation problem of Poisson and negative binomial (i.e. overdispersed Poisson) rate distributions. In segmentation, an important issue remains the choice of the number of segments. To this end, we propose a penalized log-likelihood estimator where the penalty function is constructed in a non-asymptotic context following the works of L. Birgé and P. Massart. The resulting estimator is proved to satisfy an oracle inequality. The performance of our criterion is assessed using simulated and real datasets in the RNA-seq data analysis context.
LA - eng
KW - distribution estimation; change-point detection; count data (RNA-seq); Poisson and negative binomial distributions; model selection
UR - http://eudml.org/doc/274386
ER -

References

  1. H. Akaike, Information Theory and Extension of the Maximum Likelihood Principle. Second Int. Symp. Inf. Theory (1973) 267–281. Zbl0283.62006 MR483125
  2. N. Akakpo, Estimating a discrete distribution via histogram selection. ESAIM: PS 15 (2011) 1–29. Zbl06157505 MR2793047
  3. S. Arlot and P. Massart, Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res. 10 (2009) 245–279.
  4. Y. Baraud and L. Birgé, Estimating the intensity of a random measure by histogram type estimators. Probab. Theory Relat. Fields 143 (2009) 239–284. Zbl1149.62019 MR2449129
  5. A. Barron, L. Birgé and P. Massart, Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113 (1999) 301–413. Zbl0946.62036 MR1679028
  6. C. Biernacki, G. Celeux and G. Govaert, Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000) 719–725.
  7. L. Birgé, Model selection for Poisson processes. In Asymptotics: particles, processes and inverse problems, Vol. 55 of IMS Lect. Notes Monogr. Ser. Beachwood, OH: Inst. Math. Statist. (2007) 32–64. Zbl1176.62082 MR2459930
  8. L. Birgé and P. Massart, From model selection to adaptive estimation, in Festschrift for Lucien Le Cam. New York, Springer (1997) 55–87. Zbl0920.62042 MR1462939
  9. L. Birgé and P. Massart, Gaussian model selection. J. Eur. Math. Soc. 3 (2001) 203–268. Zbl1037.62001 MR1848946
  10. L. Birgé and P. Massart, Minimal penalties for Gaussian model selection. Probab. Theory Relat. Fields 138 (2007) 33–73. Zbl1112.62082 MR2288064
  11. J.V. Braun, R. Braun and H.G. Müller, Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation. Biometrika 87 (2000) 301–314. Zbl0963.62067 MR1782480
  12. J.V. Braun and H.G. Müller, Statistical methods for DNA sequence segmentation. Stat. Sci. (1998) 142–162. Zbl0960.62121
  13. L. Breiman, J. Friedman, R. Olshen and C. Stone, Classification and Regression Trees. Wadsworth and Brooks (1984). Zbl0541.62042
  14. G. Castellan, Modified Akaike's criterion for histogram density estimation. Technical Report #9961 (1999).
  15. A. Cleynen, M. Koskas, E. Lebarbier, G. Rigaill and S. Robin, Segmentor3IsBack, an R package for the fast and exact segmentation of Seq-data. Algorithms for Molecular Biology (2014).
  16. N. Johnson, A. Kemp and S. Kotz, Univariate Discrete Distributions. John Wiley & Sons, Inc. (2005). Zbl1092.62010 MR2163227
  17. R. Killick and I.A. Eckley, Changepoint: an R package for changepoint analysis. Lancaster University (2011).
  18. E. Lebarbier, Detecting multiple change-points in the mean of Gaussian process by model selection. Signal Process. 85 (2005) 717–736. Zbl1148.94403
  19. T.M. Luong, Y. Rozenholc and G. Nuel, Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model. Comput. Stat. Data Anal. (2013). MR3103767
  20. P. Massart, Concentration inequalities and model selection. In Lect. Notes Math. Springer Berlin/Heidelberg (2007). Zbl1170.60006 MR2319879
  21. P. Reynaud-Bouret, Adaptive estimation of the intensity of inhomogeneous Poisson processes via concentration inequalities. Probab. Theory Relat. Fields 126 (2003) 103–153. Zbl1019.62079 MR1981635
  22. G. Rigaill, Pruned dynamic programming for optimal multiple change-point detection. arXiv:1004.0887 (2010), http://arxiv.org/abs/1004.0887.
  23. G. Rigaill, E. Lebarbier and S. Robin, Exact posterior distributions and model selection criteria for multiple change-point detection problems. Stat. Comput. 22 (2012) 917–929. Zbl1252.62027 MR2913792
  24. D. Risso, K. Schwartz, G. Sherlock and S. Dudoit, GC-Content Normalization for RNA-Seq Data. BMC Bioinform. 12 (2011) 480.
  25. Y.C. Yao, Estimating the number of change-points via Schwarz' criterion. Stat. Probab. Lett. 6 (1988) 181–189. Zbl0642.62016 MR919373
  26. N.R. Zhang and D.O. Siegmund, A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization data. Biometrics 63 (2007) 22–32. Zbl1206.62174 MR2345571
