Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation

Van Hanh Nguyen; Catherine Matias

ESAIM: Probability and Statistics (2014)

  • Volume: 18, page 584-612
  • ISSN: 1292-8100

Abstract

In a multiple testing context, we consider a semiparametric mixture model with two components where one component is known and corresponds to the distribution of p-values under the null hypothesis and the other component f is nonparametric and stands for the distribution under the alternative hypothesis. Motivated by the issue of local false discovery rate estimation, we focus here on the estimation of the nonparametric unknown component f in the mixture, relying on a preliminary estimator of the unknown proportion θ of true null hypotheses. We propose and study the asymptotic properties of two different estimators for this unknown component. The first estimator is a randomly weighted kernel estimator. We establish an upper bound for its pointwise quadratic risk, exhibiting the classical nonparametric rate of convergence over a class of Hölder densities. To our knowledge, this is the first result establishing convergence as well as corresponding rate for the estimation of the unknown component in this nonparametric mixture. The second estimator is a maximum smoothed likelihood estimator. It is computed through an iterative algorithm, for which we establish a descent property. In addition, these estimators are used in a multiple testing procedure in order to estimate the local false discovery rate. Their respective performances are then compared on synthetic data.
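
The model in the abstract can be illustrated with a small sketch. If null p-values are assumed uniform on [0, 1], the mixture density is g(x) = θ + (1 - θ) f(x) and the local false discovery rate is θ / g(x). The code below is a minimal sketch under stated assumptions, not the authors' implementation: the weight formula 1 - θ̂/ĝ(x) (an estimated probability that a p-value is non-null), the Gaussian kernel, the absence of boundary correction, the bandwidth, and all function names are illustrative choices that do not come from the abstract.

```python
import math
import random


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def mixture_kde(x, pvalues, h):
    """Ordinary kernel estimate of the mixture density g at x with bandwidth h."""
    return sum(gaussian_kernel((x - p) / h) for p in pvalues) / (len(pvalues) * h)


def alternative_weights(pvalues, theta_hat, h):
    """Assumed weights: 1 - theta_hat / g_hat(p), clipped to [0, 1].

    Each weight estimates the probability that the p-value is non-null.
    """
    weights = []
    for p in pvalues:
        g_hat = mixture_kde(p, pvalues, h)  # positive, since p itself contributes
        weights.append(min(1.0, max(0.0, 1.0 - theta_hat / g_hat)))
    return weights


def weighted_kde(x, pvalues, weights, h):
    """Weighted kernel estimate of the alternative density f at x."""
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no observation is attributed to the alternative
    num = sum(w * gaussian_kernel((x - p) / h) for w, p in zip(weights, pvalues))
    return num / (h * total)


def local_fdr(theta_hat, f_hat_x):
    """Local FDR theta / g(x) with g = theta + (1 - theta) f; needs 0 < theta_hat <= 1."""
    return theta_hat / (theta_hat + (1.0 - theta_hat) * f_hat_x)


if __name__ == "__main__":
    random.seed(0)
    theta = 0.7  # true proportion of nulls in this synthetic example
    # Null p-values are uniform; alternative p-values follow a Beta(0.3, 1) law
    # (an arbitrary choice that concentrates mass near 0).
    pvalues = [
        random.random() if random.random() < theta else random.betavariate(0.3, 1.0)
        for _ in range(500)
    ]
    h = 0.05
    weights = alternative_weights(pvalues, theta, h)
    for x in (0.01, 0.1, 0.5, 0.9):
        f_hat = weighted_kde(x, pvalues, weights, h)
        print(x, round(f_hat, 3), round(local_fdr(theta, f_hat), 3))
```

In practice θ would be replaced by a preliminary estimator, as the abstract indicates; the demo plugs in the true value only to keep the sketch short.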

How to cite

Nguyen, Van Hanh, and Matias, Catherine. "Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation." ESAIM: Probability and Statistics 18 (2014): 584-612. <http://eudml.org/doc/274372>.

@article{Nguyen2014,
author = {Nguyen, Van Hanh and Matias, Catherine},
journal = {ESAIM: Probability and Statistics},
keywords = {false discovery rate; kernel estimation; local false discovery rate; maximum smoothed likelihood; multiple testing; p-values; semiparametric mixture model},
language = {eng},
pages = {584-612},
publisher = {EDP-Sciences},
title = {Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation},
url = {http://eudml.org/doc/274372},
volume = {18},
year = {2014},
}

TY - JOUR
AU - Nguyen, Van Hanh
AU - Matias, Catherine
TI - Nonparametric estimation of the density of the alternative hypothesis in a multiple testing setup. Application to local false discovery rate estimation
JO - ESAIM: Probability and Statistics
PY - 2014
PB - EDP-Sciences
VL - 18
SP - 584
EP - 612
LA - eng
KW - false discovery rate; kernel estimation; local false discovery rate; maximum smoothed likelihood; multiple testing; p-values; semiparametric mixture model
UR - http://eudml.org/doc/274372
ER -

