A comparison of automatic histogram constructions

Laurie Davies; Ursula Gather; Dan Nordman; Henrike Weinert

ESAIM: Probability and Statistics (2009)

  • Volume: 13, page 181-196
  • ISSN: 1292-8100

Abstract

Even for a well-trained statistician the construction of a histogram for a given real-valued data set is a difficult problem. It is even more difficult to construct a fully automatic procedure which specifies the number and widths of the bins in a satisfactory manner for a wide range of data sets. In this paper we compare several histogram construction procedures by means of a simulation study. The study includes plug-in methods, cross-validation, penalized maximum likelihood and the taut string procedure. Their performance on different test beds is measured by their ability to identify the peaks of an underlying density as well as by Hellinger distance.

How to cite

Davies, Laurie, et al. "A comparison of automatic histogram constructions." ESAIM: Probability and Statistics 13 (2009): 181-196. <http://eudml.org/doc/250660>.

@article{Davies2009,
abstract = { Even for a well-trained statistician the construction of a histogram for a given real-valued data set is a difficult problem. It is even more difficult to construct a fully automatic procedure which specifies the number and widths of the bins in a satisfactory manner for a wide range of data sets. In this paper we compare several histogram construction procedures by means of a simulation study. The study includes plug-in methods, cross-validation, penalized maximum likelihood and the taut string procedure. Their performance on different test beds is measured by their ability to identify the peaks of an underlying density as well as by Hellinger distance. },
author = {Davies, Laurie and Gather, Ursula and Nordman, Dan and Weinert, Henrike},
journal = {ESAIM: Probability and Statistics},
keywords = {regular histogram; model selection; penalized likelihood; taut string},
language = {eng},
month = {6},
pages = {181-196},
publisher = {EDP Sciences},
title = {A comparison of automatic histogram constructions},
url = {http://eudml.org/doc/250660},
volume = {13},
year = {2009},
}

TY - JOUR
AU - Davies, Laurie
AU - Gather, Ursula
AU - Nordman, Dan
AU - Weinert, Henrike
TI - A comparison of automatic histogram constructions
JO - ESAIM: Probability and Statistics
DA - 2009/6//
PB - EDP Sciences
VL - 13
SP - 181
EP - 196
AB - Even for a well-trained statistician the construction of a histogram for a given real-valued data set is a difficult problem. It is even more difficult to construct a fully automatic procedure which specifies the number and widths of the bins in a satisfactory manner for a wide range of data sets. In this paper we compare several histogram construction procedures by means of a simulation study. The study includes plug-in methods, cross-validation, penalized maximum likelihood and the taut string procedure. Their performance on different test beds is measured by their ability to identify the peaks of an underlying density as well as by Hellinger distance.
LA - eng
KW - regular histogram; model selection; penalized likelihood; taut string
UR - http://eudml.org/doc/250660
ER -
