# A comparison of automatic histogram constructions

Laurie Davies; Ursula Gather; Dan Nordman; Henrike Weinert

ESAIM: Probability and Statistics (2009)

- Volume: 13, pages 181–196
- ISSN: 1292-8100

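## Abstract

Even for a well-trained statistician the construction of a histogram for a given real-valued data set is a difficult problem. It is even more difficult to construct a fully automatic procedure which specifies the number and widths of the bins in a satisfactory manner for a wide range of data sets. In this paper we compare several histogram construction procedures by means of a simulation study. The study includes plug-in methods, cross-validation, penalized maximum likelihood and the taut string procedure. Their performance on different test beds is measured by their ability to identify the peaks of an underlying density as well as by Hellinger distance.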

## How to cite

Davies, Laurie, et al. "A comparison of automatic histogram constructions." ESAIM: Probability and Statistics 13 (2009): 181-196. <http://eudml.org/doc/250660>.

@article{Davies2009,

abstract = {
Even for a well-trained statistician the construction of a histogram
for a given real-valued data set is a difficult problem. It is even
more difficult to construct a fully automatic procedure which
specifies the number and widths of the bins in a satisfactory manner
for a wide range of data sets. In this paper we compare several
histogram construction procedures by means of a simulation
study. The study includes plug-in methods, cross-validation,
penalized maximum
likelihood and the taut string procedure. Their performance on
different test beds is measured by
their ability to identify the peaks of an underlying density as
well as by Hellinger distance.
},

author = {Davies, Laurie and Gather, Ursula and Nordman, Dan and Weinert, Henrike},

journal = {ESAIM: Probability and Statistics},

keywords = {Regular histogram; model selection; penalized likelihood; taut string},

language = {eng},

month = {6},

pages = {181--196},

publisher = {EDP Sciences},

title = {A comparison of automatic histogram constructions},

url = {http://eudml.org/doc/250660},

volume = {13},

year = {2009},

}

TY - JOUR

AU - Davies, Laurie

AU - Gather, Ursula

AU - Nordman, Dan

AU - Weinert, Henrike

TI - A comparison of automatic histogram constructions

JO - ESAIM: Probability and Statistics

DA - 2009/6//

PB - EDP Sciences

VL - 13

SP - 181

EP - 196

AB -
Even for a well-trained statistician the construction of a histogram
for a given real-valued data set is a difficult problem. It is even
more difficult to construct a fully automatic procedure which
specifies the number and widths of the bins in a satisfactory manner
for a wide range of data sets. In this paper we compare several
histogram construction procedures by means of a simulation
study. The study includes plug-in methods, cross-validation,
penalized maximum
likelihood and the taut string procedure. Their performance on
different test beds is measured by
their ability to identify the peaks of an underlying density as
well as by Hellinger distance.

LA - eng

KW - Regular histogram; model selection; penalized likelihood; taut string

UR - http://eudml.org/doc/250660

ER -

## References

- H. Akaike, A new look at the statistical model identification. IEEE Trans. Automatic Control 19 (1974) 716–723.
- A. Azzalini and A.W. Bowman, A look at some data on the Old Faithful geyser. Appl. Statist. 39 (1990) 357–365.
- A. Barron, L. Birgé and P. Massart, Risk bounds for model selection via penalization. Probab. Theory Relat. Fields 113 (1999) 301–413.
- L. Birgé and Y. Rozenholc, How many bins should be put in a regular histogram? ESAIM: PS 10 (2006) 24–45.
- J.E. Daly, Construction of optimal histograms. Commun. Stat., Theory Methods 17 (1988) 2921–2931.
- P.L. Davies and A. Kovac, Local extremes, runs, strings and multiresolution (with discussion). Ann. Stat. 29 (2001) 1–65.
- P.L. Davies and A. Kovac, Densities, spectral densities and modality. Ann. Stat. 32 (2004) 1093–1136.
- P.L. Davies and A. Kovac, ftnonpar, R-package, version 0.1-82 (2008). URI http://www.r-project.org
- L. Devroye and L. Györfi, Nonparametric density estimation: the L1 view. John Wiley, New York (1985).
- L. Dümbgen and G. Walther, Multiscale inference about a density. Ann. Stat. 36 (2008) 1758–1785.
- J. Engel, The multiresolution histogram. Metrika 46 (1997) 41–57.
- D. Freedman and P. Diaconis, On the histogram as a density estimator: L2 theory. Z. Wahr. Verw. Geb. 57 (1981) 453–476.
- I.J. Good and R.A. Gaskins, Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data. J. Amer. Statist. Assoc. 75 (1980) 42–73.
- P. Hall, Akaike's information criterion and Kullback-Leibler loss for histogram density estimation. Probab. Theory Relat. Fields 85 (1990) 449–467.
- P. Hall and E.J. Hannan, On stochastic complexity and nonparametric density estimation. Biometrika 75 (1988) 705–714.
- P. Hall and M.P. Wand, Minimizing L1 distance in nonparametric density estimation. J. Multivariate Anal. 26 (1988) 59–88.
- K. He and G. Meeden, Selecting the number of bins in a histogram: A decision theoretic approach. J. Stat. Plann. Inference 61 (1997) 49–59.
- Y. Kanazawa, An optimal variable cell histogram. Commun. Stat., Theory Methods 17 (1988) 1401–1422.
- Y. Kanazawa, An optimal variable cell histogram based on the sample spacings. Ann. Stat. 20 (1992) 291–304.
- Y. Kanazawa, Hellinger distance and Akaike's information criterion for the histogram. Statist. Probab. Lett. 17 (1993) 293–298.
- C.R. Loader, Bandwidth selection: classical or plug-in? Ann. Stat. 27 (1999) 415–438.
- J.S. Marron and M.P. Wand, Exact mean integrated squared error. Ann. Stat. 20 (1992) 712–736.
- M. Postman, J.P. Huchra and M.J. Geller, Probes of large-scale structures in the Corona Borealis region. Astron. J. 92 (1986) 1238–1247.
- J. Rissanen, A universal prior for integers and estimation by minimum description length. Ann. Stat. 11 (1983) 416–431.
- J. Rissanen, Stochastic Complexity (with discussion). J. R. Statist. Soc. B 49 (1987) 223–239.
- J. Rissanen, Stochastic complexity in statistical inquiry. World Scientific, New Jersey (1989).
- J. Rissanen, Fisher information and stochastic complexity. IEEE Trans. Inf. Theory 42 (1996) 40–47.
- J. Rissanen, T.P. Speed and B. Yu, Density estimation by stochastic complexity. IEEE Trans. Inf. Theory 38 (1992) 315–323.
- K. Roeder, Density estimation with confidence sets exemplified by superclusters and voids in galaxies. J. Amer. Statist. Assoc. 85 (1990) 617–624.
- M. Rudemo, Empirical choice of histograms and kernel density estimators. Scand. J. Statist. 9 (1982) 65–78.
- G. Schwarz, Estimating the dimension of a model. Ann. Stat. 6 (1978) 461–464.
- D.W. Scott, On optimal and data-based histograms. Biometrika 66 (1979) 605–610.
- D.W. Scott, Multivariate density estimation: theory, practice, and visualization. Wiley, New York (1992).
- B.W. Silverman, Choosing the window width when estimating a density. Biometrika 65 (1978) 1–11.
- B.W. Silverman, Density estimation for statistics and data analysis. Chapman and Hall, London (1986).
- J.S. Simonoff and F. Udina, Measuring the stability of histogram appearance when the anchor position is changed. Comput. Stat. Data Anal. 23 (1997) 335–353.
- H. Sturges, The choice of a class-interval. J. Amer. Statist. Assoc. 21 (1926) 65–66.
- W. Szpankowski, On asymptotics of certain recurrences arising in universal coding. Prob. Inf. Trans. 34 (1998) 142–146.
- M.P. Wand, Data-based choice of histogram bin width. American Statistician 51 (1997) 59–64.
- M.P. Wand and B. Ripley, KernSmooth, R-package, version 2.22-21 (2007). URI http://www.r-project.org
