Estimation of proportion

Ryszard Zieliński

Mathematica Applicanda (2008)

  • Volume: 36, Issue: 50/09
  • ISSN: 1730-2668

Abstract

A population of N elements contains an unknown number M of marked units. The paper discusses problems of estimating the fraction θ = M/N. The well-known standard solution is θ̂ = K/n, where K is the number of marked units in a sample of size n; it is the uniformly minimum variance unbiased estimator, the maximum likelihood estimator, and the method-of-moments estimator, and consequently it shares all the advantages of such estimators. The paper considers some versions of the estimator that are more adequate in real situations. If we know in advance that the unknown fraction lies in a given interval (t1, t2), and we regard an estimator θ̂1 as better than an estimator θ̂2 when its mean square error averaged over that interval is smaller, then the optimal estimator is given by formula (3) of the paper. The values of this estimator for (t1, t2) = (0, 0.5) and for (t1, t2) = (0.3, 0.4), in a sample of size n = 10 with K marked units, are given in a table in the paper, and the mean square errors of these estimators, compared with the error of the standard estimator θ̂ = K/n, are presented in Fig. 2. Averaging the mean square error with a weight function, for example the one in Fig. 3, gives the Bayesian estimator, whose mean square error is shown in Fig. 4 (for n = 10). If we are interested in minimizing the mean square error “in the worst possible case”, the adequate choice is the minimax estimator. Another situation arises if the population can be divided into more homogeneous subpopulations, for example into two subpopulations whose fractions of marked units are close to zero or close to one. Stratified sampling is then more effective, and the mean square error of estimation may be significantly reduced. The problem of randomized responses is also presented, briefly and in an elementary way. The problem arises when a unit in the sample cannot be recognized with certainty as “marked” or “not marked”, but only with some probability. The situation is typical of survey interviews: the technique allows respondents to answer questions on sensitive issues (such as criminal behavior or sexuality) while remaining confidential. The final section of the paper is devoted to some remarks on confidence intervals for the fraction. The exact optimal solution is well known to mathematicians, but it is probably not easy for statistical practitioners to follow all the theoretical details, so confidence intervals based on the asymptotic normal approximation of the binomial distribution are typically used. That approach is neither sufficiently exact nor correct. The proper, exact solution is given by quantiles of a suitable Beta distribution, which are easily computed in typical statistical and mathematical software packages.
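
As an illustration of the last point, the following is a minimal sketch of one standard exact interval, the Clopper-Pearson construction. The abstract does not state which Beta parameters the paper uses, so this choice is an assumption, and the function names (`binom_cdf`, `exact_interval`) are hypothetical. The bounds are found by bisection on the binomial distribution function using only the Python standard library; they coincide with the α/2 quantile of Beta(K, n − K + 1) and the 1 − α/2 quantile of Beta(K + 1, n − K).

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_increasing(g, iterations=100):
    """Bisection on [0, 1] for a function g that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_interval(K, n, alpha=0.05):
    """Clopper-Pearson interval for a proportion, given K marked units in a sample of n.

    Assumption: this is one common 'Beta quantile' interval, not necessarily
    the exact construction used in the paper.
    """
    if not 0 <= K <= n:
        raise ValueError("K must satisfy 0 <= K <= n")
    # Lower bound: the p at which P(X >= K) equals alpha/2 (increasing in p).
    if K == 0:
        lower = 0.0
    else:
        lower = _root_of_increasing(
            lambda p: (1 - binom_cdf(K - 1, n, p)) - alpha / 2
        )
    # Upper bound: the p at which P(X <= K) equals alpha/2 (the CDF decreases in p).
    if K == n:
        upper = 1.0
    else:
        upper = _root_of_increasing(lambda p: alpha / 2 - binom_cdf(K, n, p))
    return lower, upper


if __name__ == "__main__":
    K, n = 3, 10
    print("standard estimate K/n:", K / n)
    print("exact 95% interval:", exact_interval(K, n))
```

For K = 3 and n = 10 the interval is roughly (0.067, 0.652). It is not symmetric about K/n = 0.3, which is something the normal approximation cannot reproduce for so small a sample.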

How to cite

Ryszard Zieliński. "Estimation of proportion." Mathematica Applicanda 36.50/09 (2008). <http://eudml.org/doc/293060>.

@article{RyszardZieliński2008,
author = {Ryszard Zieliński},
journal = {Mathematica Applicanda},
keywords = {Fraction, probability of success in a Bernoulli trial, unbiased estimator, estimator with uniformly minimum mean square error, Bayesian estimator, stratified sampling, randomized responses, confidence interval.},
language = {eng},
number = {50/09},
title = {Estimation of proportion},
url = {http://eudml.org/doc/293060},
volume = {36},
year = {2008},
}

TY - JOUR
AU - Ryszard Zieliński
TI - Estimation of proportion
JO - Mathematica Applicanda
PY - 2008
VL - 36
IS - 50/09
LA - eng
KW - Fraction, probability of success in a Bernoulli trial, unbiased estimator, estimator with uniformly minimum mean square error, Bayesian estimator, stratified sampling, randomized responses, confidence interval.
UR - http://eudml.org/doc/293060
ER -
