Sampling Distribution of the Sample Proportion as an Example of the Sampling Distribution of the Sample Mean

From www.norsemathology.org


Proportion problems can be likened to a coin toss, where we think of a "heads" as 1, and a "tails" as 0. Proportions are related to percentages and probabilities, but proportions are (like probabilities) always given as numbers between 0 and 1. Examples include the proportion of

  • Americans in favor of the death penalty,
  • water samples that turn up E. coli,
  • paint samples that turn up lead,

etc.

If all samples of paint turn up lead, then the proportion of contaminated samples is 1; if no samples turn up lead, then the proportion is 0; and generally, the truth lies somewhere between!

You might wonder how one could ever know whether a coin is fair (that is, the chance of a head is the same as the chance of a tail). The truth of the matter is, you can't. All conditions might point to a fair coin -- it may be perfectly symmetric, etc. -- but you'll never know for sure. This points to the existence of a parameter, which we call \left.\pi\right., and which indicates the true underlying proportion of heads (say). Note well: this \left.\pi\right. is not the same as the \left.\pi\right. that plays such an important role in the study of circles.

From a "frequentist's" perspective, the only way to understand \left.\pi\right. is to toss the coin forever and see what happens to the ratio of heads to tosses. Writing x_i=1 if toss i is a head and x_i=0 if it is a tail, that's how we find the underlying parameters of the coin:

\pi = \lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^nx_i}{n}
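The limit above can't be computed in practice, but a simulation suggests how the ratio settles down. The following is only a sketch: the function name `running_proportion`, the seed, and the choice of a simulated coin with a known chance of heads `p` (standing in for \left.\pi\right.) are assumptions made for illustration.

```python
import random

def running_proportion(p, n_tosses, seed=0):
    """Toss a simulated coin with P(heads) = p and return heads / tosses."""
    rng = random.Random(seed)
    # Each toss is x_i = 1 (heads) with probability p, else x_i = 0 (tails).
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p)
    return heads / n_tosses

# The proportion of heads should drift toward p as the number of tosses grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, running_proportion(0.5, n))
```

With a real coin we never know `p`; the simulation only illustrates why more tosses give a better idea of it.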

Then the variance of that coin is

\sigma^2 = \lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^n(x_i-\pi)^2}{n} = \lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^n(x_i^2-2x_i\pi + \pi^2)}{n}

or

\sigma^2 = \lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^nx_i}{n}
- 2\pi\lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^nx_i}{n}
+ \lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^n\pi^2}{n}

(because x_i^2=x_i for the coin toss); or

\sigma^2= \pi - 2\pi^2 + \pi^2 = \pi - \pi^2 = \left(1-\pi\right)\pi

Therefore

\sigma = \sqrt{(1-\pi)\pi}
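As a rough check on this derivation, one can average (x_i-\pi)^2 over many simulated tosses and compare the result with (1-\pi)\pi. This is a sketch under stated assumptions: the helper `empirical_variance`, the seed, and the value 0.3 for the chance of heads are all hypothetical choices, and `p` stands in for \left.\pi\right..

```python
import random

def empirical_variance(p, n_tosses, seed=0):
    """Average of (x_i - p)^2 over simulated 0/1 tosses with P(heads) = p."""
    rng = random.Random(seed)
    tosses = [1 if rng.random() < p else 0 for _ in range(n_tosses)]
    return sum((x - p) ** 2 for x in tosses) / n_tosses

p = 0.3  # assumed chance of heads, chosen only for illustration
print("simulated variance:", empirical_variance(p, 200_000))
print("(1 - p) * p       :", (1 - p) * p)
```

The two printed values should be close, though not identical, since the simulation uses finitely many tosses.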

Since the sample proportion is just the sample mean of the 0/1 values x_i, the theory of the sampling distribution of the sample mean gives the parameters of its distribution for a sample of size n:

\mu_{\overline{x}}=\pi,

and

\sigma_{\overline{x}} = \sqrt{\frac{(1-\pi)\pi}{n}}.
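These two formulas can be compared against a simulation: draw many samples of size n, compute the proportion of heads in each, and look at the mean and standard deviation of those proportions. The sketch below assumes a simulated coin; the name `simulate_sample_proportions` and the values of `p` (standing in for \left.\pi\right.), `n`, and `reps` are illustrative choices, not part of the theory.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Return `reps` sample proportions, each from n tosses with P(heads) = p."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(reps):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(heads / n)
    return proportions

p, n, reps = 0.3, 50, 5000  # assumed values for illustration
props = simulate_sample_proportions(p, n, reps)

print("mean of sample proportions:", statistics.mean(props))
print("theory, pi                :", p)
print("SD of sample proportions  :", statistics.pstdev(props))
print("theory, sqrt((1-pi)pi/n)  :", math.sqrt((1 - p) * p / n))
```

The simulated values should land near the theoretical ones, with the agreement improving as `reps` grows.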

The proportion problem is interesting for (at least) two reasons:

  1. once the mean is given, the standard deviation is known (somewhat unusual); and
  2. our rule for when the normal distribution assumption is valid casts a shadow on the old lie that \left.n=30\right. is magic. For the proportion problem the rule we use is:
    0 < \pi-3\sigma_{\overline{x}} < \pi+3\sigma_{\overline{x}}<1
    That is, \left.n=30\right. won't save you: it's a rule of thumb, and whether it is enough depends entirely on the underlying distribution of x (here, on the value of \left.\pi\right.).
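The rule in item 2 is easy to check numerically. The sketch below uses a hypothetical helper, `normal_rule_ok`, that tests whether \pi \pm 3\sigma_{\overline{x}} stays strictly inside the interval from 0 to 1; the list of proportions tried is an arbitrary choice, and `p` stands in for \left.\pi\right..

```python
import math

def normal_rule_ok(p, n):
    """Check the rule 0 < p - 3*sigma and p + 3*sigma < 1 for sample size n."""
    sigma = math.sqrt((1 - p) * p / n)  # standard deviation of the sample proportion
    return 0 < p - 3 * sigma and p + 3 * sigma < 1

# Try the "magic" sample size n = 30 on coins of varying fairness.
for p in (0.5, 0.3, 0.1, 0.01):
    print(p, normal_rule_ok(p, 30))
```

With n = 30 the rule is satisfied for a fair coin but fails for a proportion as lopsided as 0.1, which is the sense in which the sample size needed depends on the underlying distribution.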