# Sampling Distribution of the Sample Proportion as an Example of the Sampling Distribution of the Sample Mean

Proportion problems can be likened to a coin toss, where we think of a "heads" as 1, and a "tails" as 0. Proportions are related to percentages and probabilities, but proportions are (like probabilities) always given as numbers between 0 and 1. Examples include the proportion of

• Americans in favor of the death penalty,
• water samples that turn up E. coli,
• paint samples that turn up lead,

etc.

If all samples of paint turn up lead, then the proportion of contaminated samples is 1; if no samples turn up lead, then the proportion is 0; and generally, the truth lies somewhere between!

You might wonder how one could ever know whether a coin is fair (that is, the chance of a head is the same as the chance of a tail). The truth of the matter is, you can't. All conditions might point to a fair coin -- it may be perfectly symmetric, etc. -- but you'll never know for sure. This points to the existence of a parameter, which we call $\left.\pi\right.$, and which indicates the true underlying proportion of heads (say). Note well: this $\left.\pi\right.$ is not the same as the $\left.\pi\right.$ that plays such an important role in the study of circles.

From a "frequentist's" perspective, the only way to understand $\left.\pi\right.$ is to toss the coin forever and see what happens to the ratio of heads to tosses. Writing $x_i$ for the outcome of toss $i$ (1 for heads, 0 for tails), that's how we find the underlying parameter of the coin:

$\pi = \lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^nx_i}{n}$
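Nobody can toss a coin forever, but a simulation hints at what the limit looks like. The following is a minimal sketch, assuming a simulated coin whose true proportion we pick ourselves; the function name `running_proportion` and the fixed seed are illustrative choices, not part of the original discussion.

```python
import random

def running_proportion(pi, n_tosses, seed=0):
    """Toss a simulated coin n_tosses times and return heads / tosses.

    pi is the (assumed, normally unknown) true proportion of heads.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    for _ in range(n_tosses):
        # x_i = 1 for heads, 0 for tails
        heads += 1 if rng.random() < pi else 0
    return heads / n_tosses

# The ratio should settle near pi as the number of tosses grows.
for n in (10, 100, 10_000, 100_000):
    print(n, running_proportion(0.5, n))
```

For small numbers of tosses the ratio can wander far from 0.5; only for large ones does it settle down, which is the frequentist's point.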

Then the variance of that coin is

$\sigma^2 = \lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^n(x_i-\pi)^2}{n} = \lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^n(x_i^2-2x_i\pi + \pi^2)}{n}$

or

$\sigma^2 = \lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^nx_i}{n} - 2\pi\lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^nx_i}{n} + \lim_{n \rightarrow \infty}\frac{\Sigma_{i=1}^n\pi^2}{n}$

(because $x_i^2=x_i$ for the coin toss); or

$\sigma^2= \pi - 2\pi^2 + \pi^2 = \pi - \pi^2 = \left(1-\pi\right)\pi$

Taking the square root,

$\sigma = \sqrt{(1-\pi)\pi}$
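The formula can be checked on a finite collection of 0s and 1s, where no limit is needed. This is only a sketch: the ten-element `population` below is made up for illustration, and `bernoulli_sd` is a hypothetical helper name.

```python
import math
import statistics

def bernoulli_sd(pi):
    """Standard deviation of a 0/1 variable with proportion pi of ones."""
    return math.sqrt((1 - pi) * pi)

# Illustrative population: 3 "heads" (1) and 7 "tails" (0).
population = [1] * 3 + [0] * 7

pi = statistics.fmean(population)               # proportion of ones
variance = statistics.pvariance(population)     # population variance
sd = statistics.pstdev(population)              # population standard deviation

print("pi            :", pi)
print("variance      :", variance, "vs (1 - pi) * pi =", (1 - pi) * pi)
print("std deviation :", sd, "vs formula =", bernoulli_sd(pi))
```

The directly computed variance and the value of $(1-\pi)\pi$ should agree up to floating-point rounding.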

Therefore, according to the theory of the sampling distribution of the sample mean, the parameters of the distribution of the sample mean are

$\mu_{\overline{x}}=\pi$,

and

$\sigma_{\overline{x}} = \sqrt{\frac{(1-\pi)\pi}{n}}$.
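These two parameters can be compared against a simulation: draw many samples of size $n$, compute the proportion in each, and look at the mean and spread of those proportions. The sketch below assumes illustrative values ($\pi = 0.3$, $n = 50$, 5000 repetitions); the function names are hypothetical.

```python
import math
import random
import statistics

def sample_proportion(pi, n, rng):
    """Proportion of heads in one sample of n simulated tosses."""
    return sum(1 for _ in range(n) if rng.random() < pi) / n

def simulate_sampling_distribution(pi, n, reps=5000, seed=1):
    """Return (mean, standard deviation) of reps simulated sample proportions."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    props = [sample_proportion(pi, n, rng) for _ in range(reps)]
    return statistics.fmean(props), statistics.pstdev(props)

pi, n = 0.3, 50  # assumed values, chosen only for illustration
mean_hat, sd_hat = simulate_sampling_distribution(pi, n)

print("simulated mean:", mean_hat, " theory:", pi)
print("simulated sd  :", sd_hat, " theory:", math.sqrt((1 - pi) * pi / n))
```

The simulated values will not match the theory exactly, but they should be close, and closer still as the number of repetitions grows.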

The proportion problem is interesting for (at least) two reasons:

1. Once the mean is given, the standard deviation is known (somewhat unusual); and
2. Our rule for when the normal distribution assumption is valid casts a shadow on the old lie that $\left.n=30\right.$ is magic. For the proportion problem the rule we use is:
$0 < \pi-3\sigma_{\overline{x}} < \pi+3\sigma_{\overline{x}}<1$
That is, $\left.n=30\right.$ won't save you: it's a rule of thumb, and whether it works depends entirely on the underlying distribution of $x$. When $\left.\pi\right.$ is close to 0 or 1, the rule demands a much larger $n$.
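The three-standard-deviation rule is easy to turn into a check. The following is a minimal sketch; `normal_rule_ok` is a hypothetical helper name, and the values of $\pi$ tried are illustrative.

```python
import math

def normal_rule_ok(pi, n):
    """Check whether 0 < pi - 3*se < pi + 3*se < 1 for sample size n."""
    se = math.sqrt((1 - pi) * pi / n)  # standard deviation of the sample mean
    return 0 < pi - 3 * se and pi + 3 * se < 1

# With n = 30 the rule passes for a fair coin but fails for lopsided ones.
for pi in (0.5, 0.3, 0.1, 0.95):
    print(f"pi = {pi}: n = 30 passes the rule? {normal_rule_ok(pi, 30)}")

# A larger sample rescues the lopsided case pi = 0.1.
print("pi = 0.1, n = 100:", normal_rule_ok(0.1, 100))
```

With $\pi = 0.1$ and $n = 30$, three standard deviations below the mean falls below 0, so the normal approximation would put probability on impossible proportions.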