# Two-coin-toss sampling technique

## Providing cover for answering embarrassing questions -- without data loss

One way of providing anonymity to respondents in a survey is to provide them "deniability": their answer may or may not reflect their actual state or feelings. Herein we present one way of doing this: the two-coin-toss sampling technique.

We start with a fair coin (that is, a coin that comes up heads half the time and tails half the time, on average). But getting half heads "on average" means that in any particular run of tosses we may not get exactly half heads. That randomness is going to introduce some noise (errors) into our calculations.

Then the question of interest is posed to the respondents: the question has two possible answers (one of which is "embarrassing" or "sensitive").

Respondents "consult the coin" to see what answer they should give to a question:

• The coin is flipped twice;
• if the coin lands heads twice, then the respondent tells the opposite of the truth; otherwise
• the respondent tells the truth.
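
The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_coin_response` and the seeded `random.Random` generator are our own choices, not part of the survey method itself.

```python
import random

def two_coin_response(truth: bool, rng: random.Random) -> bool:
    """Return the answer a respondent reports under the two-coin-toss rule."""
    first_heads = rng.random() < 0.5    # first toss of a fair coin
    second_heads = rng.random() < 0.5   # second toss of a fair coin
    if first_heads and second_heads:
        return not truth                # heads-heads: report the opposite of the truth
    return truth                        # any other outcome: report the truth

# Hypothetical check: 10,000 respondents who all truly have the condition.
rng = random.Random(0)
answers = [two_coin_response(True, rng) for _ in range(10_000)]
lie_fraction = answers.count(False) / len(answers)
print(lie_fraction)
```

With a fair coin, heads-heads comes up about a quarter of the time, so `lie_fraction` should land near 0.25.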

This method provides cover to respondents so that they may answer embarrassing questions, but what about the lies: do they cause us to lose information, as we lost information in the one-coin technique?

No! Surprisingly we can recover the information, because the answer (though a lie) contains the truth (just in its opposite form, shall we say). Here's how it works.

Let's suppose that we have N people we're sampling. Let's say that T of them have the condition (and hence should answer "yes"), and F of them don't (so they will answer "no"). Then

N = T + F

Suppose the true rate of our embarrassing condition is r, so that we would expect T=r*N of them to reply "yes" if everyone answered honestly. So, for example, if r=1/2, then half of the population would be expected to have the dread condition, and so half (T=1/2*N=N/2) would say that yes, they have the disease.

Now, that means that F=N-r*N=(1-r)*N of them would respond "no". These are the true ("T") and false ("F") replies we would expect. Because of the technique we don't expect to see these values, however.

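If we actually count the yesses, some of them will be false yesses. Each respondent tells the truth with probability 3/4 (any outcome other than heads-heads) and lies with probability 1/4 (heads-heads). So, on average, three quarters of the people with the condition say "yes", and one quarter of the people without it also say "yes". The equation for the expected total number of yesses TY is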

$TY = \frac{3}{4}*r*N + \frac{1}{4}*(1-r)*N$
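
To make the counting concrete, here is a small sketch that evaluates the equation for TY with exact fractions. The function name `expected_yes` is ours, chosen for illustration.

```python
from fractions import Fraction

def expected_yes(r, n):
    """Expected number of "yes" answers for true rate r and n respondents."""
    true_yes = Fraction(3, 4) * r * n         # have the condition and tell the truth
    false_yes = Fraction(1, 4) * (1 - r) * n  # lack the condition but the coin says lie
    return true_yes + false_yes

# Nobody has the condition: every "yes" is a coin-induced lie.
print(expected_yes(Fraction(0), 100))      # 25
# Half the population has the condition.
print(expected_yes(Fraction(1, 2), 100))   # 50
```

Even when nobody has the condition, we still expect a quarter of the respondents to say "yes"; that is the cover the coin provides.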

We can solve this for our estimate of r. Let's simplify a little bit by factoring out the common factor N/4:

$TY = \frac{N}{4}(3*r + (1-r)) = \frac{N}{4}(1+2*r)$

Then

$\frac{4*TY}{N} = 1+2*r$

so

$r = \frac{1}{2}(\frac{4*TY}{N}-1) = \frac{4*TY-N}{2*N}$

So, ultimately, we estimate the rate r as

$r = \frac{4*TY-N}{2*N}$

where TY represents the total number of yes responses, and N is the number of respondents.
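
The estimate is a one-line calculation. The sketch below wraps it in a function; the name `estimate_rate` and the check on `n` are our own additions for illustration.

```python
def estimate_rate(total_yes, n):
    """Estimate the true rate r from the yes count: r = (4*TY - N) / (2*N)."""
    if n <= 0:
        raise ValueError("n must be a positive number of respondents")
    return (4 * total_yes - n) / (2 * n)

# A quarter of respondents saying "yes" corresponds to a rate of zero,
# and three quarters saying "yes" corresponds to a rate of one.
print(estimate_rate(25, 100))   # 0.0
print(estimate_rate(75, 100))   # 1.0
```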

Let's see how the calculation proceeds via an example or two.

## Example One

Suppose we ask 100 people whether they are space aliens or not. Since we really don't believe any of them are space aliens, we'd expect about a 0% result. Suppose we use the two-coin-toss technique, and that 26 respondents report that they're space aliens. We want to estimate the true rate of space-alien-ness in the population.

We estimate the rate r as

$r = \frac{4*26-100}{2*100}=\frac{4}{200}=\frac{2}{100}$

or 2% space aliens in the population.

Now that might seem a little high: but that's how the coins flipped; that's how the cookie crumbled. The randomness has caused us to get the wrong answer; but the good news is that we're in the ballpark.

## Example Two

It's possible for the estimated rate to be negative. What do you suppose you should do in that case? (Call it zero!)

Let's reconsider the space alien question from above. Suppose that 24 respondents report that they're space aliens. We want to estimate the true rate of space-alien-ness in the population.

We estimate the rate r as

$r = \frac{4*24-100}{2*100}=\frac{-4}{200}=\frac{-2}{100}$

or -2% space aliens in the population.

We find it very hard to believe that there is -2% alienness in our environment. Since negative answers don't make any sense, we interpret them as zeros.
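
In code, interpreting negative estimates as zero is a simple clamp. The sketch below is ours (the name `estimate_rate_clamped` is hypothetical). Note that it also caps the estimate at 1, which is our assumption by symmetry: the text above only discusses negative values.

```python
def estimate_rate_clamped(total_yes, n):
    """Two-coin-toss estimate of r, forced into the sensible range [0, 1]."""
    raw = (4 * total_yes - n) / (2 * n)   # may be negative, or exceed 1
    # Clamp below at 0 (as in the text) and above at 1 (our assumption).
    return min(1.0, max(0.0, raw))

print(estimate_rate_clamped(24, 100))   # 0.0
```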

## Example Three

One more example: let's suppose that we're looking for the true rate of AIDS in a population, where we expect 5% of the people to have the disease. We sample 500 people using the two-coin-toss technique, and 136 people answer yes, that they have AIDS. We estimate the true rate of AIDS to be

$r = \frac{4*136-500}{2*500}=\frac{44}{1000}=\frac{4.4}{100}$

We estimate that 4.4% of the population has AIDS. Now that's a little lower than what we expected. It might be because we over-estimated the rate in the original population; or because of the randomness inherent in the design of the experimental technique. Every now and then you'll get "unlucky" in the tosses of the coin, and you'll get fewer head-head pairs than you'd expect by chance, for example: this will lead to errors in our calculations.
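
To get a feel for how large this coin-toss noise is, we can simulate the survey many times. The sketch below is an illustration under our own assumptions: 500 respondents, a true rate of exactly 5%, 2000 repeated surveys, and a fixed random seed. The function name `simulate_survey` is hypothetical.

```python
import random
import statistics

def simulate_survey(n, r, rng):
    """Simulate one two-coin-toss survey and return the estimated rate."""
    n_true = round(r * n)               # respondents who truly have the condition
    total_yes = 0
    for i in range(n):
        truth = i < n_true
        lie = rng.random() < 0.25       # chance of heads-heads with a fair coin
        answer = (not truth) if lie else truth
        total_yes += answer             # True counts as 1
    return (4 * total_yes - n) / (2 * n)

rng = random.Random(42)
estimates = [simulate_survey(500, 0.05, rng) for _ in range(2000)]
mean_estimate = statistics.mean(estimates)
spread = statistics.stdev(estimates)
print(mean_estimate, spread)
```

On average the estimates center on the true rate, but for these settings any single survey typically misses by roughly 0.04 (about four percentage points), so an estimate of 4.4% against an expected 5% is well within the noise.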

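One way to reduce this effect would be to assign exactly a quarter of the people as HH people (you could put paper slips in a hat, and let each person draw one and pretend that that's their toss, for example -- then they could throw the paper away). That fixes the number of reversed answers at N/4, although which respondents draw the HH slips is still left to chance. There's always a better way!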
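
The slips-in-a-hat idea can be sketched as follows. This is our own illustration, not a prescribed procedure: the function name `survey_with_slips` is hypothetical, and we assume the number of HH slips is N/4 rounded down when N is not divisible by 4.

```python
import random

def survey_with_slips(truths, rng):
    """Run the survey with exactly a quarter of respondents drawing an HH slip."""
    n = len(truths)
    n_hh = n // 4                                   # number of HH slips in the hat
    slips = [True] * n_hh + [False] * (n - n_hh)    # True marks an HH slip
    rng.shuffle(slips)                              # each respondent draws one slip
    total_yes = sum((not t) if hh else t for t, hh in zip(truths, slips))
    return (4 * total_yes - n) / (2 * n)

# Hypothetical population: 25 of 500 respondents truly have the condition.
truths = [True] * 25 + [False] * 475
estimate = survey_with_slips(truths, random.Random(7))
print(estimate)
```

Because the number of reversed answers is fixed, a population with no true cases always yields an estimate of exactly zero. Some noise can remain when there are true cases, since it is still random which respondents draw the HH slips.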