Bayesian Statistics: How to tell if a coin is fair

Joseph Mack

jmack (at) wm7d (dot) net
AustinTek home page

v20120120, released under GPL-v3.

Abstract

The "hello world" example for Bayesian statistics.

Material/images from this webpage may be used, as long as credit is given to the author, and the url of this webpage is included as a reference.

Table of Contents

1. Introductory Notes
2. Is the coin fair?
3. Toss the coin
4. Conditional Probability
5. Bayes Equation
6. Confirming Bayes Equation
7. The Monty Hall Paradox

1. Introductory Notes

I've spent about 20 years, on and off, trying to learn about Bayesian Statistics. I've fruitlessly scanned articles and textbooks, finding few if any examples (but plenty of equations, whose relevance is not clear).

Recently, I obtained a book about to be consigned to the trash: "Intuitive Biostatistics", Harvey Motulsky, Oxford University Press, 1995, ISBN 0-19-508606-6, 0-19-508607-4(pbk.). This book has a section on Bayesian statistics, which proved to be the Rosetta stone for me on the subject. I expect Champollion and Ventris spent less time on their decoding projects than I have on Bayesian statistics. A big thanks to Harvey Motulsky. (The next project is to learn about maximum entropy.)

2. Is the coin fair?

If we see a coin tossed twice and we see 2 heads, we'd like to know if the coin is fair, or at least to be able to determine the probability that the coin is fair. It turns out that Bayesian statistics (and possibly any statistics) can't answer that question.

What we've asked is similar to the question "what is the chance of getting a 1 when we throw a die?". We need more information; we have to know the number of sides of the die and whether it's fair.

Well can we answer the question "if we know that the coin is either fair or has two heads, what is the probability, on seeing two consecutive throws of heads, that the coin is fair?"? We can't answer that either. However we can answer "if we know that the coin is either fair or has two heads, AND the probability of each case is 50%, then on seeing two consecutive throws turning up heads, what is the probability that the coin is fair?". It turns out that you need to know a lot about your system before you can determine the probability of the coin being fair.

3. Toss the coin

Motulsky has you run an experiment and see what happens. Here we're going to toss the coin. We only want the ratios, i.e. the fraction of times that you expect to see 2 heads. This will be (on average) 1/4 of the time for a fair coin. If you do the experiment 1000 times, you won't get 2 heads exactly 250 times, so let's assume we do the experiment a very large number of times, and that the mean and the actual result are the same (at least percentage wise).

Half the time the coin is fair and half the time the coin is two headed. If we do the two-toss experiment 2000 times, what will we see? Half the time (1000 events) we'll be tossing a fair coin and we'll see 2 heads 1/4 of the time (250 events). The other half of the time (1000 events) we'll be tossing the two headed coin and we'll see 2 heads every time (1000 events). We're scoring by the number of times we see two heads (2H) and the number of times we don't see two heads (!2H) (i.e. when we see 0 or 1 head).

Table 1. Experiment: two consecutive tosses, repeated 2000 times, of a coin that has 50% probability of being fair or two headed.

                       coin type
result type            fair    two headed    row total events    predictor (of a fair coin)
2 heads (2H)           250     1000          1250                +ve (2H) = 20%
0 or 1 head (!2H)      750     0             750                 -ve (!2H) = 100%
column total events    1000    1000          2000

If I get two heads, 20% (250/1250) of them occur with a fair coin i.e. I have a 20% chance of the coin being fair (two heads is a 20% predictor of a fair coin).

If I don't get two heads, 100% (750/750) of them are due to the fair coin (not getting two heads, i.e. getting 0 or 1 head, is a 100% predictor of a fair coin).

We've found what we want to know. However there's more information available.

If I get two heads, 80% (1000/1250) of them occur with a two headed coin.

If I don't get two heads, 0% (0/750) of them occur with a two headed coin.
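The experiment can also be run on a computer instead of by hand. The following is a minimal sketch, assuming the setup of Table 1 (each experiment picks a fair or two headed coin with 50% probability, then tosses it twice). The function name `simulate`, the number of trials and the random seed are illustrative choices, not part of Motulsky's description.

```python
import random

def simulate(trials=200_000, seed=0):
    """Repeat the two-toss experiment and count (coin type, saw two heads) events."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = {("fair", True): 0, ("fair", False): 0,
              ("two_headed", True): 0, ("two_headed", False): 0}
    for _ in range(trials):
        # 50% prior: each experiment uses a fair or a two headed coin
        coin = rng.choice(["fair", "two_headed"])
        p_head = 0.5 if coin == "fair" else 1.0
        # toss the chosen coin twice; True only if both tosses are heads
        two_heads = all(rng.random() < p_head for _ in range(2))
        counts[(coin, two_heads)] += 1
    return counts

counts = simulate()
n_2h = counts[("fair", True)] + counts[("two_headed", True)]
n_not_2h = counts[("fair", False)] + counts[("two_headed", False)]

# fraction of the two-head results that came from the fair coin
p_fair_given_2h = counts[("fair", True)] / n_2h
# fraction of the not-two-head results that came from the fair coin
p_fair_given_not_2h = counts[("fair", False)] / n_not_2h

print(round(p_fair_given_2h, 3), p_fair_given_not_2h)
```

With a large number of trials the first number should settle close to the 20% in the table, and the second is exactly 1 because the two headed coin never produces a tail.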

4. Conditional Probability

If event A and event B are independent, then P(A AND B) = P(A).P(B) and life is very simple for the statistician.

Bayes equation is only relevant when the events are dependent, i.e. in conditional probability (http://en.wikipedia.org/wiki/Conditional_probability), where the probability of an event A depends on whether event B happens (or doesn't happen). The probability of a plane crashing depends on whether the plane is flying or on the ground. The probability of getting wet depends on whether or not it's raining. In the coin tossing example above, the chance of getting two consecutive heads depends on whether the coin is fair or two headed.

If A happening is dependent on B happening first, then P(A | B) is the (conditional) probability that you'll see A, if you're already seeing B. In this case

P(A and B) = P(A|B).P(B)

Let's say B is a person with something you want (e.g. an object, a piece of information) and A is them being willing to give/lend you the object. Then P(A | B) is the probability of being lent the item (after first finding someone with the item).

If 
P(B)   = 0.5             (the probability of a random person having the item you want) 
and
P(A|B) = 0.2             (the probability of getting the item, having found a person with the item)

then
P(A and B) = P(A|B).P(B) = 0.5 * 0.2 = 0.1 
                         (the probability of being lent the item by a random person, 
                          after having found the people having the item) 

note: P(A|!B) = 0 i.e. the probability of getting a loan of the item on finding a person, who doesn't have it, is 0.

This says that if you approach a random person, only P(A) = 1/10 of the time will they lend you the item. However only P(B) = 1/2 of the random people have the item. If you can first identify the people who have the item, then P(A | B) = 1/5 of them will lend you the item.
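The arithmetic of the lending example can be checked with exact fractions. This is only a sketch; the variable names are mine, and the last step (adding the two routes to A) uses the law of total probability, which the text above relies on implicitly when it states P(A) = 1/10.

```python
from fractions import Fraction

p_b = Fraction(1, 2)           # P(B): a random person has the item
p_a_given_b = Fraction(1, 5)   # P(A|B): a person with the item lends it
p_a_given_not_b = Fraction(0)  # P(A|!B): a person without the item can't lend it

# P(A and B) = P(A|B).P(B)
p_a_and_b = p_a_given_b * p_b

# P(A) summed over both routes: through B and through !B
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

print(p_a_and_b, p_a)  # 1/10 1/10
```

Because the route through !B contributes nothing, P(A) and P(A and B) are the same number here.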


Note

Because you're evaluating P(A | B).P(B), you're getting to your destination (having the item in your hand) by finding someone with the item. Here there is only one route to your goal, through a person who has the item, so stating the route may seem redundant. As a practical matter, you don't necessarily care about the route to your goal. You may not explicitly first determine if each random person has the item. You may just ask them if they can lend you the item. If they decline, you may not care whether they have it. However the equation P(A | B).P(B) has dictated your route to the goal and says that you got your item from someone that has one.

The case of the coin toss is different. There are two routes to getting two heads, through the fair coin or the two headed coin. Since there you're trying to evaluate whether the two heads comes from the fair coin or the two headed coin, you do care which way you got your two heads.

For other conditional probability examples, see the math test score example at conditional probability (http://www.mathgoodies.com/lessons/vol6/conditional.html) and the smoking example at Stats: Conditional Probability (http://people.richland.edu/james/lecture/m170/ch05-cnd.html), where as a side effect you derive Bayes Theorem.

5. Bayes Equation

Here is the Bayes equation, a formula whose obviousness is so clear that it doesn't need to be derived or explained (even in wikipedia e.g. Bayes Theorem http://en.wikipedia.org/wiki/Bayes'_theorem).

P(A|B)=P(B|A).P(A)/P(B)

where
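P(A)   is called the prior 
P(A|B) is called the post (short for posterior)
P(B|A)/P(B) is called the likelihood here (strictly, P(B|A) alone is the likelihood; this ratio divides it by P(B), the overall probability of B). 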

post = prior*likelihood

The likelihood is the ratio (fraction of B that occurs when A is true)/(fraction of B for all experiments). In likelihood http://en.wikipedia.org/wiki/Likelihood_function, it's stated that likelihood is not a probability density function. The likelihood can be more than 1.

Bayes equation produces the same probabilities as are listed in the previous section below the table. Let's apply Bayes equation to the coin toss above. Here are the terms we'll use

F   coin is fair
!F  coin is not fair (it's two headed)
2H  result is two heads
!2H result is not two heads (it's 0 or 1 head)

What is the chance that the coin is fair (F) given that you see two heads (2H)? i.e. what is P(F | 2H)?

P(F|2H)=P(2H|F).P(F)/P(2H)

P(2H|F)=chance of seeing 2H if the coin is F = 250/(250+750) = 1/4
P(F)   =1/2 (prior)
P(2H)  =1250/2000 = 5/8

post = P(F|2H) = (1/4*1/2)/(5/8) = 20%

likelihood = P(2H|F)/P(2H) = (1/4)/(5/8) = 2/5
post = prior * likelihood
 1/5 =  1/2  * 2/5

Conclusion: If you see 2 heads, there is a 20% chance that the coin is fair.
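The same calculation can be written as a small function. This is a sketch under the assumptions of the example (two hypotheses only: fair or two headed); the function name `posterior` and its parameter names are mine. Instead of reading P(2H) = 5/8 from the table, it builds P(2H) from the two routes to two heads, P(2H|F).P(F) + P(2H|!F).P(!F).

```python
from fractions import Fraction

def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """Bayes equation for a hypothesis H and its complement !H.

    prior             -- P(H)
    p_obs_given_h     -- P(observation | H)
    p_obs_given_not_h -- P(observation | !H)
    """
    # P(observation), summed over both routes (H and !H)
    p_obs = p_obs_given_h * prior + p_obs_given_not_h * (1 - prior)
    return p_obs_given_h * prior / p_obs

# H = coin is fair, observation = two heads
p_fair_given_2h = posterior(Fraction(1, 2), Fraction(1, 4), Fraction(1))
print(p_fair_given_2h)  # 1/5

# A hypothetical different prior (90% of coins fair) changes the answer
p_fair_given_2h_90 = posterior(Fraction(9, 10), Fraction(1, 4), Fraction(1))
print(p_fair_given_2h_90)  # 9/13
```

The second call uses a made-up prior purely to show that the answer depends on the prior, which is why the question in section 2 can't be answered without one.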

The likelihood is 2/5. For the fair coin, 1/4 of the experiments give 2H (P(2H | F)=1/4). For all experiments, 5/8 give 2H (P(2H)=5/8). With likelihood<1, two heads turns up less often (as a fraction) when you toss just a fair coin than when both fair and two headed coins are thrown.

What is the chance that the coin is fair (F) if you see 0 or 1 heads (!2H)? i.e. what is P(F|!2H)?

P(F|!2H)=P(!2H|F).P(F)/P(!2H)

P(!2H|F)=750/1000=3/4
P(F)=1/2 prior
P(!2H)=750/2000=(3/4)*(1/2)=3/8

post = P(F|!2H) = (3/4*1/2)/(3/4*1/2) = 100%

likelihood = P(!2H|F)/P(!2H) = (3/4)/((3/4)*(1/2)) = 2
post = prior * likelihood
 1   =  1/2  * 2

The likelihood is 2. For the fair coin, 3/4 of the tosses are 0 or 1 heads (aren't two heads). For all tosses 3/8 of the tosses are 0 or 1 heads (aren't two heads). With the likelihood 2, this means that there are twice as many !2H (0 or 1 head) tosses (fraction wise) with a fair coin compared to both coins (there are no 0 or 1 head tosses with the two headed coin).

Conclusion: If you see a tail, you are certain that the coin is fair.

What is the chance that the coin is not fair (!F) given that you see 2 heads (2H)? i.e. what is P(!F|2H)?

P(!F|2H)=P(2H|!F).P(!F)/P(2H)

P(2H|!F)=1000/1000=1
P(!F)=1/2 prior
P(2H)=1250/2000 = 5/8 

hence post = P(!F|2H) = (1*1/2)/(5/8) = 80%

likelihood = P(2H|!F)/P(2H) = 1/(5/8) = 8/5
post = prior * likelihood
 4/5 = (1/2) * (8/5)

With the likelihood=8/5, you will get 8/5 times (as a fraction) the number of 2 heads with the two headed coin as you will with both the fair and the two headed coin.

Conclusion: If you see 2 heads, there is an 80% chance that the coin is two headed.

What is the chance that the coin is not fair (!F) given that you see !2H (0 or 1 heads)? i.e. what is P(!F|!2H)?

P(!F|!2H)=P(!2H|!F).P(!F)/P(!2H)

P(!2H|!F)=0/1000=0 (the event never happens)
P(!F)=1/2 prior
P(!2H)=750/2000=(3/4)*(1/2)=3/8

hence post = P(!F|!2H) = (0*1/2)/(3/4*1/2) = 0
likelihood = P(!2H|!F)/P(!2H) = 0
post = prior * likelihood
  0  =  1/2  * 0

With the likelihood=0, none of the !2H results (0 or 1 head) come from the two headed coin.

Conclusion: If you see at least one tail, there is 0 chance of the coin being two headed.
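All four cases can be worked through in one pass from the event counts in Table 1. The following is a sketch; the dictionary layout and helper names are illustrative, and "ratio" is the quantity P(result|coin)/P(result) that this page calls the likelihood.

```python
from fractions import Fraction

# Event counts from Table 1: (coin type, result type) -> number of events
counts = {("F", "2H"): 250, ("F", "!2H"): 750,
          ("!F", "2H"): 1000, ("!F", "!2H"): 0}
total = sum(counts.values())

def p_coin(c):
    """Prior: fraction of all events that used coin type c."""
    return Fraction(sum(n for (coin, _), n in counts.items() if coin == c), total)

def p_result(r):
    """Fraction of all events that gave result r."""
    return Fraction(sum(n for (_, res), n in counts.items() if res == r), total)

def p_result_given_coin(r, c):
    """Fraction of the events using coin type c that gave result r."""
    n_coin = sum(n for (coin, _), n in counts.items() if coin == c)
    return Fraction(counts[(c, r)], n_coin)

results = {}
for c in ("F", "!F"):
    for r in ("2H", "!2H"):
        prior = p_coin(c)
        ratio = p_result_given_coin(r, c) / p_result(r)
        post = prior * ratio  # post = prior * likelihood
        results[(c, r)] = (ratio, post)
        print(f"P({c}|{r}): prior={prior} ratio={ratio} post={post}")
```

The posts come out as 1/5, 1, 4/5 and 0, matching the four conclusions above. For each result type the two posts add up to 1, since the coin must be either fair or two headed.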

6. Confirming Bayes Equation

	Note
	I can't derive Bayes equation; this is just my way of showing that it works.

Rearranging Bayes equation you get

P(A|B).P(B) =  P(B|A).P(A)

For the coin toss example

P(F|2H).P(2H) =  P(2H|F).P(F) 

P(F|2H)       = 1/5     the probability of the coin being fair if you see two heads
P(2H)         = 5/8     the prior probability of two heads
P(F|2H).P(2H) = 1/8     the probability that a fair coin was used to produce the two heads you see

P(2H|F)       = 1/4     the probability of a fair coin producing two heads
P(F)          = 1/2     the prior probability of a fair coin
P(2H|F).P(F)  = 1/8     the probability of seeing two heads if a fair coin was used 

thus
P(A|B).P(B) =  P(B|A).P(A)

Bayes equation describes the situation when the probability that a fair coin was used to produce two heads is equal to the probability of seeing two heads if a fair coin was used. This equality was implicitly built into the calculation of the probability table above, and Bayes equation is a result of this implicit assumption, rather than any new piece of knowledge from outside. It seems reasonable that these two numbers are identical; it seems reasonable to think that these are different words describing the same situation, but according to the math, they are different routes. It's not obvious that you've arrived at the same destination (other than the numerical value is the same). I can't see that this result has to be true in the general case. Neither can I think of a counter example.

However our knowledge on conditional probability (http://en.wikipedia.org/wiki/Conditional_probability), says

P(A|B).P(B) =  P(A and B) 
P(B|A).P(A) =  P(B and A)

but  
P(A and B) =  P(B and A)
hence
P(A|B).P(B) =  P(B|A).P(A)

So Bayes equation describes any conditional probability situation, since the situation where a fair coin was used to produce two heads is physically the same as the situation where you see two heads if a fair coin was used.
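As a numerical check that the equality isn't special to the coin example, the sketch below builds many arbitrary 2x2 tables of event counts and confirms that P(A|B).P(B), P(B|A).P(A) and P(A and B) agree for every one. The function name, the range of counts and the seed are arbitrary choices of mine; a check like this supports the argument above but is not a proof.

```python
import random
from fractions import Fraction

def check_identity(n_ab, n_a_notb, n_nota_b, n_nota_notb):
    """Return True if P(A|B).P(B) == P(A and B) == P(B|A).P(A) for these counts."""
    total = n_ab + n_a_notb + n_nota_b + n_nota_notb
    p_a = Fraction(n_ab + n_a_notb, total)          # events where A happens
    p_b = Fraction(n_ab + n_nota_b, total)          # events where B happens
    p_a_given_b = Fraction(n_ab, n_ab + n_nota_b)   # A among the B events
    p_b_given_a = Fraction(n_ab, n_ab + n_a_notb)   # B among the A events
    joint = Fraction(n_ab, total)                   # A and B together
    return p_a_given_b * p_b == joint == p_b_given_a * p_a

# 1000 arbitrary tables with counts between 1 and 100
rng = random.Random(1)
all_hold = all(
    check_identity(*(rng.randint(1, 100) for _ in range(4)))
    for _ in range(1000)
)
print(all_hold)  # True

# The coin table: A = fair coin (F), B = two heads (2H)
coin_holds = check_identity(250, 750, 1000, 0)
print(coin_holds)  # True
```

Both sides reduce to the same count (events with both A and B) divided by the total, which is the counting version of P(A and B) = P(B and A).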

7. The Monty Hall Paradox

The Monty Hall Paradox has been analysed by Bayes method (http://en.wikipedia.org/wiki/Monty_Hall_problem#Bayes.27_theorem).
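The Bayes calculation for Monty Hall has the same shape as the coin example. The sketch below assumes the standard rules, which are not spelled out on this page: the car is equally likely to be behind any of three doors, you pick door 1, and the host (who knows where the car is) always opens a different door hiding a goat, choosing at random when he has two to pick from. Suppose he opens door 3.

```python
from fractions import Fraction

# Prior: the car is equally likely to be behind each door
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# P(host opens door 3 | car behind door d), given that you picked door 1
p_open3_given_car = {
    1: Fraction(1, 2),  # host can open door 2 or 3, picks at random
    2: Fraction(1),     # host must open door 3
    3: Fraction(0),     # host never reveals the car
}

# P(host opens door 3), summed over all three routes
p_open3 = sum(p_open3_given_car[d] * prior[d] for d in prior)

# Bayes equation for each door
posterior = {d: p_open3_given_car[d] * prior[d] / p_open3 for d in prior}

for door, p in posterior.items():
    print(f"P(car behind door {door} | host opens door 3) = {p}")
```

Under these assumptions the post for door 1 stays at 1/3 while door 2 rises to 2/3, so switching doubles your chance of winning.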
