library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.1     ✓ dplyr   1.0.5
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Introduction

We started this series using coin tosses to introduce the concept of the central limit theorem and the binomial distribution. We’re now going to do the same Bernouli process using unfair coins. In other words, if we flip a coin 100 times that has a 40% chance of heads and 60% chance of tails, sum up the number of heads, and repeat a large number of times, what will the distribution look like? Will it be symmetric? Will it have a fat tail to one direction? After we have some information about this distribution, we will then ask the question of whether we can predict whether a coin is fair given the information about the tosses. This topic runs deep, but even by scratching the surface of it, we can discover and get some intuition around the vital concept of statistical testing and p-values.

Distributions of unfair coins

Below, we simulate coin flips with probability of heads ranging from 0.1 to 0.9. We examine the subsequent distributions in relation to the fair coin.

# A faster way to make the sequence
heads_probs <- seq(0.1, 0.9, 0.1)
heads_probs
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
n_sessions <- 100000

flip_list <- lapply(heads_probs, function(i) {
    num_heads <- lapply(seq(n_sessions), function(j) {
        result <-
            sample(
                x = c(0, 1),
                size = 100,
                replace = TRUE,
                prob = c(1 - i, i)
            )
        result <- sum(result) / length(result)
        return(result)
    }) %>% unlist()
})
names(flip_list) <- heads_probs

invisible(lapply(names(flip_list), function(i) {
    curr <- flip_list[[i]]
    hist(curr, breaks = 50, xlim = c(0, 1), main = paste("Probability heads:", i))
}))

flip_tib <- lapply(names(flip_list), function(i) {
    curr_flips <- flip_list[[i]]
    desc_stats <- summary(curr_flips) %>% as.list()
    result <- c(heads_prob = i, sd = sd(curr_flips), desc_stats)
    return(result)
}) %>% bind_rows()

flip_tib
## # A tibble: 9 x 8
##   heads_prob     sd  Min. `1st Qu.` Median   Mean `3rd Qu.`  Max.
##   <chr>       <dbl> <dbl>     <dbl>  <dbl>  <dbl>     <dbl> <dbl>
## 1 0.1        0.0298  0         0.08    0.1 0.0998      0.12  0.27
## 2 0.2        0.0401  0.05      0.17    0.2 0.200       0.23  0.39
## 3 0.3        0.0458  0.11      0.27    0.3 0.300       0.33  0.49
## 4 0.4        0.0490  0.21      0.37    0.4 0.400       0.43  0.63
## 5 0.5        0.0498  0.28      0.47    0.5 0.500       0.53  0.71
## 6 0.6        0.0492  0.39      0.57    0.6 0.600       0.63  0.81
## 7 0.7        0.0457  0.48      0.67    0.7 0.700       0.73  0.88
## 8 0.8        0.0399  0.61      0.77    0.8 0.800       0.83  0.95
## 9 0.9        0.0299  0.76      0.88    0.9 0.900       0.92  1

With unfair coins, we have a binomial-like distribution whose standard deviation is higher, the more fair the coin. It turns out that this distribution is in fact also the binomial distribution. We can see in the case of heads-probabilities of 0.1 and 0.9 that the distribution skews a little bit, but not as much as we saw in the extreme events distributions. Given that we can compute the binomial distributions of any fair or unfair coin, how can we determine whether a coin we’re given is fair or unfair, given some potentially small number of tosses?

Fair coin statistical testing

In this section, we’re going to figure out how to compute the probability that a coin is fair. Doing so will introduce us to the concept of statistical testing, the null hypothesis, and the p-value. Let’s start with a simple example. We’ll do 100 tosses of a fair coin and 10 tosses of a coin with a heads probability of 0.3.

set.seed(1)
fair_toss <- sample(x = c(0, 1), size = 100, replace = TRUE, prob = c(0.5, 0.5))
fair_toss
##   [1] 1 1 0 0 1 0 0 0 0 1 1 1 0 1 0 1 0 0 1 0 0 1 0 1 1 1 1 1 0 1 1 0 1 1 0 0 0
##  [38] 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 1 1 1 1 1 0 0 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1
##  [75] 1 0 0 1 0 0 1 0 1 1 0 1 0 1 1 1 1 1 0 0 0 0 1 1 0 0
unfair_toss <- sample(x = c(0, 1), size = 100, replace = TRUE, prob = c(0.7, 0.3))
unfair_toss
##   [1] 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
##  [38] 0 1 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 0 1 1 0
##  [75] 0 1 0 1 0 1 0 0 1 0 1 0 1 1 1 0 1 0 0 1 0 0 0 1 0 1

Now, let’s pretend we know nothing about the fairness of the above coin tosses. Let’s hypothesize that both coins are fair. Now we have to test this hypothesis by coming up with the probability that this is true. We know from our binomial distribution that 100 tosses of a fair coin has a mean of one half heads, with a standard deviation of 5. Let’s first look at the mean and standard deviation of our tosses.

mean(fair_toss)
## [1] 0.52
sd(fair_toss)
## [1] 0.5021167
mean(unfair_toss)
## [1] 0.3
sd(unfair_toss)
## [1] 0.4605662

If we go back to the fair and unfair coin table that we made specifically for 100 tosses, we see already that the fair coin and the unfair coin have mean and standard deviations are comparable to those of the binomial distributions of respective “fairness” that we defined earlier. So just from checking the table we can make the hypothesis that the first coin is fair and the second coin is unfair. This might be enough, depending on what your goal is. But how can we assign actual probability to that?

Let’s start by making what is called the null hypothesis. It’s a hypothesis that our coin is fair, meaning the result we got (mean 0.3) exists on the binomial distribution of the fair coin with 100 tosses.

Now if we look at the unfair coin, we see that 0.3 is 6 standard deviations (0.05) from the mean (0.5). So we can look at exactly where that shows up on the binomial distribution. If we look at the histogram for the fair coin in the previous section, we see that the frequency of a mean of 0.3 occurs almost never. We’re on our way to rejecting the null hypothesis. Let’s quantify this by simple tabulation below.

num_heads <- lapply(seq(n_sessions), function(i) {
    result <- sample(x = c(0, 1), size = 100, replace = TRUE)
    result <- sum(result)/length(result)
    return(result)
}) %>% unlist()

mean_freq <- table(num_heads <= 0.3)
mean_freq/sum(mean_freq)
## 
##   FALSE    TRUE 
## 0.99999 0.00001

We can see that a mean of 0.3 happens with a frequency of 0.00002. So if we have the null hypothesis that the given coin is fair, but getting a mean of 0.3 happens with frequency 0.00002, then this frequency is how often this result shows up in our binomial distribution. We can double that value to 0.00004 if we want to consider both tails of the distribution. This is what a p-value actually is! We have rejected the null hypothesis with a p-value of 0.00004, or often written as p < 0.0001. We’ll note that there are plenty of other approaches to test whether a coin is fair. This one is the easiest to do with pure code. Later markdowns will explore how to solve this problem using Bayesian statistics, which is a completely different mindset.