Do Data Match An Expected Ratio?

Functions: binom.test, prop.test, chisq.test, fisher.test



Type of Data: Discrete, categorical (counts or frequencies)


One variable

The probability of success in a Bernoulli/binomial experiment

The binomial test is a test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories: e.g., does our count of heads:tails differ from a 1:1 ratio, when tossing a coin?

For example, if we toss a coin 10 times and get 8 heads, is it likely that this coin is biased?

We specify x, the number of successes, out of n, the number of trials.

binom.test(x = 8, n = 10)
## 
##  Exact binomial test
## 
## data:  8 and 10
## number of successes = 8, number of trials = 10, p-value = 0.1094
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.4439045 0.9747893
## sample estimates:
## probability of success 
##                    0.8

With such a small sample, it's hard to tell. What if we tossed the coin 10 more times?

binom.test(16, 20)
## 
##  Exact binomial test
## 
## data:  16 and 20
## number of successes = 16, number of trials = 20, p-value = 0.01182
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.563386 0.942666
## sample estimates:
## probability of success 
##                    0.8

For an alpha of 0.05, this result suggests that the coin, or the way it is being flipped, needs some attention!

We can also use this test to ask other questions about the probability of two outcomes (including counts of things that do not at first seem to be counts of only two things). For example, we can ask if a six-sided die is biased to throw more 6's than we would expect.

If fair, the die would roll a 6 one time in six (1/6). If we rolled 51 6's in a total of 235 rolls, what can we say?

To fit this into a binomial test, we do the following.

binom.test(x = 51, n = 235, p = 1/6, alternative = 'greater')
## 
##  Exact binomial test
## 
## data:  51 and 235
## number of successes = 51, number of trials = 235, p-value =
## 0.02654
## alternative hypothesis: true probability of success is greater than 0.1666667
## 95 percent confidence interval:
##  0.1735253 1.0000000
## sample estimates:
## probability of success 
##              0.2170213

The number of successes (x) is 51. The number of trials (n) is 235. The probability we expect under the null hypothesis (p) is 1/6. And the alternative hypothesis is that the true probability of rolling a 6 is greater than 1/6 (alternative = 'greater'); this is a one-tailed test.

We would use a two-tailed test if we wanted to know whether the die rolled too many or too few 6's (alternative = 'two.sided').
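
For example, a minimal sketch of the two-tailed version of the same call (output omitted):

# Two-tailed test: is the proportion of 6's different from 1/6 in either direction?
binom.test(x = 51, n = 235, p = 1/6, alternative = 'two.sided')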

Comparing direction: the sign test

We can also use this same test to compare outcomes that we can see but cannot (or do not) measure; used this way it is called the sign test. The sign test is essentially a special case of the binomial test where the probability of success under the null hypothesis is p = 0.5.

It is often used to compare paired samples before and after some intervention; for example, we could ask whether people felt better or worse after taking some cold medicine.

Say that 8 of 9 people felt better after taking our new medicine.

binom.test(x = 8, n = 9)
## 
##  Exact binomial test
## 
## data:  8 and 9
## number of successes = 8, number of trials = 9, p-value = 0.03906
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.5175035 0.9971909
## sample estimates:
## probability of success 
##              0.8888889

The p-value of 0.04 suggests that more people felt better than we would expect by chance; the medicine appears to work.

Comparing counts or proportions: chisq.test()

We can use the chisq.test() function to perform a goodness-of-fit test (testing whether the observed data differs from a theoretical distribution).

If we supply chisq.test() with a single vector (i.e., a one-dimensional contingency table) of non-negative integers, the hypothesis tested is whether the population probabilities equal those in p, or are all equal if p is not given. (You must provide a probability for each element of x, and p must sum to 1.)

Is 50:50 different from a 1:1 ratio?

# These two give the same answer
chisq.test(x = c(50, 50))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(50, 50)
## X-squared = 0, df = 1, p-value = 1
# But here we explicitly state the probability
chisq.test(x = c(50, 50), p = c(0.5, 0.5) )
## 
##  Chi-squared test for given probabilities
## 
## data:  c(50, 50)
## X-squared = 0, df = 1, p-value = 1

No, 50:50 is not different from 1:1.

Is 50:50 different from a 3:1 ratio?

chisq.test(x = c(50,50), p = c(3/4, 1/4))
## 
##  Chi-squared test for given probabilities
## 
## data:  c(50, 50)
## X-squared = 33.333, df = 1, p-value = 7.764e-09

Yes, it is.


Two variables

Testing associations, proportion test: prop.test()

We can test whether the counts (or proportions) of successes are the same across different groups.

For example, do males and females like spinach equally?

Here, the prop.test() function requires x, a vector of the number of successes in each group, and n, a vector of the number of trials in each group. x could also be a matrix or table with 2 columns giving successes and failures (in which case you don't need n; see the sketch below).

For example, suppose we observe that 4 of 40 males and 196 of 3270 females like spinach.

prop.test(c(4, 196), c(40, 3270))
## Warning in prop.test(c(4, 196), c(40, 3270)): Chi-squared approximation may
## be incorrect
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(4, 196) out of c(40, 3270)
## X-squared = 0.52289, df = 1, p-value = 0.4696
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.06591631  0.14603864
## sample estimates:
##     prop 1     prop 2 
## 0.10000000 0.05993884

So, no: there is no evidence of a difference in the proportion of males and females who like spinach.
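
As mentioned above, a minimal sketch of the same test using the matrix form of x, where the two columns give successes and failures (the column names are just for illustration; failures are 40 - 4 = 36 and 3270 - 196 = 3074):

# Two-column matrix: successes (likes spinach) and failures (does not)
spinach <- cbind(likes = c(4, 196), dislikes = c(36, 3074))
prop.test(spinach)   # same result as the call above; n is not needed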

Rather than test whether all groups are equal, we can also specify p to say what probabilities we expect each group to have.
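
For instance, a minimal sketch, where the expected proportions 0.10 and 0.06 are made-up values used only for illustration:

# Test whether the observed proportions match the (hypothetical) expected
# proportions 0.10 for males and 0.06 for females, rather than testing equality
prop.test(x = c(4, 196), n = c(40, 3270), p = c(0.10, 0.06))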

Testing associations, Chi square: chisq.test()

We have used chisq.test() to test the goodness-of-fit of a one-dimensional contingency table. If we have a table of at least two rows and two columns, the function interprets this as a two-dimensional contingency table: A Pearson's chi-squared test is performed of the null hypothesis that the joint distribution of the cell counts in a 2-dimensional contingency table is the product of the row and column marginals.

The data we have are our observed (O) data. The expected (E) data are calculated from the contingency table for each cell using the formula (Row Total x Column Total) / Grand Total, where the grand total is the sum of all counts in the table.

The chi-square value is then the sum of [ (O - E)^2 / E ] for each cell.

For example, we might ask whether a preference for gnocchi over pasta is associated with a preference for grass over peas.

count <- matrix(c(38, 14, 11, 51), nrow = 2, 
                dimnames = list(c('gnocchi', 'pasta'), c('grass', 'peas')))
count
##         grass peas
## gnocchi    38   11
## pasta      14   51
chisq.test(count)
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  count
## X-squared = 33.112, df = 1, p-value = 8.7e-09

The results suggest an association between those who like both grass and gnocchi, and those who like both peas and pasta.
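
As a check on the formula above, a minimal sketch that computes the expected counts and the chi-squared statistic by hand. Note that chisq.test() applies Yates' continuity correction to 2x2 tables by default, so the hand-calculated value matches chisq.test(count, correct = FALSE) rather than the output above:

# Expected count for each cell: (row total x column total) / grand total
E <- outer(rowSums(count), colSums(count)) / sum(count)
E

# Chi-squared statistic: sum of (O - E)^2 / E over all cells (no correction)
sum((count - E)^2 / E)

# The same value from chisq.test() with the continuity correction turned off
chisq.test(count, correct = FALSE)$statistic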

What's the difference between prop.test() and chisq.test()?

For a 2x2 table, they are essentially the same: prop.test() carries out the same chi-squared test (with Yates' continuity correction by default) and additionally reports the two proportions and a confidence interval for their difference.
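
As a quick check, a minimal sketch using the count table defined above; the X-squared value and p-value should match the chisq.test(count) output:

# prop.test() accepts a two-column matrix of successes and failures,
# so the 2x2 table gives the same (Yates-corrected) chi-squared test
prop.test(count)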

Testing associations, Fisher's Exact Test: fisher.test()

Fisher's Exact Test is similarly used in the analysis of contingency tables. Fisher is said to have devised it after hearing that a colleague could tell whether the milk or tea was poured into the cup first.

To test this claim, she was given 8 cups of tea, in four of which the milk was added first. The null hypothesis is that there is no association between the true order of pouring and her guess; the alternative is that there is a positive association (that the odds ratio is greater than 1).

TeaTasting <-
matrix(c(3, 1, 1, 3),
       nrow = 2,
       dimnames = list(Guess = c("Milk", "Tea"),
                       Truth = c("Milk", "Tea")))

TeaTasting
##       Truth
## Guess  Milk Tea
##   Milk    3   1
##   Tea     1   3
fisher.test(TeaTasting, alternative = "greater")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  TeaTasting
## p-value = 0.2429
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  0.3135693       Inf
## sample estimates:
## odds ratio 
##   6.408309

The p-value of 0.2429 tells us that we cannot reject the null hypothesis: these data do not establish an association between the guesses and the true order of pouring.


Exercises

College student retention

In an effort to increase student retention, many colleges have tried block programs. Suppose 100 students are broken into two groups of 50 at random. One half are in a block program, the other half are not. The number of years in attendance is then measured. We wish to test if the block program makes a difference in retention. The data are:

Program | 1yr | 2yr | 3yr | 4yr | 5+yrs |
------- | --- | --- | --- | --- | ----- |
Non-Block | 18 | 15 | 5 | 8 | 4 |
Block | 10 | 5 | 7 | 18 | 10 |

  1. Do a test of hypothesis to decide if there is a difference between the two types of programs in terms of retention.

Hint: Think about one row as observed and one as expected values.

Fish populations

A fish survey is done to see if the proportions of fish types are consistent with previous years. Suppose the 3 types of fish recorded (parrotfish, grouper, tang) are historically found in a 5:3:4 proportion, and in a survey the following counts are found:

Fish | pf | gr | ta |
----- | -- | -- | -- |
observed | 53 | 22 | 49 |

  1. Have the proportions of each species changed?

Quiz

  1. In October 2015, Jane saw 12 black-capped chickadees, 2 red-bellied woodpeckers, and a Canada goose in her back yard. This year, she saw 10 black-capped chickadees, 5 red-bellied woodpeckers, and 1 zebra. Has the bird population changed significantly?

  2. John studies the reproductive phenology of trees. In 2014, 59 male trees and 34 female trees flowered. Is this population male-biased?

  3. In 2015, 71 male trees and 39 female trees flowered. Had the proportion of flowering trees that were male changed?


Further Reading

Agresti, A. 2002. Categorical Data Analysis, 2nd edition. Wiley.

Thompson, L. 2006. S-Plus (and R) Manual to Accompany Agresti's Categorical Data Analysis (2002), 2nd edition.


Updated: 2016-10-12