Exercises: classical tests

Make a notched boxplot of the data used in the t test example.

In an effort to increase student retention, many colleges have tried block programs. Suppose 100 students are split at random into two groups of 50. One group is in a block program; the other is not. The number of years each student attends is then recorded. We wish to test whether the block program makes a difference in retention. The data are:

Program | 1yr | 2yr | 3yr | 4yr | 5+yrs |
------- | --- | --- | --- | --- | ----- |
Non-Block | 18 | 15 | 5 | 8 | 4 |
Block | 10 | 5 | 7 | 18 | 10 |

Do a test of hypothesis to decide if there is a difference between the two types of programs in terms of retention.
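As a starting point, the table can be entered as a matrix and passed to a test for two-way tables. The sketch below assumes a Pearson chi-squared test of homogeneity is an acceptable choice here; the object names `retention` and `retention_test` are illustrative and not part of the exercise.

```r
# Counts copied from the table above: rows are programs, columns are years attended.
retention <- matrix(
  c(18, 15, 5, 8, 4,
    10, 5, 7, 18, 10),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Program = c("Non-Block", "Block"),
    Years = c("1yr", "2yr", "3yr", "4yr", "5+yrs")
  )
)

# Pearson chi-squared test on the 2 x 5 table (one possible choice of test).
retention_test <- chisq.test(retention)

retention_test$expected  # expected counts under the null, to check they are not too small
retention_test           # test statistic, degrees of freedom, and p-value
```

Inspecting the expected counts before reading the p-value is a useful habit, since the chi-squared approximation is less reliable when they are small.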

A fish survey is done to see if the proportions of fish types are consistent with previous years. Suppose the three types of fish recorded (parrotfish, grouper, and tang) have historically occurred in a 5:3:4 proportion, and a survey finds the following counts:

Fish | parrotfish | grouper | tang |
----- | ---------- | ------- | ---- |
observed | 53 | 22 | 49 |

Have the proportions of each species changed?
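One way to set this up is as a goodness-of-fit test, with the historical ratio converted to probabilities. This is a minimal sketch, assuming a chi-squared goodness-of-fit test is appropriate; the names `observed`, `historical`, and `fish_test` are illustrative.

```r
# Observed counts from the survey table.
observed <- c(parrotfish = 53, grouper = 22, tang = 49)

# The 5:3:4 ratio rescaled so the probabilities sum to 1.
historical <- c(5, 3, 4) / sum(c(5, 3, 4))

# Goodness-of-fit test against the historical proportions.
fish_test <- chisq.test(observed, p = historical)

fish_test$expected  # counts expected if the historical proportions still hold
fish_test
```

Passing the raw ratio `c(5, 3, 4)` as `p` would produce an error, because `chisq.test()` requires probabilities that sum to 1 unless `rescale.p = TRUE` is set.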

The R dataset UCBAdmissions contains data on admissions to UC Berkeley by department and gender. We wish to investigate whether the distribution of admitted males across departments is similar to that of admitted females.

To do so, we first need to do some spadework, as the dataset is presented as a complex contingency table. The ftable() (flatten table) command is needed. To use it, try:

data(UCBAdmissions)  # read in the dataset
x = ftable(UCBAdmissions)  # flatten
x  # what is there

##                 Dept   A   B   C   D   E   F
## Admit    Gender                             
## Admitted Male        512 353 120 138  53  22
##          Female       89  17 202 131  94  24
## Rejected Male        313 207 205 279 138 351
##          Female       19   8 391 244 299 317

We want to compare rows 1 and 2. Treating x as a matrix, we can access these with x[1:2,].

Do a test for homogeneity between the two rows. What do you conclude? Repeat for the rejected group.
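The row selection described above can be passed straight to a test for two-way tables. The sketch below assumes a chi-squared test of homogeneity is the intended tool; the names `admitted`, `rejected`, `admitted_test`, and `rejected_test` are illustrative.

```r
data(UCBAdmissions)
x <- ftable(UCBAdmissions)

# Rows 1 and 2: admitted males and admitted females, by department.
admitted <- x[1:2, ]
admitted_test <- chisq.test(admitted)
admitted_test

# Rows 3 and 4: the same comparison for rejected applicants.
rejected <- x[3:4, ]
rejected_test <- chisq.test(rejected)
rejected_test
```

Each test compares the two genders across the six departments, so the conclusion is about how applicants are spread over departments, not about overall admission rates.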

An exit poll by a news station of 900 people in the state of Florida found 440 voting for Bush and 460 voting for Gore.

Do the data support the hypothesis that Bush received p = 50% of the state's vote?
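A one-sample test of a proportion is one way to approach this. The sketch below shows both the normal-approximation test and the exact binomial test, assuming the poll can be treated as a simple random sample; the names `bush`, `n`, and `poll_test` are illustrative.

```r
bush <- 440  # respondents voting for Bush
n <- 900     # total respondents

# Test of H0: p = 0.5 using the normal approximation (with continuity correction).
poll_test <- prop.test(bush, n, p = 0.5)
poll_test

# Exact binomial version of the same test, for comparison.
binom.test(bush, n, p = 0.5)
```

With a sample this large, the two tests would be expected to lead to the same conclusion.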

Load the dataset blood (below).

blood <- data.frame(
    Machine = c(68, 82, 94, 106, 92, 80, 76, 74, 110, 93, 86, 65, 74, 84, 100),
    Expert = c(72, 84, 89, 100, 97, 88, 84, 70, 103, 84, 86, 63, 69, 87, 93)
)

Do a significance test for equivalent centers. Which one did you use and why? What was the p-value?