# Statistical Approaches

Three distinct goals for which one might use statistics:

1. Hypothesis testing
• in general ANOVA -> “factor X influences factor Y”,
• falsifying null hypothesis (often obviously wrong).
2. Prediction
• precise enough to be wrong,
• probabilistic,
• prolific data,
• proper scales,
• place specific,
• make public.
3. Exploration
• forefront of knowledge,
• no p-values,
• hard to publish now,
• so, often dressed up as hypothesis-testing.

Decide which of these you are doing before you start collecting and analyzing the data.

The same statistical technique can often serve more than one of these approaches, but some techniques are not appropriate for some approaches.

These goals overlap with the summary statistics typically reported for a statistical model:

1. p-value: used in a hypothetico-deductive framework; the probability of observing a signal at least this strong by chance alone, i.e., if the null hypothesis were true (e.g., does nitrogen increase crop yield?)
2. coefficient estimate: the biological significance (e.g., how much does crop yield increase for a given level of nitrogen addition?)
3. R²: how much of the variation is explained by what we are studying, versus the other sources of variation (e.g., how much of the variation in crop yield is due to nitrogen?). How well/completely do we understand the system?

Much of science is focussed on p-values to the detriment of other information: a p < 0.05 with an effect size of 0.1% and an R² of 3% is not that useful!
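
To make the three statistics concrete, here is a minimal sketch in plain Python. The data are simulated and entirely hypothetical (the variable names, the true slope of 0.05, and the noise level are assumptions chosen for illustration), and the p-value comes from a simple permutation test rather than from any particular textbook formula. With a large sample, a tiny effect can be clearly "significant" while explaining very little of the variation.

```python
import random

def fit_line(x, y):
    """Least-squares slope, intercept and R^2 for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2

def permutation_p_value(x, y, n_perm=500, rng=None):
    """Share of shuffled data sets whose |slope| is at least the observed |slope|."""
    rng = rng or random.Random(0)
    observed = abs(fit_line(x, y)[0])
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between x and y
        if abs(fit_line(x, shuffled)[0]) >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: nitrogen added and crop yield, in arbitrary units.
rng = random.Random(42)
nitrogen = [rng.uniform(0, 100) for _ in range(2000)]
crop_yield = [500 + 0.05 * n + rng.gauss(0, 10) for n in nitrogen]

slope, intercept, r2 = fit_line(nitrogen, crop_yield)
p_value = permutation_p_value(nitrogen, crop_yield, rng=rng)

print(f"p-value : {p_value:.4f}  (is there any signal?)")
print(f"slope   : {slope:.3f}  (how big is the effect?)")
print(f"R^2     : {r2:.3f}  (how much of the variation is explained?)")
```

Reading the three numbers together, rather than the p-value alone, is the point of the example.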

Many ‘exploratory’ analyses are portrayed as hypothesis testing, e.g.,

• do some data mining (also called data dredging) until you find a statistical relationship, then test it for statistical significance.
• if you do variable selection first and then significance testing on the selected variables, your p-values will be much lower than they should be.
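
The effect of selecting first and testing afterwards can be sketched with pure noise. In the hypothetical simulation below (sample sizes, seed, and variable names are all assumptions), no predictor is related to the response. The "naive" p-value tests only the predictor that happened to look best; the "honest" p-value repeats the whole search on every shuffled data set, so it accounts for the selection step.

```python
import random

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
n_obs, n_predictors, n_perm = 40, 30, 300

# Pure noise: no predictor has any real relationship with the response.
predictors = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_predictors)]
response = [rng.gauss(0, 1) for _ in range(n_obs)]

# "Dredging": keep whichever predictor correlates most strongly.
correlations = [abs(correlation(p, response)) for p in predictors]
best = max(range(n_predictors), key=lambda i: correlations[i])
observed = correlations[best]

naive_hits = 0
honest_hits = 0
shuffled = list(response)
for _ in range(n_perm):
    rng.shuffle(shuffled)
    perm_corrs = [abs(correlation(p, shuffled)) for p in predictors]
    # Naive: pretend the selected predictor was the only one ever examined.
    if perm_corrs[best] >= observed:
        naive_hits += 1
    # Honest: redo the search and compare against the best of all predictors.
    if max(perm_corrs) >= observed:
        honest_hits += 1

naive_p = (naive_hits + 1) / (n_perm + 1)
honest_p = (honest_hits + 1) / (n_perm + 1)

print(f"strongest |r| among {n_predictors} noise predictors: {observed:.2f}")
print(f"naive p-value (ignores the search) : {naive_p:.3f}")
print(f"honest p-value (includes the search): {honest_p:.3f}")
```

Because the best of all predictors is always at least as extreme as any single one, the naive p-value can never exceed the honest one here; the gap between them is the price of the unreported search.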

## Example techniques

### 1. Hypothesis testing

• present full model
• model selection

### 2. Prediction

• present full model
• model selection
• regression over ANOVA

### 3. Exploration

• regression trees
• spline regression
• principal component analysis
• variable selection
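
As an illustration of the exploratory flavour of these techniques, here is a minimal sketch of the core step of a regression tree: finding the single split of one predictor that best separates the response into two groups. It is a one-split "stump" written for illustration, not a full tree implementation, and the data (a hypothetical jump in yield above a nitrogen level of 60) are simulated assumptions.

```python
import random

def best_split(x, y):
    """Return (threshold, left_mean, right_mean, sse) for the best single split.

    Tries every boundary between adjacent sorted x values and keeps the one
    that minimises the summed squared error around the two group means.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        left = [yv for _, yv in pairs[:i]]
        right = [yv for _, yv in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[3]:
            best = (threshold, left_mean, right_mean, sse)
    return best

# Hypothetical data: yield steps up once nitrogen exceeds 60.
rng = random.Random(7)
x = [rng.uniform(0, 100) for _ in range(200)]
y = [(20 if xi > 60 else 10) + rng.gauss(0, 1) for xi in x]

threshold, left_mean, right_mean, sse = best_split(x, y)
print(f"best split at x = {threshold:.1f}")
print(f"mean response below: {left_mean:.2f}, above: {right_mean:.2f}")
```

No p-value is produced: the split is a description of where the data suggest looking next, which is why such output belongs to exploration rather than hypothesis testing.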