BMR 617: Statistical Techniques for the Biomedical Sciences

Exploring data and relationships between variables: Review and Summary

This review includes statistical concepts and R commands from the following lectures:

Samples and Distributions
Confidence Intervals
Hypothesis Testing
Test for the value of a single proportion
χ² and Fisher's Exact Test
T-tests
- One-class T-test
- Two-class T-test
- Matched pairs T-test

Samples and Distributions

Understand that in an experiment or study we make observations on a sample but want to make inferences about the population from which the sample is drawn.

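The Central Limit Theorem tells us about the distribution of sample means: for sufficiently large samples, the means of samples of size \(n\) are approximately normally distributed around the population mean, even when the population itself is not normally distributed.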
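
A small simulation can make this concrete. The sketch below is illustrative only: the exponential population, the sample size of 30, the number of repetitions, and the seed are arbitrary choices, not values taken from the lectures.

```r
# Illustrative simulation of the Central Limit Theorem.
# Population: exponential with rate 1 (mean 1, standard deviation 1), strongly skewed.
set.seed(617)
n <- 30

# Draw 10,000 samples of size n and record the mean of each one
sample_means <- replicate(10000, mean(rexp(n, rate = 1)))

mean(sample_means)   # should be close to the population mean, 1
sd(sample_means)     # should be close to 1 / sqrt(n)
1 / sqrt(n)
```

Calling `hist(sample_means)` afterwards should show a roughly bell-shaped histogram, even though the population the samples came from is heavily skewed.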

The standard error of the mean, \[\frac{s}{\sqrt{n}}\] where \(s\) is the sample standard deviation, is an estimate of the standard deviation of the means of all samples of size \(n\).
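
As a quick worked example, the standard error can be computed directly as the sample standard deviation divided by the square root of the sample size. The measurements below are invented for illustration, and the variable names are arbitrary.

```r
# Hypothetical measurements, invented for illustration
x <- c(4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9)

n   <- length(x)       # sample size
s   <- sd(x)           # sample standard deviation
sem <- s / sqrt(n)     # standard error of the mean
sem
```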

We also looked at how to draw bar charts and error bars in R. We saw how to make error bars representing the standard deviation, standard error of the mean, and 95% confidence intervals.
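
One possible way to draw such a chart uses base R graphics, as sketched below. This is not necessarily the approach used in the lectures, and the data and group names are invented for illustration.

```r
# Hypothetical data: a response measured in two groups (values are invented)
values <- c(4.1, 5.3, 4.8, 5.9, 5.0, 6.2, 7.1, 6.8, 7.5, 6.4)
group  <- factor(rep(c("control", "treated"), each = 5))

# Per-group summaries
means <- tapply(values, group, mean)
sds   <- tapply(values, group, sd)
ns    <- tapply(values, group, length)
sems  <- sds / sqrt(ns)
# Half-width of a 95% confidence interval for each mean, using the t distribution
ci95  <- qt(0.975, df = ns - 1) * sems

# Choose which kind of error bar to draw: sds, sems, or ci95
err <- ci95

# barplot() returns the x positions of the bar centres
centres <- barplot(means, ylim = c(0, max(means + err) * 1.1),
                   ylab = "Mean response")
# arrows() with angle = 90 and code = 3 draws flat caps at both ends
arrows(centres, means - err, centres, means + err,
       angle = 90, code = 3, length = 0.1)
```

Whichever kind of error bar is chosen, the figure legend should say which one it is, since the three can differ considerably in length.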

Confidence Intervals

A confidence interval is a range we construct from sample data when estimating a population parameter, for example a mean or a proportion.

We choose a level of confidence, e.g. 95%.

We then construct an interval that we are 95% confident contains the "true" value of the parameter.

If we repeated this process (including the data gathering) over and over again, 95% of the intervals we constructed like this would include the true value.
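
This repeated-sampling interpretation can be checked by simulation. In the sketch below, the population (normal, mean 10, standard deviation 3), the sample size, the number of repetitions, and the seed are all arbitrary assumptions chosen for illustration.

```r
# Illustrative coverage simulation for 95% confidence intervals
set.seed(617)
true_mean <- 10

covers <- replicate(2000, {
  x  <- rnorm(20, mean = true_mean, sd = 3)     # gather a fresh sample
  ci <- t.test(x, conf.level = 0.95)$conf.int   # 95% CI for the mean
  ci[1] <= true_mean && true_mean <= ci[2]      # did the interval capture the truth?
})

mean(covers)   # proportion of intervals containing the true mean; should be near 0.95
```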

Hypothesis Testing

We discussed the general framework for hypothesis testing:

Form the null and alternative hypotheses
Collect data
Compute an appropriate test statistic
Calculate a \(p\)-value
- The probability of observing data at least as extreme as the data we observed, assuming the null hypothesis is true
A low \(p\)-value means the data we got would be unlikely if the null hypothesis were true, so in this case we typically reject the null hypothesis in favor of the alternative hypothesis.
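
The steps above can be sketched for a test of a single proportion. The example uses `binom.test()`, which is one of several options in R and may not be the function used in the lectures. The counts and the 0.05 significance level are invented for illustration.

```r
# Hypothetical data: 14 "successes" in 20 trials (numbers invented for illustration)
# Null hypothesis:        the true proportion is 0.5
# Alternative hypothesis: the true proportion is not 0.5
result <- binom.test(x = 14, n = 20, p = 0.5, alternative = "two.sided")

result$statistic        # the test statistic: the observed number of successes
result$p.value          # probability, under the null, of data at least this extreme
result$p.value < 0.05   # compare with a significance level chosen in advance
```

With these invented counts the \(p\)-value is above 0.05, so we would not reject the null hypothesis. That is not the same as showing the null hypothesis is true.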

χ² and Fisher's Exact Test

These tests are for the C -> C case (categorical explanatory variable, categorical response variable). They test the null hypothesis that the proportion of each possible outcome is the same for each possible value of the explanatory variable.
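
A minimal sketch of both tests on a 2 x 2 table of counts follows. The counts and the row and column labels are invented for illustration.

```r
# Hypothetical 2 x 2 table of counts (invented for illustration)
# Rows: explanatory variable; columns: outcome
counts <- matrix(c(18, 12,
                    7, 23),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(treatment = c("drug", "placebo"),
                                 outcome   = c("improved", "not improved")))

chi <- chisq.test(counts)   # applies a continuity correction to 2 x 2 tables by default
chi$expected                # expected counts if the null hypothesis were true
chi$p.value

fisher <- fisher.test(counts)   # exact test, useful when expected counts are small
fisher$p.value
```

A common rule of thumb is to prefer Fisher's exact test when any expected count is below 5, since the χ² approximation becomes unreliable there.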

T-tests

T-tests test the C -> Q case (categorical explanatory variable, quantitative response variable), where the explanatory variable has only two possible values.

Make sure you understand when to use paired ("matched pairs") t-tests, and when to use the two-class t-test.
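
The three kinds of t-test can all be run with `t.test()`. In the sketch below every data value and variable name is invented for illustration. The key distinction is whether the two sets of measurements come from independent groups (two-class) or from the same subjects measured twice (matched pairs).

```r
# Hypothetical data, invented for illustration

# One-class t-test: is the population mean different from a reference value?
x <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.0, 5.7)
one_class <- t.test(x, mu = 5)

# Two-class t-test: two independent groups of subjects
control <- c(4.1, 5.3, 4.8, 5.9, 5.0)
treated <- c(6.2, 7.1, 6.8, 7.5, 6.4)
two_class <- t.test(treated, control)   # Welch's test (unequal variances) by default

# Matched pairs t-test: the same subjects measured before and after
before <- c(120, 135, 128, 142, 131, 125)
after  <- c(115, 130, 129, 136, 127, 121)
paired <- t.test(before, after, paired = TRUE)

# A matched pairs test is equivalent to a one-class test on the differences
diffs <- t.test(before - after, mu = 0)
c(paired = paired$p.value, differences = diffs$p.value)
```

The last line prints two identical \(p\)-values, which shows why pairing matters: the paired test analyses the within-subject differences rather than comparing two group means.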