Draw a boxplot for the data: 1, 3, 5, 6, 7, 10, 13, 17, 20
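A boxplot is drawn from the five-number summary. The following is a minimal Python sketch of that summary, assuming the quartile convention that leaves the median out of both halves when n is odd (other conventions can give slightly different quartiles). The function name `five_number_summary` is illustrative only.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-excluded halves convention."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]        # values below the median position
    upper = xs[n - half:]    # values above the median position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

summary = five_number_summary([1, 3, 5, 6, 7, 10, 13, 17, 20])
print(summary)  # (1, 4.0, 7, 15.0, 20)
```

The box runs from Q1 = 4 to Q3 = 15 with a line at the median 7, and the whiskers reach 1 and 20 because no value falls outside the 1.5 IQR fences.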

A graph shows two peaks. What is this shape called, and what might it indicate?
Bimodal. It may indicate that the data come from two distinct groups.
Correlation does not equal ____________ except when you do an _____________.
causation; experiment with randomly assigned treatments
What is the difference between an observational study and an experiment?
Experiments assign treatments to subjects; observational studies only observe and record, without assigning treatments.
Describe a completely randomized design.
All subjects are randomly assigned to the treatment groups (e.g., by drawing names from a hat), so every subject has the same chance of ending up in any group.
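The "names in a hat" idea can be sketched in Python as a shuffle followed by dealing subjects into groups. This is one possible sketch, not the only way to randomize; the function name, subject labels, and treatment names are hypothetical.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them into treatment groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subject labels (1 to 20) and treatment names, for illustration only.
groups = completely_randomized(range(1, 21), ["treatment", "control"], seed=1)
print(groups)
```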
Make a stem-and-leaf plot for the data: 9, 14, 15, 5, 18, 19, 19, 20, 21, 23, 25, 33
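One way to build the plot is sketched below in Python, assuming nonnegative whole numbers with the tens digit as the stem and the ones digit as the leaf. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return stemplot rows as strings, with tens as stems and ones as leaves."""
    stems = defaultdict(list)
    for value in sorted(data):
        stems[value // 10].append(value % 10)
    rows = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems in the plot
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows

rows = stem_and_leaf([9, 14, 15, 5, 18, 19, 19, 20, 21, 23, 25, 33])
print("\n".join(rows))
print("Key: 1 | 4 = 14")
```

For this data the rows are `0 | 5 9`, `1 | 4 5 8 9 9`, `2 | 0 1 3 5`, and `3 | 3`.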

What is standard deviation?
The typical (roughly average) distance of the data values from the mean
What is the symbol for correlation? What is its range?
r; from -1 to 1, inclusive
What are experimental units?
The individuals (subjects) or objects to which treatments are assigned in an experiment
Describe a randomized block design.
Subjects are first grouped (i.e. blocked) and then treatments are randomly assigned within each block
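A short Python sketch of blocking, assuming the blocks are already known. The block names, subject labels, and treatment names are made up for illustration.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)  # randomization happens inside the block only
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks (by age group) and treatment names, for illustration only.
blocks = {"under 40": ["A", "B", "C", "D"], "40 and over": ["E", "F", "G", "H"]}
assignment = randomized_block(blocks, ["drug", "placebo"], seed=1)
print(assignment)
```

Because the shuffle is done block by block, each block ends up with the same number of subjects on each treatment.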
Which measures of center and spread are resistant to outliers?
Median, IQR
What are three ways you might identify outliers?
Gaps, 1.5 IQR rule, 2 SD rule
Does an outlier make r increase or decrease? Explain.
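It depends. An outlier that does not fit the linear pattern weakens the correlation, moving r toward 0: a negative r increases and a positive r decreases. An outlier that falls in line with the pattern can make r stronger instead.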
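The effect can be checked numerically. The sketch below computes r from its definition on made-up data, then adds one hypothetical point that breaks the pattern; the data values are invented purely for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r computed from its definition."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data with a strong positive linear pattern.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 8, 9, 12]
r_pos = correlation(xs, ys)
r_pos_outlier = correlation(xs + [7], ys + [1])      # point far below the pattern

# Same data mirrored to give a strong negative pattern.
neg_ys = [-y for y in ys]
r_neg = correlation(xs, neg_ys)
r_neg_outlier = correlation(xs + [7], neg_ys + [-1])  # point far above the pattern

print(round(r_pos, 3), round(r_pos_outlier, 3))
print(round(r_neg, 3), round(r_neg_outlier, 3))
```

In both cases the added point pulls r toward 0, so the positive r goes down and the negative r goes up.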
What is blinding? Double-blinding? Placebo?
Blinding: subjects don't know which treatment they're getting. Double-blinding: neither the subjects nor the researchers who interact with them know. Placebo: a fake treatment.
Describe a matched pairs design.
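Each subject is paired with the most similar subject, and the two treatments are randomly assigned within each pair, one to each subject. (Alternatively, each subject receives both treatments in a random order.)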
When is it better to use median and IQR instead of mean and standard deviation?
When data is skewed or has outliers
A data set has a mean of 60 and standard deviation of 10. Would 85 be considered an outlier? Why?
Yes. 85 is 2.5 SDs above the mean, which is more than 2 SDs.
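The 2 SD check is a z-score calculation. A minimal Python sketch, with an illustrative function name:

```python
def is_outlier_2sd(value, mean, sd):
    """Flag a value that lies more than 2 standard deviations from the mean."""
    z = (value - mean) / sd
    return abs(z) > 2, z

flagged, z = is_outlier_2sd(85, mean=60, sd=10)
print(flagged, z)  # True 2.5
```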
What is the difference between an influential point and an outlier?
An outlier has a large residual (it is far from the LSRL vertically). An influential point, often far from the rest of the data horizontally, noticeably changes the slope or intercept of the LSRL when it is removed.
What is nonresponse bias? Give an example.
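When selected individuals cannot be contacted or choose NOT to respond. Example: a mailed survey about work hours is returned mostly by people who have the free time to fill it out.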
What is a cluster sample?
Population is naturally divided into clusters. Clusters are randomly chosen and everyone in the cluster is sampled.
An LSRL is used to predict a person's weight given their height. Interpret a residual of -10.2
This person weighs 10.2 pounds less than what was predicted for their height.
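A residual is the actual value minus the predicted value. The sketch below uses a hypothetical LSRL (the intercept, slope, height, and weight are invented for illustration and are not from any real data set) to produce a residual of -10.2.

```python
def residual(actual, predicted):
    """Residual = actual y minus predicted y."""
    return actual - predicted

# Hypothetical LSRL, for illustration only: predicted weight = -150 + 4.5 * height.
def predict_weight(height_in):
    return -150 + 4.5 * height_in

predicted = predict_weight(70)       # 165.0 pounds for a 70-inch person
res = residual(154.8, predicted)     # this person actually weighs 154.8 pounds
print(round(res, 1))  # -10.2
```

A negative residual means the actual value sits below the line, so the model overpredicted.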
Are there any outliers by the 1.5IQR rule? 1, 3, 5, 6, 7, 10, 13, 17, 32
Yes. Q1 = 4, Q3 = 15, and IQR = 11, so the upper fence is 15 + 1.5(11) = 31.5, and 32 is above it.
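The fence calculation can be sketched in Python, assuming quartiles are the medians of the lower and upper halves with the overall median left out when n is odd. The function name `iqr_outliers` is illustrative.

```python
from statistics import median

def iqr_outliers(data):
    """Return (lower fence, upper fence, outliers) by the 1.5 * IQR rule."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])              # median of the lower half
    q3 = median(xs[len(xs) - half:])    # median of the upper half
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in xs if x < low or x > high]

low, high, outliers = iqr_outliers([1, 3, 5, 6, 7, 10, 13, 17, 32])
print(low, high, outliers)  # -12.5 31.5 [32]
```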
Define r-squared.
The percent of variation in the y-variable that can be attributed to the linear model that uses the x-variable.
What is a confounding variable?
A variable that is not being assigned or recorded yet is having an effect on the response variable.
What is a systematic random sample?
Choose a random starting point, then sample every nth person in the population.
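A minimal Python sketch of the idea, assuming the population is available as an ordered list and the starting point is chosen at random among the first n individuals. The function name and the population of 100 numbered individuals are hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start among the first n items, then take every nth item."""
    rng = random.Random(seed)
    start = rng.randrange(n)
    return list(population)[start::n]

# Hypothetical population of 100 numbered individuals.
sample = systematic_sample(range(1, 101), n=10, seed=1)
print(sample)
```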
How do we interpret the slope of an LSRL?
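For each additional x-unit, the predicted y-variable increases/decreases by the slope, on average.
What is a residual? What's the formula?
The difference between the actual and predicted y-values; residual = actual y - predicted y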
What type of hypothesis test is used for linear regression? How many degrees of freedom does it have?
t-test for a slope; df = n - 2
What is a convenience sample?
A sample that was easy to get and was NOT random.
What is a stratified sample?
The population is grouped by natural strata. An SRS is taken within each stratum.
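A short Python sketch of stratified sampling, assuming the strata are already listed and the same number is drawn from each (sampling proportionally to stratum size is another common choice). The strata names and ID numbers are made up.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

# Hypothetical strata (grade levels) with made-up student IDs.
strata = {
    "freshmen": list(range(100, 150)),
    "sophomores": list(range(200, 250)),
    "juniors": list(range(300, 350)),
}
sample = stratified_sample(strata, per_stratum=5, seed=1)
print(sample)
```

Every stratum is represented in the sample, which is the contrast with a cluster sample, where only the chosen clusters appear.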