5 Number Summary
z-scores
Scatterplots & Regression
Variables
Sampling & Experiments
100

How do you find the median if there is an even number of data points?

You find the average of the two middle numbers

100

What is the formula for finding a Z-score?

(X - μ) / σ
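
The formula can be checked with a minimal Python sketch. The function name `z_score` and the example numbers (a score of 85, mean 70, standard deviation 10) are made up for illustration and are not from the review.

```python
def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations x lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical values: x = 85, mean = 70, standard deviation = 10
print(z_score(85, 70, 10))  # 1.5
```

A positive z-score means the value is above the mean; a negative one means it is below.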

100

What are the two types of variables in a scatterplot? 

Explanatory and response

Response is dependent, explanatory is independent

100

An "individual" is...

An object described by a set of data

100

Name the 4 types of biases in sampling

- Response

- Non-response

- Wording effects

- Undercoverage

200

What is the median of this data set?

{12, 23, 34, 45, 54, 56, 67, 78, 89, 98}

55

(54 + 56) / 2 

= 110 / 2 

= 55
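
The same steps can be sketched in Python, both by hand and with the standard library's `statistics.median`. The variable names are illustrative only.

```python
from statistics import median

data = [12, 23, 34, 45, 54, 56, 67, 78, 89, 98]

ordered = sorted(data)
n = len(ordered)

# With an even number of points, average the two middle values.
middle_pair = (ordered[n // 2 - 1], ordered[n // 2])
by_hand = sum(middle_pair) / 2

print(middle_pair)   # (54, 56)
print(by_hand)       # 55.0
print(median(data))  # 55.0
```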

200

In a normal distribution, what is the probability that a randomly selected data point falls within one standard deviation of the mean?

About 68%

(68-95-99.7 rule)
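
As a rough check of the rule, the sketch below uses Python's `statistics.NormalDist` to compute the area within 1, 2, and 3 standard deviations of the mean. The helper name `within` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def within(k):
    """Proportion of a Normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The rule's 68, 95, and 99.7 are rounded versions of these values.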

200

How do you find R²?

r²

The correlation coefficient, squared
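
A minimal sketch of that calculation in Python. The data set and the function name `correlation` are hypothetical; r is computed as the sum of the products of the standardized x and y values, divided by n - 1.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, over n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
r_squared = r ** 2

print(round(r, 4))          # 0.7746
print(round(r_squared, 2))  # 0.6
```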

200

What is the difference between categorical and quantitative variables?

A categorical variable places an individual into a group (categorizes)

A quantitative variable uses numerical values to characterize an individual


200

Observational study vs. Experiment

Observational Study: Observes individuals, does not influence the response. 

Experiment: Deliberately imposes treatment(s) to see responses of individuals.

300

Given this 5 Number Summary, find the IQR

Min = 5

Q1 = 14

Median = 32

Q3 = 44

Max = 53

IQR = Q3 - Q1 = 44 - 14 = 30


300

What can you find with a z-score?

(Hint: Which side is it?)

Use Table A to get the proportion of data to the left of x
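
If Table A is not at hand, the same lookup can be sketched with Python's `statistics.NormalDist`, whose `cdf` method gives the area to the left of a z-score. The function name `proportion_left` and the example numbers (x = 85, mean 70, standard deviation 10) are assumptions for illustration.

```python
from statistics import NormalDist


def proportion_left(x, mu, sigma):
    """Proportion of a Normal(mu, sigma) distribution lying to the left of x."""
    z = (x - mu) / sigma
    return NormalDist().cdf(z)  # standard Normal area to the left of z


# Hypothetical values: x = 85, mean = 70, standard deviation = 10 (z = 1.5)
left = proportion_left(85, 70, 10)
right = 1 - left  # the area to the right is whatever is left over

print(round(left, 4))   # 0.9332
print(round(right, 4))  # 0.0668
```

For the proportion to the right of x, subtract the table value from 1.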

300

What two numbers is the correlation always going to be between?

Bonus: What do both extremes mean? 

-1 and 1

-1 means a perfect negative linear relationship

1 means a perfect positive linear relationship

300

A researcher wants to use education level to explain differences in income. 

What is the explanatory variable? What is the response variable? 

Explanatory: Education level

Response: Income

300

Name the types of bad sampling designs. 

(that we talked about)

- Convenience (the easiest to reach)

- Voluntary response (opt-in surveys & studies)


400

The median is a better measure of center than the mean when the data is

Skewed or has outliers

400

What are the attributes of a Normal distribution?

Single peaked, symmetric, area underneath =1

400

What is the equation for the regression line? And what are the equations for its components?

ŷ = mx + b

m = r (s_y / s_x)

b = ȳ − m x̄
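
A minimal sketch of building the line from summary statistics, using slope m = r (s_y / s_x) and intercept b = ȳ − m x̄. The function name `regression_line` and every number below are hypothetical.

```python
def regression_line(r, s_x, s_y, x_bar, y_bar):
    """Return (slope, intercept) of the least-squares line from summary statistics."""
    m = r * (s_y / s_x)
    b = y_bar - m * x_bar
    return m, b


# Hypothetical summary statistics, for illustration only
m, b = regression_line(r=0.8, s_x=2.0, s_y=5.0, x_bar=10.0, y_bar=50.0)
print(m, b)

# The least-squares line always passes through the point (x̄, ȳ).
print(m * 10.0 + b)
```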

400

When is the distinction between explanatory and response variables essential?

When creating a least-squares regression line

400

Matched Pairs vs. Block design

A block design works like matched pairs, but with larger groups.

Matched Pairs: Pairing individuals that are similar to observe the effects of different treatments on similar subjects.

Block Design: Forms blocks (groups of individuals known before the experiment to be similar in some way that is expected to affect the response to the treatments) and assigns treatments at random within each block.

500

What is the rule (in this class) for determining if a data point is an outlier?

1.5 * IQR rule. 

x > Q3 + 1.5 * IQR: x is an outlier

x < Q1 - 1.5 * IQR: x is an outlier
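
A short sketch of the rule in Python, using Q1 = 14 and Q3 = 44 from the five-number summary in the 300-point question. The function names and the test value 95 are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs from the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True if x falls below the lower cutoff or above the upper cutoff."""
    low, high = outlier_fences(q1, q3)
    return x < low or x > high


print(outlier_fences(14, 44))  # (-31.0, 89.0)
print(is_outlier(53, 14, 44))  # False: the max of that summary is not an outlier
print(is_outlier(95, 14, 44))  # True: 95 is a hypothetical value above the upper cutoff
```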


500

What is the official name of the number you get out of Table A?

Cumulative Proportion

500

How do you find a residual, and what does it tell you?

Residual = Observed result - predicted result


Tells us how "off" our prediction was.
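
A tiny Python sketch of the calculation. The regression line ŷ = 2x + 30 and the observed point (12, 58) are hypothetical values chosen for illustration.

```python
def residual(observed_y, predicted_y):
    """Residual = observed y minus predicted y."""
    return observed_y - predicted_y


# Hypothetical line ŷ = 2x + 30 and hypothetical observed point (12, 58)
x, observed = 12, 58
predicted = 2 * x + 30

print(predicted)                      # 54
print(residual(observed, predicted))  # 4
```

A positive residual means the point lies above the line (the prediction was too low); a negative residual means the point lies below it.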

500

What does "confounded" mean?

When the effects of an explanatory and lurking variable on a response variable are indistinguishable from each other. 

500

Notation for the sample standard deviation and mean.

s and x̄
