Correlation & Association
Regression & Prediction
Cautions in Data Analysis
Probability Basics
Conditional & Applied Probability
100

What does the correlation coefficient r measure between two quantitative variables?

The strength and direction of a linear relationship

100

In simple linear regression, which variable is predicted?

The response variable (y).

100

What is extrapolation?

Using the regression line to predict y for x-values outside the observed range.

100

Formula for relative frequency?

(number of times the event occurs) / (total number of trials).
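
A minimal sketch of this ratio in Python. The list of die rolls and the helper name relative_frequency are made up for illustration.

```python
# Hypothetical record of ten die rolls (illustrative data only).
rolls = [1, 6, 3, 6, 2, 6, 4, 5, 6, 1]

def relative_frequency(outcomes, event):
    """Return (# times event occurs) / (total trials)."""
    occurrences = sum(1 for outcome in outcomes if outcome == event)
    return occurrences / len(outcomes)

# Relative frequency of rolling a six: 4 occurrences in 10 trials.
rel_freq_six = relative_frequency(rolls, 6)
print(rel_freq_six)
```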

100

Interpret P(A | B) in words.

“The probability of A given B.”
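
One way to see what "given" does, assuming the standard definition P(A | B) = P(A and B) / P(B): restrict attention to the outcomes in B and ask how many of them are also in A. The fair die and the events below are illustrative choices.

```python
from fractions import Fraction

# Assumed setup: one roll of a fair six-sided die, all outcomes equally likely.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "roll is even"
B = {4, 5, 6}   # "roll is greater than 3"

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

p_a_given_b = conditional(A, B)   # 2 of the 3 outcomes in B are even
print(p_a_given_b)
```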

200

If r = 0.95, describe the association.

Strong positive linear association.

200

Define a residual.

Residual = observed y − predicted ŷ.
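
A short sketch of the calculation. The data points and the fitted line ŷ = 1.0 + 2.0x are hypothetical, chosen only to keep the arithmetic easy.

```python
# Hypothetical observations (illustrative data only).
xs = [1.0, 2.0, 3.0]
ys = [3.5, 4.5, 7.0]

def predict(x, intercept=1.0, slope=2.0):
    """Predicted value y-hat from an assumed fitted line."""
    return intercept + slope * x

# Residual = observed y minus predicted y-hat, one per observation.
residuals = [y - predict(x) for x, y in zip(xs, ys)]
print(residuals)
```

A positive residual means the point sits above the line; a negative residual means it sits below.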

200

What is a regression outlier?

A point far from the trend of the rest of the data.

200

Valid range of a probability?

From 0 to 1, inclusive.

200

In probability notation, what does the vertical bar “|” mean?

“Given.”

300

If r is near 0, what does that indicate?

Weak or no linear relationship (a non-linear relationship may still exist).
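
As an illustration, a perfect curved relationship can still give r = 0. The sketch below uses made-up points on y = x² and a hand-written pearson_r helper (a hypothetical name, not a library function).

```python
# Points on the parabola y = x^2, symmetric around x = 0 (illustrative data).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

def pearson_r(xs, ys):
    """Correlation coefficient r computed from sums of deviations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# y is completely determined by x, yet the linear correlation is zero.
r = pearson_r(xs, ys)
print(r)
```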

300

What does a large residual suggest about an observation?

The observation is unusual: it lies far from the fitted line.

300

Define Simpson’s Paradox.

The direction of an association between two variables reverses when the data are grouped by a third variable.
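
A sketch of the reversal with illustrative counts. The treatments "A" and "B" and the grouping by case size are assumptions made up for the example. A has the higher success rate inside each group, yet B has the higher rate once the groups are pooled.

```python
# Illustrative (successes, trials) counts, split by a third variable (case size).
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def group_rate(treatment, group):
    """Success rate for one treatment within one group."""
    successes, trials = counts[treatment][group]
    return successes / trials

def overall_rate(treatment):
    """Success rate for one treatment with the groups pooled."""
    successes = sum(s for s, _ in counts[treatment].values())
    trials = sum(n for _, n in counts[treatment].values())
    return successes / trials

for group in ("small", "large"):
    print(group, round(group_rate("A", group), 3), round(group_rate("B", group), 3))
print("overall", round(overall_rate("A"), 3), round(overall_rate("B"), 3))
```

The reversal happens because A is used mostly on the harder "large" cases, so its pooled rate is pulled down.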

300

When are two events independent?

When the occurrence of one does not change the probability of the other.

300

When are events disjoint (mutually exclusive)?

When they have no outcomes in common.

400

True or False: A high correlation implies causation.

False: correlation ≠ causation.

400

Which line minimizes the sum of squared residuals?

The least squares regression line.

400

What is a lurking variable?

A variable not measured in the study that influences the relationship between the variables being studied.

400

General addition rule: P(A or B) = ?

P(A) + P(B) − P(A and B).
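
A quick check of the rule on a small example, assuming one roll of a fair six-sided die. The events A and B are illustrative choices. Subtracting P(A and B) corrects for counting the overlap twice.

```python
from fractions import Fraction

# Assumed setup: one roll of a fair six-sided die, all outcomes equally likely.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "roll is even"
B = {4, 5, 6}   # "roll is greater than 3"

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

# General addition rule versus counting the union directly.
p_union_rule = prob(A) + prob(B) - prob(A & B)
p_union_direct = prob(A | B)
print(p_union_rule, p_union_direct)
```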

400

If P(A) = 0.4, P(B) = 0.5, and A and B are independent, find P(A and B).

P(A and B) = 0.4 × 0.5 = 0.20.

500

Name one limitation of correlation.

It only captures linear relationships and is sensitive to outliers.

500

The slope (b) of the least-squares line is directly related to which statistic, and what determines the intercept?

The correlation (r): slope b = r × (s_y / s_x). The intercept depends on the slope and the means of x and y: a = ȳ − b × x̄.
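
A minimal sketch of how the slope and intercept follow from r, the standard deviations, and the means, using the standard relations b = r × (s_y / s_x) and a = ȳ − b × x̄. The data and the pearson_r helper are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical paired observations (illustrative data only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def pearson_r(xs, ys):
    """Correlation coefficient r computed from sums of deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r = pearson_r(xs, ys)

# Slope: r scaled by the ratio of sample standard deviations.
slope = r * stdev(ys) / stdev(xs)

# Intercept: chosen so the line passes through the point (x-bar, y-bar).
intercept = mean(ys) - slope * mean(xs)
print(round(slope, 3), round(intercept, 3))
```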

500

Difference between lurking and confounding variables?

Lurking: not measured in the study. Confounding: measured, but its effect on the response cannot be separated from the effects of other explanatory variables.

500

Multiplication rule for independent A and B?

P(A and B) = P(A) × P(B).
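
The product rule can also serve as a check for independence. The sketch below assumes one roll of a fair six-sided die; the events and the helper name is_independent are illustrative.

```python
from fractions import Fraction

# Assumed setup: one roll of a fair six-sided die, all outcomes equally likely.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}      # "roll is even"
B = {1, 2, 3, 4}   # "roll is at most 4"
C = {4, 5, 6}      # "roll is greater than 3"

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

def is_independent(e1, e2):
    """True when P(e1 and e2) equals P(e1) * P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)

print(is_independent(A, B))   # 1/3 == 1/2 * 2/3
print(is_independent(A, C))   # 1/3 != 1/2 * 1/2
```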

500

Define a probability model.

A list of all possible outcomes (the sample space) together with the probabilities, or the assumptions used to assign them, for events in that sample space.