Scatterplots
Correlation
Regression Line & Prediction
Residuals
SD, r^2, Outliers
100

Temp.     Ice Cream Sales
70        150
75        160
80        180
85        200
90        220
95        250
80        190
70        155


Identify the explanatory and response variables. 

Temp: Explanatory

Ice cream sales: Response

100

For r=-0.87.

What is the direction and strength of the linear trend of the data?

Direction: Negative

Strength: Fairly strong. 

100

In the LSRL equation, "y-hat" represents which of the following:

A. The actual response variable. 

B. The actual explanatory variable. 

C. The predicted response variable. 

D. The predicted explanatory variable. 

C. The predicted response variable. 

"y-hat" ALWAYS gives the value predicted by the LSRL equation. The response variable is the output variable (the outcome we predict from the explanatory variable). 

100

What is the formula to calculate the RESIDUAL at a particular value of the explanatory variable?

Residual = Actual - Predicted

100

Minutes since class began vs. Hunger Level

y-hat = 4 + 5x

We are given the following:

r^2 = 0.75

s = 4.5

Interpret the coefficient of determination in its context.

About 75% of the variability in hunger level is accounted for by the LSRL relating hunger level to minutes since class began. 

200

A scatterplot that appears to go up and to the right, with a few outliers, would be described as having which of the following directions:

A. Positive

B. Negative

C. Neither positive, nor negative. Just a scatter.

A. Positive.

200

For r=0.2.

What is the direction and strength of the linear trend of the data?

Direction: Positive

Strength: Weak

200

Minutes since class began vs. Hunger Level

y-hat = 4 + 5x

What is the predicted hunger level after 20 minutes since class began?

y-hat = 4 + (5*20) = 104

Hunger level of 104! Who would say this feels about right?
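
The prediction above is just the LSRL evaluated at x = 20. A minimal sketch, assuming the card's equation y-hat = 4 + 5x (the function name predict_hunger is made up for illustration):

```python
def predict_hunger(minutes):
    """Predicted hunger level from the card's LSRL: y-hat = 4 + 5x."""
    intercept = 4  # predicted hunger level at 0 minutes
    slope = 5      # predicted change in hunger level per extra minute
    return intercept + slope * minutes


print(predict_hunger(20))  # 4 + 5*20 = 104
```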

200

Minutes since class began vs. Hunger Level

y-hat = 4 + 5x

The predicted Hunger Level after 10 minutes is 54. 

However, the actual Hunger Level after 10 minutes is 70. 

Interpret the residual after 10 minutes in its context. 

The actual hunger level at 10 minutes was 16 higher than the level predicted by the LSRL (residual = 70 - 54 = 16). 
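
The same residual can be checked with Residual = Actual - Predicted. A minimal sketch, assuming the card's LSRL y-hat = 4 + 5x and the actual value of 70 given above (function names are made up for illustration):

```python
def predict_hunger(minutes):
    """Predicted hunger level from the card's LSRL: y-hat = 4 + 5x."""
    return 4 + 5 * minutes


def residual(actual, predicted):
    """Residual = Actual - Predicted."""
    return actual - predicted


predicted = predict_hunger(10)  # 4 + 5*10 = 54
res = residual(70, predicted)   # 70 - 54 = 16
print(res)
```

A positive residual means the line under-predicted; a negative residual means it over-predicted.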

200

Minutes since class began vs. Hunger Level

y-hat = 4 + 5x

We are given the following:

r^2 = 0.75

s = 4.5

Interpret the standard deviation of the residuals in its context.

The actual Hunger level is typically about 4.5 away from the level predicted by the LSRL. 
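
The hunger cards give r^2 and s without raw data, so this sketch borrows the temperature and ice cream sales table from the first card to show how both numbers come out of the residuals. The helper name fit_lsrl is made up for illustration, and s is computed with the usual n - 2 divisor.

```python
import math

# Data from the Temp. vs. Ice Cream Sales card
temps = [70, 75, 80, 85, 90, 95, 80, 70]
sales = [150, 160, 180, 200, 220, 250, 190, 155]


def fit_lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_lsrl(temps, sales)

# Residual = Actual - Predicted for every data point
residuals = [y - (intercept + slope * x) for x, y in zip(temps, sales)]

y_bar = sum(sales) / len(sales)
ss_res = sum(e ** 2 for e in residuals)        # variation left over after the line
ss_tot = sum((y - y_bar) ** 2 for y in sales)  # total variation in sales

r_squared = 1 - ss_res / ss_tot           # share of variability the LSRL accounts for
s = math.sqrt(ss_res / (len(sales) - 2))  # typical size of a residual

print(round(r_squared, 3), round(s, 2))
```

Read the outputs the same way as on the cards: r^2 is the fraction of variability in sales accounted for by the LSRL, and s is the typical distance between actual and predicted sales.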

300

True or False. 

To make a proper scatterplot: after plotting the data points, you DON'T connect the dots. 

True!

We never connect the dots! We're looking to see if, overall, the dots are following some identifiable form (linear, exponential, parabolic, etc). If we connect the dots, we may observe an un-identifiable curve!

300

The addition of WHAT new point would STRENGTHEN the correlation coefficient?

A point that follows the linear trend of the data. 

"Outliers in the pattern strengthen r."

300

Minutes since class began vs. Hunger Level

y-hat = 4 + 5x

Interpret the Y-INTERCEPT in this context.

At 0 minutes since class began, the predicted hunger level is 4. 

300

Residual plot with the following characteristic:

there is a clear positive-negative-positive pattern among the residuals. 

"Linear" is a GOOD or BAD form to model the data?

BAD. 

Any clear pattern in the residual plot is an indicator that a linear model does not fit the data well. 
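
To see where a positive-negative-positive pattern comes from, here is a small sketch that fits a line to made-up curved data (y = x^2, chosen only for illustration) and prints the sign of each residual.

```python
# Hypothetical curved data: y = x^2 (not from the cards)
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# Residual = Actual - Predicted
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
signs = ["+" if e > 0 else "-" for e in residuals]

print(signs)  # ['+', '-', '-', '-', '+']
```

The line sits below the curve at both ends and above it in the middle, which is exactly the kind of pattern that says "linear" is the wrong form.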

300

A good LSRL has:

A. High s, High r^2

B. High s, Low r^2

C. Low s, Low r^2

D. Low s, High r^2

D. Low s, High r^2

- We want the typical distances between the actual and predicted y-values to be small
- We want MOST of the variability in the y-values to be accounted for by the LSRL.

400

True or False. We confirm the strength of the linear trend of a scatterplot using r^2.

FALSE. r^2 tells us the percent of the variability in the response variable that is accounted for by the LSRL. r tells us the strength of the linear trend of the data points. 

400

True or False. 

r makes a distinction between explanatory and response variables. 

FALSE. Looking at our formula, we can swap which variable is x and which is y and get the same value for r!
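
A quick check of this, using the temperature and ice cream sales table from the first card. The sketch computes r with the standard sum-of-products form of the formula (the function name correlation is made up for illustration) and then swaps the two variables.

```python
import math

# Data from the Temp. vs. Ice Cream Sales card
temps = [70, 75, 80, 85, 90, 95, 80, 70]
sales = [150, 160, 180, 200, 220, 250, 190, 155]


def correlation(xs, ys):
    """Correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r_xy = correlation(temps, sales)  # temperature as x, sales as y
r_yx = correlation(sales, temps)  # roles swapped

print(r_xy == r_yx)  # True
```

The LSRL does change when the variables are swapped; only r stays the same.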

400

Minutes since class began vs. Hunger Level

y-hat = 4 + 5x

Interpret the SLOPE in this context.

For each additional minute since class began, the predicted hunger level increases by 5. 

400

Residual plot with the following characteristic:

pretty symmetrically distributed, tending to cluster towards the middle of the plot (as if someone did splatter paint). 

"Linear" is a GOOD or BAD form to model the data?

GOOD!

The messier (less discernible pattern), the better! As long as the values are decently close to 0. 

400

Which outliers lift or pull down the LSRL from the center?

A. Horizontal

B. Vertical

B. Vertical. 

- Slope stays the same

- y-intercept shifts up or down

- correlation decreases.
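
One way to see these three effects is to add a single point far above the line at the mean x-value. The sketch below uses made-up data (not from the cards), and the helper name fit is hypothetical. The slope stays exactly the same only because the new point sits right at the mean of x; a vertical outlier elsewhere would nudge the slope a little.

```python
import math


def fit(xs, ys):
    """Return (intercept, slope, r) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Hypothetical data with a strong positive linear trend
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 8, 10]

before = fit(xs, ys)
# Add a vertical outlier: x = 3 is the mean of xs, y = 15 is far above the line
after = fit(xs + [3], ys + [15])

print("before:", before)
print("after: ", after)
```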

500

True or False. 

Suppose a scatterplot shows data that curves out in an exponential form. 

This means it is not possible for r to be close to 1.0. 

False!

This is why the value of r can sometimes be misleading! r can be close to 1 even when the data clearly curve, so a high r on its own can "confuse" us about the form. 
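
A small illustration with made-up exponential data (y = 1.5^x, chosen only for this example): the points curve upward, yet r still comes out high.

```python
import math

# Hypothetical exponential data: y = 1.5^x (not from the cards)
xs = [0, 1, 2, 3, 4, 5]
ys = [1.5 ** x for x in xs]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)
print(round(r, 2))  # about 0.96, even though the form is not linear
```

This is why r should be read together with the scatterplot and the residual plot, not instead of them.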

500

A few classes ago, I showed that there seemed to be a correlation between completing homework on time and students' grades in Stats. 

Given this information, if you increase your homework score will you have a higher grade in this class?

Not necessarily!

Correlation does NOT equal causation!

So it is not a given that a higher homework score will automatically cause a higher grade in the class. There are other factors that can increase (or decrease) your grade in this class. 

That being said. . .turn in your homework on time. 

500

Which of the following is extrapolation?

A. Removing outliers from the scatterplot. 

B. Predicting unknown data outside the range of a known set of data.

C. Using "y-hat" as the ACTUAL response values. 

B. Predicting unknown data outside the range of a known set of data.

We trust these predictions LESS. 
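
A minimal sketch of how to flag extrapolation, using the temperatures from the first card as the known data (the function name is_extrapolation is made up for illustration):

```python
# Observed temperatures from the Temp. vs. Ice Cream Sales card
temps = [70, 75, 80, 85, 90, 95, 80, 70]


def is_extrapolation(x, observed_xs):
    """True when x lies outside the range of the observed explanatory values."""
    return x < min(observed_xs) or x > max(observed_xs)


print(is_extrapolation(85, temps))   # False: 85 is between 70 and 95
print(is_extrapolation(110, temps))  # True: 110 is above the largest observed temp
```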

500

Suppose a set of data has the following features:

r = 0.96

Residual plot: scattered points close to the x-axis (the residual = 0 line), along with a few points above it. There are no points below it. (draw if necessary). 

TRUE or FALSE. A line is a good fit for this data. 

FALSE. 

Even though r is very close to 1.0, there is actually a pattern to the residual plot: the points are not balanced above and below the residual = 0 line. This means there is a better model for this data (non-linear). 

500

Which outliers tilt the LSRL from the center?

A. Horizontal

B. Vertical

A. Horizontal. 

- Slope decreases

- y-intercept increases

- correlation decreases.
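
These effects can be seen by adding one point far to the right of the rest of the data and well below the trend. The sketch uses made-up data (not from the cards), and the helper name fit is hypothetical. The directions of the changes depend on where the outlier sits; this example places it so that it matches the list above.

```python
import math


def fit(xs, ys):
    """Return (intercept, slope, r) for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Hypothetical data with a strong positive linear trend
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 8, 10]

before = fit(xs, ys)
# Add a horizontal outlier: x = 15 is far to the right, y = 5 is well below the trend
after = fit(xs + [15], ys + [5])

print("before:", before)
print("after: ", after)
```

The far-right point pulls the end of the line down toward itself, which flattens the slope and raises the y-intercept.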
