Temp.   Ice Cream Sales
70      150
75      160
80      180
85      200
90      220
95      250
80      190
70      155
Identify the explanatory and response variables.
Temp: Explanatory
Ice cream sales: Response
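With Temp as the explanatory variable (x) and ice cream sales as the response (y), the LSRL for the table above can be sketched in Python using the standard least-squares formulas b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. This is a minimal illustration only; the function name `fit_lsrl` is hypothetical.

```python
# Table data: temperature (explanatory, x) and ice cream sales (response, y)
temps = [70, 75, 80, 85, 90, 95, 80, 70]
sales = [150, 160, 180, 200, 220, 250, 190, 155]


def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares regression line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the LSRL always passes through (x-bar, y-bar)
    a = y_bar - b * x_bar
    return a, b


a, b = fit_lsrl(temps, sales)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

For this table the sketch gives roughly y-hat = -116.31 + 3.78x. The positive slope matches sales rising as temperature rises.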
For r = -0.87:
What is the direction and strength of the linear trend of the data?
Direction: Negative
Strength: Fairly strong.
In the LSRL equation, "y-hat" represents which of the following:
A. The actual response variable.
B. The actual explanatory variable.
C. The predicted response variable.
D. The predicted explanatory variable.
C. The predicted response variable.
"y-hat" ALWAYS gives us the value of the response variable predicted by the LSRL equation. The response variable is the output variable (the outcome of the explanatory variable).
What is the formula to calculate the RESIDUAL at a particular value of the explanatory variable?
Residual = Actual - Predicted
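As a quick sketch in Python (the function names are illustrative, not from the original), here is the residual formula together with a worked example that uses the hunger-level LSRL y-hat = 4 + 5x:

```python
def residual(actual, predicted):
    """Residual = actual - predicted (positive when the point lies above the LSRL)."""
    return actual - predicted


def hunger_hat(minutes):
    # Hunger-level LSRL: y-hat = 4 + 5x
    return 4 + 5 * minutes


# After 10 minutes: actual hunger level 70, predicted 4 + 5*10 = 54
print(residual(70, hunger_hat(10)))  # 16
```

A positive residual means the actual value was above the prediction; a negative residual means it was below.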
Minutes since class began vs. Hunger Level
y-hat = 4 + 5x
We are given the following:
r² = 0.75
s = 4.5
Interpret the coefficient of determination in its context.
About 75% of the variability in the hunger level is accounted for by the LSRL.
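To make r² concrete, the sketch below computes it for the temperature/ice cream table at the top of this review as 1 - SSE/SST, the fraction of the variability in y that the LSRL accounts for. It also computes r so the two can be compared. The helper names are hypothetical and the code is illustrative only.

```python
import math

temps = [70, 75, 80, 85, 90, 95, 80, 70]
sales = [150, 160, 180, 200, 220, 250, 190, 155]


def fit_lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def r_squared(xs, ys):
    """Fraction of the variability in y accounted for by the LSRL: 1 - SSE/SST."""
    a, b = fit_lsrl(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # left-over variability
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total variability in y
    return 1 - sse / sst


def correlation(xs, ys):
    """Pearson correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r2 = r_squared(temps, sales)
r = correlation(temps, sales)
print(f"r^2 = {r2:.3f}, r = {r:.3f}")
```

Here r² comes out to about 0.97, so roughly 97% of the variability in ice cream sales is accounted for by the LSRL. For a least-squares line, this value equals r squared.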
A scatterplot that appears to go up and to the right, with a few outliers, would be described as having which of the following directions?
A. Positive
B. Negative
C. Neither positive nor negative; just a scatter.
A. Positive.
For r = 0.2:
What is the direction and strength of the linear trend of the data?
Direction: Positive
Strength: Weak
Minutes since class began vs. Hunger Level
y-hat = 4 + 5x
What is the predicted hunger level after 20 minutes since class began?
y-hat = 4 + (5*20) = 104
Hunger level of 104! Who would say this feels about right?
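The same prediction can be written as a minimal Python sketch (the function name is illustrative). It also shows the slope's meaning: one extra minute raises the predicted hunger level by 5.

```python
def predicted_hunger(minutes):
    """LSRL for Minutes since class began vs. Hunger Level: y-hat = 4 + 5x."""
    return 4 + 5 * minutes


print(predicted_hunger(20))  # 104
# The slope (5) is the change in predicted hunger level per extra minute
print(predicted_hunger(21) - predicted_hunger(20))  # 5
```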
Minutes since class began vs. Hunger Level
y-hat = 4 + 5x
The predicted Hunger Level after 10 minutes is 54.
However, the actual Hunger Level after 10 minutes is 70.
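Interpret the residual after 10 minutes in its context.
Residual = 70 - 54 = 16. The actual hunger level after 10 minutes is 16 higher than the level predicted by the LSRL.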
Minutes since class began vs. Hunger Level
y-hat = 4 + 5x
We are given the following:
r² = 0.75
s = 4.5
Interpret the standard deviation of the residuals in its context.
The actual Hunger level is typically about 4.5 away from the level predicted by the LSRL.
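For a worked example of s, the sketch below computes the standard deviation of the residuals for the temperature/ice cream table using s = sqrt(SSE / (n - 2)). The helper name `residual_sd` is hypothetical and the code is illustrative only.

```python
import math

temps = [70, 75, 80, 85, 90, 95, 80, 70]
sales = [150, 160, 180, 200, 220, 250, 190, 155]


def residual_sd(xs, ys):
    """Standard deviation of the residuals: s = sqrt(SSE / (n - 2))."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Divide by n - 2 (not n) because two quantities, a and b, were estimated
    return math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))


s = residual_sd(temps, sales)
print(f"s = {s:.2f}")
```

This gives s ≈ 6.37. In context, actual ice cream sales are typically about 6.4 away from the sales predicted by the LSRL.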
True or False.
To make a proper scatterplot: after plotting the data points, you DON'T connect the dots.
True!
We never connect the dots! We're looking to see whether, overall, the dots follow some identifiable form (linear, exponential, parabolic, etc.). If we connect the dots, we may end up with an unidentifiable curve!
The addition of WHAT new point would STRENGTHEN the correlation coefficient?
A point that follows the linear trend of the data.
"Outliers that fall in line with the pattern strengthen r."
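To see this numerically, the sketch below adds one point to the ice cream table and compares r before and after. The new point is invented for illustration: it sits at temp 100 and lies exactly on the current LSRL, so it follows the linear trend. The helper names are hypothetical.

```python
import math

temps = [70, 75, 80, 85, 90, 95, 80, 70]
sales = [150, 160, 180, 200, 220, 250, 190, 155]


def correlation(xs, ys):
    """Pearson correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


r_before = correlation(temps, sales)

# Hypothetical new point at temp 100 lying exactly on the current LSRL,
# i.e. it follows the linear trend (and sits far from the mean temp)
a, b = fit_lsrl(temps, sales)
new_x = 100
new_y = a + b * new_x
r_after = correlation(temps + [new_x], sales + [new_y])

print(f"r before: {r_before:.4f}, r after: {r_after:.4f}")
```

Here r rises from about 0.985 to about 0.991.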
Minutes since class began vs. Hunger Level
y-hat = 4 + 5x
Interpret the Y-INTERCEPT in this context.
At 0 minutes since class began, the predicted hunger level is 4.
Residual plot with the following characteristic:
There is a clear positive-negative-positive pattern among the residuals.
Is "linear" a GOOD or BAD form to model the data?
BAD.
Any pattern that can be detected in the residual plot indicates that the data do not follow a linear form well.
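The sketch below shows one way a positive-negative-positive pattern arises. It fits a line to made-up curved data (y = x², invented for illustration), and the residuals come out positive at both ends and negative in the middle. The helper name is hypothetical.

```python
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]  # curved (quadratic) data: 0, 1, 4, 9, 16


def fit_lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


a, b = fit_lsrl(xs, ys)  # y-hat = -2 + 4x
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = ["+" if e > 0 else "-" if e < 0 else "0" for e in residuals]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
print(signs)      # ['+', '-', '-', '-', '+']
```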
A good LSRL has:
A. High s, High r²
B. High s, Low r²
C. Low s, Low r²
D. Low s, High r²
D. Low s, High r²
- We want the typical distances between the actual and predicted y-values to be small.
True or False.
We confirm the strength of the linear trend of a scatterplot using r².
FALSE. r² tells us the percent of the variability in the response variable that is accounted for by the LSRL. r tells us the strength of the linear trend of the data points.
True or False.
r makes a distinction between explanatory and response variables.
FALSE. Looking at the formula for r, we can swap which variable goes on which axis and get the same value of r!
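Numerically, swapping the roles of x and y leaves r unchanged, because the formula treats the two variables symmetrically. Here is a sketch using the ice cream table (the helper name is illustrative):

```python
import math


def correlation(xs, ys):
    """Pearson correlation r computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


temps = [70, 75, 80, 85, 90, 95, 80, 70]
sales = [150, 160, 180, 200, 220, 250, 190, 155]

# Swapping which variable goes on which axis gives the same r
print(correlation(temps, sales), correlation(sales, temps))
```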
Minutes since class began vs. Hunger Level
y-hat = 4 + 5x
Interpret the SLOPE in this context.
For every 1-minute increase in time since class began, the predicted hunger level increases by 5.
Residual plot with the following characteristic:
Pretty symmetrically distributed, tending to cluster toward the middle of the plot (as if someone splattered paint).
Is "linear" a GOOD or BAD form to model the data?
GOOD!
The messier the plot (the less discernible the pattern), the better, as long as the residuals are reasonably close to 0.
Which outliers lift or pull down the LSRL from the center?
A. Horizontal
B. Vertical
B. Vertical.
- Slope stays the same
- y-intercept shifts up or down
- correlation decreases.
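The sketch below checks these three effects with made-up data lying exactly on y = 2x. It assumes the vertical outlier is placed at the center, at the mean x value (x = 3). In that case the slope stays exactly the same, while the intercept shifts and r drops. The helper name is hypothetical.

```python
import math


def fit_and_r(xs, ys):
    """Return (a, b, r) for the LSRL y-hat = a + b*x and the correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / math.sqrt(sxx * syy)


# Made-up data lying exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

# Add a vertical outlier at the center (x = mean of x = 3), far above the trend
a0, b0, r0 = fit_and_r(xs, ys)
a1, b1, r1 = fit_and_r(xs + [3], ys + [20])

print(f"slope:     {b0:.3f} -> {b1:.3f}")  # stays the same
print(f"intercept: {a0:.3f} -> {a1:.3f}")  # shifts up
print(f"r:         {r0:.3f} -> {r1:.3f}")  # decreases
```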
True or False.
Suppose a scatterplot shows data that curves out in an exponential form.
This means it is not possible for r to be close to 1.0.
False!
This is why the value of r can sometimes be misleading! r can indicate a strong linear trend even when the form is curved, which "confuses" us about the form.
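As a small illustration with made-up exponential data (y = 2^x for x = 1 to 5), r still comes out close to 1 even though the form is clearly curved. The helper name is illustrative.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys = [2 ** x for x in xs]  # exponential form: 2, 4, 8, 16, 32

print(f"r = {correlation(xs, ys):.3f}")  # close to 1 despite the curved form
```

Here r is about 0.93, which would suggest a fairly strong linear trend if we looked at r alone.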
A few classes ago, I showed that there seemed to be a correlation between completing homework on time and students' grades in Stats.
Given this information, if you increase your homework score will you have a higher grade in this class?
Not necessarily!
Correlation does NOT equal causation!
So it is not a given that a higher homework score will automatically cause a higher grade in the class. There are other factors that can increase (or decrease) your grade in this class.
That being said... turn in your homework on time.
Which of the following is extrapolation?
A. Removing outliers from the scatterplot.
B. Predicting unknown data outside the range of a known set of data.
C. Using "y-hat" as the ACTUAL response values.
B. Predicting unknown data outside the range of a known set of data.
We trust these predictions LESS.
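One way to make extrapolation visible is to flag any prediction whose x lies outside the observed range. The sketch below does this for the ice cream table, whose temps run from 70 to 95. The `predict` helper and its returned flag are hypothetical, for illustration only.

```python
temps = [70, 75, 80, 85, 90, 95, 80, 70]
sales = [150, 160, 180, 200, 220, 250, 190, 155]


def fit_lsrl(xs, ys):
    """Return (a, b) for y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b * x_bar, b


def predict(x, xs, ys):
    """Return (y-hat, is_extrapolation) for a new x value.

    is_extrapolation is True when x lies outside the range of the observed xs,
    so the prediction deserves less trust.
    """
    a, b = fit_lsrl(xs, ys)
    return a + b * x, not (min(xs) <= x <= max(xs))


print(predict(85, temps, sales))   # inside 70-95: not extrapolation
print(predict(120, temps, sales))  # outside 70-95: extrapolation
```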
Suppose a set of data has the following features:
r = 0.96
Residual plot: scattered points close to the x-axis (residual = 0), along with a few points above the x-axis. There are no points below the x-axis. (Draw if necessary.)
TRUE or FALSE. A line is a good fit for this data.
FALSE.
Even though r is very close to 1.0, there is actually a pattern in the residual plot: the points are not balanced above and below the x-axis. This means there is a better (non-linear) model for this data.
Which outliers tilt the LSRL from the center?
A. Horizontal
B. Vertical
A. Horizontal.
- Slope decreases
- y-intercept increases
- correlation decreases.
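Which way the line tilts depends on where the outlier sits. The sketch below assumes made-up data lying exactly on y = 2x, plus a horizontal outlier far to the right (x = 20) that falls below the trend. With that placement, the slope decreases, the intercept increases, and r decreases. The helper name is hypothetical.

```python
import math


def fit_and_r(xs, ys):
    """Return (a, b, r) for the LSRL y-hat = a + b*x and the correlation r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / math.sqrt(sxx * syy)


# Made-up data lying exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

# Add a horizontal outlier far to the right (x = 20) that falls below the trend
a0, b0, r0 = fit_and_r(xs, ys)
a1, b1, r1 = fit_and_r(xs + [20], ys + [10])

print(f"slope:     {b0:.3f} -> {b1:.3f}")  # decreases
print(f"intercept: {a0:.3f} -> {a1:.3f}")  # increases
print(f"r:         {r0:.3f} -> {r1:.3f}")  # decreases
```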