Terminology
Notation
Scatterplots & Patterns
Correlation & Regression
Residuals
Predictions & Causation
100

This is the name for data that involves two variables measured on the same subject.

What is 'Bivariate Data'?

100

This letter is the standard notation used to represent the correlation coefficient.

What is 'r'?

100

On a scatterplot, the explanatory variable is always plotted on this axis.

What is the 'x (horizontal) axis'?

100

The value of r always falls within this range.

What is '−1 ≤ r ≤ 1'?
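As a rough check on this range, the sketch below computes r from its standard definition for a small made-up dataset. The data values and the function name `pearson_r` are assumptions for illustration only, not part of the quiz.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, chosen only to illustrate the calculation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 3))   # 0.775
print(-1 <= r <= 1)  # True
```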

100

A residual is calculated using this expression: the difference between the observed and predicted y values.

What is 'y − ŷ' (observed minus predicted)?
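A minimal sketch of the residual calculation, using made-up observed and predicted values (the numbers and the function name `residual` are assumptions for illustration):

```python
def residual(observed_y, predicted_y):
    """Residual = observed y minus predicted y (y - y-hat)."""
    return observed_y - predicted_y

# Hypothetical values: a point above the line gives a positive residual,
# a point below the line gives a negative residual.
print(residual(14.0, 12.5))  # 1.5
print(residual(10.0, 12.5))  # -2.5
```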

100

A linear model is fitted to data collected for x values between 10 and 50. Using it to predict y when x = 80 is an example of this.

What is 'extrapolation'?
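One way to make the distinction concrete is to compare the x value against the range of the observed data. The sketch below assumes the 10 to 50 range from the clue; the function name `is_extrapolation` is hypothetical.

```python
def is_extrapolation(x, x_min=10, x_max=50):
    """Return True if x lies outside the range of the observed data."""
    return x < x_min or x > x_max

print(is_extrapolation(80))  # True: 80 is outside 10 to 50
print(is_extrapolation(30))  # False: 30 is inside the range (interpolation)
```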

200

The variable that is used to explain or predict the other variable is called this.

What is the 'Explanatory variable'?

200

This notation represents the coefficient of determination, found by squaring the correlation coefficient.

What is 'r²'?

200

A scatterplot where points trend upward from left to right shows an association of this direction.

What is a 'positive' association?

200

An r value of −0.52 indicates an association of this strength and direction.

What is a 'moderate, negative, linear association'?

200

When a residual plot shows a random scatter of points around zero, this conclusion can be drawn about the linear model.

What is 'the linear model is appropriate for the data'?

200

A strong correlation between shoe size and reading ability in primary school children is likely due to this non-causal explanation.

What is 'a common response to another variable (such as age or year level), i.e. confounding'?

300

The variable whose value is being predicted or explained in a bivariate dataset.

What is the 'Response variable'?

300

In the least squares regression equation ŷ = ax + b, this letter represents the slope.

What is 'a'?

300

When points on a scatterplot follow a straight-line pattern, the form of the association is described as this.

What is 'linear'?

300

The slope in the equation ŷ = 5 + 2.3x means that for every 1-unit increase in x, the predicted y value changes by this amount.

What is an 'increase of 2.3 units'?

300

A residual plot that shows a clear curved pattern suggests the fitted linear model has this problem.

What is 'the linear model is not appropriate'?

300

Using the equation ŷ = 1.5x + 3, predict the value of y when x = 8.

What is 'ŷ = 1.5(8) + 3 = 15'?
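The same substitution can be sketched in code. The slope 1.5 and intercept 3 come from the clue; the function name `predict` is an assumption for illustration.

```python
def predict(x, slope=1.5, intercept=3.0):
    """Predicted y from the fitted line y-hat = slope * x + intercept."""
    return slope * x + intercept

print(predict(8))  # 15.0
```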

400

This term describes using a fitted line to predict values within the range of the observed data.

What is 'Interpolation'?

400

In the equation ŷ = ax + b, this letter represents the y-intercept of the fitted line.

What is 'b'?

400

Describe the association shown when points on a scatterplot are widely scattered with no clear direction.

What is 'no association' (or 'no correlation')?

400

The line of best fit is also known as this.

What is the 'least squares regression line'?
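For reference, here is a minimal sketch of how the slope a and y-intercept b of a least squares line ŷ = ax + b can be computed from the usual formulas. The dataset and the function name `least_squares_line` are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # y-intercept: the line passes through the mean point
    return a, b

# Hypothetical data, chosen only to illustrate the calculation.
a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2))  # 0.6 2.2
```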

400

An r² value of 0.81 means this percentage of the variation in y is explained by the linear relationship with x.

What is '81%'?

400

Two variables have a strong correlation purely by chance, with no logical connection between them. The non-causal explanation for such a correlation is called this.

What is 'coincidence'?

600

This term describes using a fitted line to predict values outside the range of the observed data, which carries significant risk.

What is 'Extrapolation'?

600

In the regression equation ŷ = ax + b, this symbol represents the predicted value of the response variable.

What is 'ŷ' (y-hat)?

600

A scatterplot shows study hours on the x-axis and test scores on the y-axis. Points cluster tightly around a downward-sloping line. Describe this association fully.

What is a 'strong, negative, linear' association?

600

The y-intercept (b = 12) in a regression equation represents this in context.

What is the 'predicted value of y when x = 0'?

600

Use the graph to determine whether a linear model would be appropriate.

What is 'the linear model is not appropriate'?

600

A study finds that countries with more computers per capita have longer life expectancies. Rather than computers causing long life, a more plausible explanation involves this.

What is 'a confounding variable: wealthier countries have both more computers and better healthcare'?

800

A third variable that is related to both the explanatory and response variables, and may explain an observed association, is called this.

What is a 'confounding variable' (or 'lurking variable')?

800

This is the complete equation used to represent the least squares regression line, where y is the response variable and x is the explanatory variable, including what each letter stands for.

What is 'ŷ = ax + b, where b is the y-intercept and a is the slope'?

800

A scatterplot of temperature vs. ice cream sales shows points curving upward steeply. The form of this association is described as this.

What is 'non-linear'?

800

A student calculates r = 1.4 for a dataset. Explain what this tells you about the calculation.

What is 'that an error has been made, because r must be between −1 and 1'?

800

If r = −0.7, calculate r² and state what it tells you about the strength of the linear model.

What is 'r² = 0.49, meaning 49% of the variation in y is explained by the variation in x'?
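The arithmetic can be checked with a short sketch. The value r = −0.7 comes from the clue; rounding to two decimal places is an assumption made here to avoid floating-point noise.

```python
r = -0.7

# Squaring removes the sign, so r-squared is never negative.
r_squared = round(r ** 2, 2)

print(r_squared)  # 0.49
print(f"{r_squared:.0%} of the variation in y is explained by the variation in x")
```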

800

Just because two variables are strongly correlated does not necessarily mean that one causes the other. This principle is summarised as this phrase.

What is 'correlation does not imply causation'?