Advanced Stats Jeopardy

Regression Basics

Model Diagnostics

Advanced Regression

Transformations & Interpretation

Nonparametric & Dimension Reduction

100

This measure ranges from 0 to 1 and tells you what proportion of variance in Y is explained by your model.

What is R-squared?
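
As a worked illustration of the clue above, R-squared can be computed as one minus the ratio of unexplained to total variation. This is a minimal sketch, assuming the fitted values are already available; the function name `r_squared` and the toy data are hypothetical and chosen only for illustration.

```python
def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    return 1 - ss_res / ss_tot

# Hypothetical toy data
y = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y, [1.0, 2.0, 3.0, 4.0])    # fitted values match y exactly
mean_only = r_squared(y, [2.5, 2.5, 2.5, 2.5])  # model always predicts the mean of y
```

A perfect fit sits at the top of the range and a model that only predicts the mean sits at the bottom, which is where the 0-to-1 scale comes from for a least-squares fit with an intercept.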

100

This occurs when two or more independent variables in your regression are highly correlated with each other.

What is multicollinearity?

100

When X1's effect on Y depends on the value of X2, you include this type of term, written in R as X1:X2 or within X1*X2.

What is an interaction effect (or interaction term)?

100

This unit-free measure represents the percentage change in Y for a 1% change in X, and is obtained from a log-log model.

What is elasticity?

100

What technique reduces dimensionality by creating new variables that are linear combinations of original variables?

What is Principal Components Analysis?

200

In a simple linear regression, this is the predicted value of Y when X equals zero, represented by the point where the line crosses the Y-axis.

What is the intercept?

200

Calculate this for each predictor; values above 5 (or 10) indicate problematic multicollinearity.

What is the Variance Inflation Factor (VIF)?
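
The VIF for predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on the other predictors. The sketch below covers only the simplest case, assuming exactly two predictors, where R_j² is just their squared correlation. The helper names and data are hypothetical.

```python
import math

def correlation(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors (assumption of this sketch)."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors with correlation 0.8
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
```

With more than two predictors, R_j² would have to come from a full auxiliary regression rather than a single correlation.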

200

This technique adds a penalty term to the loss function to prevent overfitting by shrinking coefficient values toward zero.

What is regularization?

200

Which transformations can you apply to an advertising variable to capture diminishing returns while ensuring the marginal effect never turns negative?

What is log(AdSpending) or sqrt(AdSpending)?

200

This graph displays eigenvalues in descending order; the "elbow" where the curve levels off suggests how many components to retain.

What is a scree plot?

300

We assess significance of a slope by checking if this value is below 0.05, or if the confidence interval excludes this number.

What is the p-value and what is zero?

300

This violation occurs when the variance of residuals is not constant across all levels of X, creating a funnel shape in residual plots.

What is heteroscedasticity?

300

This regularization method uses an L1 penalty and can shrink coefficients exactly to zero, performing automatic variable selection.

What is Lasso?
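
To see why an L1 penalty can produce exact zeros, consider the special case of orthonormal predictors with the objective written as half the residual sum of squares plus lambda times the sum of absolute coefficients. Under those assumptions, each Lasso coefficient is the soft-thresholded least-squares estimate. The function name below is hypothetical.

```python
import math

def soft_threshold(ols_estimate, lam):
    """Lasso coefficient for one predictor, assuming orthonormal predictors."""
    shrunk = max(abs(ols_estimate) - lam, 0.0)  # pull magnitude toward zero by lam
    return math.copysign(shrunk, ols_estimate)  # keep the original sign

# Hypothetical least-squares estimates with penalty lam = 1
large_positive = soft_threshold(3.0, 1.0)   # shrunk but kept in the model
small_estimate = soft_threshold(0.5, 1.0)   # set exactly to zero (variable dropped)
large_negative = soft_threshold(-3.0, 1.0)  # shrunk toward zero from below
```

Any estimate whose magnitude is below the penalty is set to exactly zero, which is the automatic variable selection the clue describes.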

300

In a quadratic model Y = β₀ + β₁X + β₂X², you get an inverted U-shape when β₁ has this sign and β₂ has this sign.

What is positive for β₁ and negative for β₂? (β₁ > 0, β₂ < 0)
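
Setting the derivative β₁ + 2β₂X to zero gives the turning point X* = -β₁ / (2β₂). The sketch below computes it for hypothetical coefficient values; the function names are illustrative only.

```python
def turning_point(b1, b2):
    """X value where Y = b0 + b1*X + b2*X**2 peaks (b2 < 0) or bottoms out (b2 > 0)."""
    if b2 == 0:
        raise ValueError("b2 must be nonzero for a quadratic curve")
    return -b1 / (2 * b2)

def curve_shape(b2):
    """Describe the curvature implied by the sign of b2."""
    if b2 == 0:
        raise ValueError("b2 must be nonzero for a quadratic curve")
    return "inverted U" if b2 < 0 else "U"

# Hypothetical coefficients: b1 > 0 and b2 < 0
peak_x = turning_point(4.0, -0.5)
shape = curve_shape(-0.5)
```

With β₁ > 0 and β₂ < 0, the peak falls at a positive X, so Y rises first and then declines.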

300

In nonparametric regression, this parameter controls the size of the local window.

What is bandwidth?

400

When the number of parameters equals the number of observations, you can get this R-squared value, but you should never use it for forecasting because of this problem.

What is R-squared = 1, and what is overfitting?

400

Multicollinearity inflates these, while heteroscedasticity makes them unreliable without biasing the coefficient estimates.

What are standard errors?

400

A positive interaction coefficient means these two variables do this to each other's effects, while a negative coefficient means they do this.

What are amplify (positive) and dampen (negative)?

400

In the model Ln(Y) = β₀ + β₁Ln(X) + β₂D + β₃Ln(X)×D, this is the elasticity when D=1, and this is the elasticity when D=0.

What is β₁ + β₃ (when D=1) and β₁ (when D=0)?
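
The answer follows from differentiating Ln(Y) with respect to Ln(X), which gives β₁ + β₃D. The sketch below checks this numerically, assuming hypothetical coefficient values that do not come from any fitted model.

```python
import math

def log_y(x, d, b0, b1, b2, b3):
    """Ln(Y) from the model Ln(Y) = b0 + b1*Ln(X) + b2*D + b3*Ln(X)*D."""
    return b0 + b1 * math.log(x) + b2 * d + b3 * math.log(x) * d

def elasticity(x, d, coefs, step=0.01):
    """Change in Ln(Y) per unit change in Ln(X) when X rises by `step` (1% by default)."""
    x_new = x * (1 + step)
    d_log_y = log_y(x_new, d, *coefs) - log_y(x, d, *coefs)
    d_log_x = math.log(x_new) - math.log(x)
    return d_log_y / d_log_x

# Hypothetical coefficients (b0, b1, b2, b3)
coefs = (2.0, 0.6, 0.3, 0.25)
e_d0 = elasticity(10.0, 0, coefs)  # should match b1
e_d1 = elasticity(10.0, 1, coefs)  # should match b1 + b3
```

Note that β₂ shifts the level of Ln(Y) but drops out of the elasticity entirely.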

400

How does a GAM differ from linear models and from fully nonparametric models?

A GAM uses flexible smooth functions (unlike linear models) but maintains an additive structure (unlike fully nonparametric models).

500

Why does lm() fail when p > N?

The matrix X'X is singular and cannot be inverted, so this underdetermined system has infinitely many solutions.
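
A small hand-built case makes the "infinitely many solutions" point concrete. The sketch below assumes a hypothetical design with N = 2 observations and p = 3 parameters (an intercept plus two slopes) and shows two different coefficient vectors that both fit the data perfectly.

```python
def residual_sum_of_squares(X, y, beta):
    """Sum of squared residuals for coefficients beta on design matrix X."""
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Hypothetical data: 2 observations, 3 parameters (first column is the intercept)
X = [[1.0, 1.0, 2.0],
     [1.0, 3.0, 1.0]]
y = [3.0, 4.0]

beta_a = [0.0, 1.0, 1.0]
beta_b = [-5.0, 2.0, 3.0]  # beta_a plus the vector (-5, 1, 2), which X maps to zero

rss_a = residual_sum_of_squares(X, y, beta_a)
rss_b = residual_sum_of_squares(X, y, beta_b)
```

Both vectors leave zero residuals, and adding any multiple of (-5, 1, 2) to either one gives yet another perfect fit, so the data cannot single out one set of coefficients.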

500

What calculation do we use to detect influential observations?

What is Cook's Distance?

500

Which resampling technique is used to assess the significance of indirect effects because their sampling distribution is not normal?

What is the bootstrap (bootstrapping)?
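
The idea can be sketched as a percentile bootstrap for an indirect effect a × b, where a is the slope of the mediator M on X and b is the slope of Y on M controlling for X. Everything here is an assumption made for illustration: the simulated data, the helper names, the seed, and the choice of a percentile interval.

```python
import random

def _cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def indirect_effect(rows):
    """a*b, where a is the slope of M on X and b the slope of Y on M given X."""
    x = [r[0] for r in rows]
    m = [r[1] for r in rows]
    y = [r[2] for r in rows]
    sxx, smm, sxm = _cross(x, x), _cross(m, m), _cross(x, m)
    sxy, smy = _cross(x, y), _cross(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (smm * sxx - sxm ** 2)
    return a * b

def bootstrap_ci(rows, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval from resampling rows with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    draws = sorted(
        stat([rows[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    return draws[int(alpha / 2 * n_boot)], draws[int((1 - alpha / 2) * n_boot) - 1]

# Simulated (hypothetical) data with a true indirect effect of 0.8 * 1.0
sim = random.Random(42)
rows = []
for _ in range(200):
    x = sim.gauss(0, 1)
    m = 0.8 * x + sim.gauss(0, 0.5)
    y = 1.0 * m + 0.2 * x + sim.gauss(0, 0.5)
    rows.append((x, m, y))

ci_low, ci_high = bootstrap_ci(rows, indirect_effect)
```

If the resulting interval excludes zero, the indirect effect would be judged significant at the chosen level, without relying on a normal approximation for the product a × b.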

500

What are the three sequential variable selection methods?

What are backward selection, forward selection, and stepwise selection?

500

Explain the bias-variance trade-off in bandwidth selection.

A small bandwidth gives low bias but high variance; a large bandwidth gives high bias but low variance.
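
The trade-off can be seen with a simple kernel smoother. The sketch below assumes a Gaussian kernel and a Nadaraya-Watson style weighted average, which is one common choice and not necessarily the method used in the course; the data and names are hypothetical.

```python
import math

def kernel_smooth(x_train, y_train, x0, bandwidth):
    """Weighted average of y_train, with Gaussian weights centered at x0."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x_train]
    return sum(w * yi for w, yi in zip(weights, y_train)) / sum(weights)

# Hypothetical noisy data
x_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]

# Tiny window: the estimate at x0 = 1 chases the single nearby observation
wiggly = kernel_smooth(x_train, y_train, 1.0, bandwidth=0.1)

# Huge window: every point gets nearly equal weight, so the estimate is close to the overall mean
smooth = kernel_smooth(x_train, y_train, 1.0, bandwidth=100.0)
```

The small bandwidth tracks each noisy point (low bias, high variance), while the large bandwidth flattens the curve toward the global average (high bias, low variance).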