Limitations & Disadvantages of Regression
Assumptions in Predictive Modeling
Diagnostics & Evaluation
Regression vs Other Models
Ethics & Business Impact
100

What is the main limitation of linear regression when modeling nonlinear relationships?

A. It requires normally distributed residuals
B. It cannot model nonlinear patterns without transformation
C. It always produces biased predictions
D. It prevents use of interaction effects

B. It cannot model nonlinear patterns without transformation
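To illustrate the answer above, here is a minimal sketch, assuming a made-up dataset where the outcome is exactly the square of the predictor. The helper names `fit_line` and `mse` are illustrative and not part of any library. A straight line fitted to the raw predictor misses the curve entirely, while the same linear fit on the squared predictor captures it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(xs, ys, intercept, slope):
    """Mean squared error of the fitted line on the given points."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the outcome is a pure quadratic function of x.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

# Linear fit on the raw predictor: the best line is flat and fits poorly.
raw_fit = fit_line(xs, ys)
raw_mse = mse(xs, ys, *raw_fit)

# Same linear method on the transformed predictor z = x^2.
zs = [x ** 2 for x in xs]
sq_fit = fit_line(zs, ys)
sq_mse = mse(zs, ys, *sq_fit)

print(f"raw predictor MSE: {raw_mse:.2f}, squared predictor MSE: {sq_mse:.2f}")
```

The model stays linear in its coefficients in both cases; only the input is transformed.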

100

Is normality of residuals essential for prediction accuracy?

A. Yes, for all regression models
B. Only when sample size is small
C. No, it is mainly needed for inference
D. Only when multicollinearity exists

C. No, it is mainly needed for inference

100

Why should R² not be used as the sole measure of predictive success?

A. It fails to account for how well the model performs on new data
B. It decreases as predictors are added to the model
C. It has no relationship to fit
D. It is designed only for logistic models

A. It fails to account for how well the model performs on new data

100

When is regression typically preferred over neural networks?

A. When interpretability is required
B. When nonlinear effects dominate
C. When computation is unlimited
D. When data is unstructured

A. When interpretability is required

100

What is a key risk of biased training data?

A. Systematic discriminatory predictions
B. Reduced computational efficiency
C. Lower model convergence speed
D. Increased sampling variance

A. Systematic discriminatory predictions

200

What is a common consequence of extreme data points in regression?

A. They reduce variance in predictions
B. They typically distort slope estimates
C. They eliminate residual patterns
D. They improve model stability

B. They typically distort slope estimates

200

What is the practical risk of heteroscedasticity in prediction?

A. It makes confidence and prediction intervals unreliable
B. It increases coefficient bias
C. It eliminates predictive accuracy
D. It removes model linearity

A. It makes confidence and prediction intervals unreliable

200

Low training error and high validation error indicate:

A. High model variance and overfitting
B. High model bias and underfitting
C. Stable coefficient estimates
D. Proper model generalization

A. High model variance and overfitting
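The pattern in this answer can be reproduced with a small sketch, assuming invented data that follows y = 2x plus alternating noise. The function names are illustrative. A polynomial that passes through every training point (Lagrange interpolation, used here as a stand-in for an overly flexible model) is compared with a simple least-squares line.

```python
def fit_line(xs, ys):
    """Simple least-squares line; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Polynomial through every training point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a prediction function on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: y = 2x with alternating +/-0.5 noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.5, 1.5, 4.5, 5.5, 8.5]
# Hypothetical validation data: noise-free points between the training inputs.
val_x = [0.5, 1.5, 2.5, 3.5]
val_y = [1.0, 3.0, 5.0, 7.0]

line = fit_line(train_x, train_y)
wiggly = fit_interpolant(train_x, train_y)

for name, model in [("line", line), ("interpolant", wiggly)]:
    print(name, "train MSE:", mse(model, train_x, train_y),
          "validation MSE:", mse(model, val_x, val_y))
```

The interpolant reaches zero training error by fitting the noise, which is why its validation error ends up worse than the line's.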

200

For highly nonlinear, large-scale data, which model may outperform regression?

A. Ridge regression with penalties
B. Logistic regression classifier
C. Neural networks or tree ensembles
D. Ordinary least squares models

C. Neural networks or tree ensembles

200

When should interpretability outweigh accuracy?

A. When prediction error is irrelevant
B. When legal accountability is required
C. When sample size is small
D. When model complexity is high

B. When legal accountability is required

300

What predictive issue arises when predictors are highly correlated?

A. Training error falls too quickly
B. Coefficients become sensitive to small data changes
C. The model becomes unable to converge
D. Variable importance increases improperly

B. Coefficients become sensitive to small data changes
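A rough sketch of this sensitivity, assuming small invented datasets and a two-predictor model without an intercept to keep the arithmetic short. The names `fit_two_predictors` and `coefficient_shift` are illustrative. One outcome value is nudged by 0.1 and the change in the first coefficient is measured, once with nearly collinear predictors and once with weakly related ones.

```python
def fit_two_predictors(x1, x2, y):
    """Least squares for y ~ b1*x1 + b2*x2 (no intercept), via Cramer's rule."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12  # near zero when x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def coefficient_shift(x1, x2, nudge=0.1):
    """How far b1 moves when the first outcome value changes by `nudge`."""
    y = [a + b for a, b in zip(x1, x2)]  # true coefficients are (1, 1)
    y_nudged = [y[0] + nudge] + y[1:]
    b1, _ = fit_two_predictors(x1, x2, y)
    b1_nudged, _ = fit_two_predictors(x1, x2, y_nudged)
    return abs(b1_nudged - b1)


# Hypothetical predictors.
x1 = [1.0, 2.0, 3.0, 4.0]
x2_collinear = [1.01, 1.99, 3.01, 3.99]  # almost a copy of x1
x2_distinct = [4.0, 1.0, 3.0, 2.0]       # only weakly related to x1

shift_collinear = coefficient_shift(x1, x2_collinear)
shift_distinct = coefficient_shift(x1, x2_distinct)
print("shift with collinear predictors:", shift_collinear)
print("shift with distinct predictors:", shift_distinct)
```

The same small change in the data moves the coefficient far more when the predictors carry almost the same information.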

300

If a model violates linearity but performs well in validation, what is a reasonable decision?

A. Discard the model because assumptions take priority
B. Use the model but continue checking performance
C. Force a polynomial transformation
D. Add additional noise variables

B. Use the model but continue checking performance

300

Why is RMSE more actionable for business users than R²?

A. It expresses error in the units of the outcome
B. It ensures unbiased predictions
C. It eliminates heteroscedasticity
D. It increases as interpretability increases

A. It expresses error in the units of the outcome
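A short sketch of the contrast, assuming made-up weekly sales figures in dollars. Both metrics are computed from the same predictions: RMSE comes out in dollars, while R² is a unitless proportion.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the outcome."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Proportion of outcome variance explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical weekly sales in dollars and a model's predictions.
actual = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 110.0, 150.0, 150.0]

print(f"RMSE: {rmse(actual, predicted):.1f} dollars")  # typical miss size
print(f"R^2:  {r_squared(actual, predicted):.2f}")     # unitless
```

A statement such as "forecasts are typically off by about 10 dollars" maps directly onto a budget or an inventory buffer, whereas "R² is 0.80" does not.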

300

What essential tradeoff often arises when using regression models?

A. Transparency versus predictive flexibility
B. Memory usage versus speed
C. Variance versus sample size
D. Coefficient size versus standard error

A. Transparency versus predictive flexibility

300

Why are proxy variables ethically risky?

A. They reduce available predictors
B. They increase model variance
C. They indirectly encode protected attributes
D. They weaken predictive strength

C. They indirectly encode protected attributes

400

Why might analysts still use a simpler regression model despite lower accuracy?

A. It automatically handles nonlinearities
B. It is easier to justify to oversight bodies
C. It eliminates multicollinearity
D. It scales better to unstructured data

B. It is easier to justify to oversight bodies

400

In business deployment, which priority matters most?

A. Strict adherence to assumptions
B. Lowest statistical p-values
C. Maximum number of predictors
D. Strong validation performance on new data

D. Strong validation performance on new data

400

What does a visible pattern in residuals typically reveal?

A. A correctly specified linear form
B. An underlying structural issue in the model
C. Adequate predictive generalization
D. Properly distributed errors

B. An underlying structural issue in the model

400

Why might regression be preferred for rapid deployment?

A. It always outperforms machine learning
B. It simplifies stakeholder communication
C. It eliminates validation needs
D. It avoids ethical considerations

B. It simplifies stakeholder communication

400

In critical, high-impact environments, what type of model is generally most defensible?

A. The most mathematically complex model
B. A model with high R² but limited transparency
C. A clear, interpretable model with safeguards
D. A model with maximal variable count

C. A clear, interpretable model with safeguards