A regression model achieves a low validation RMSE but shows unstable coefficients across folds. What is the most likely concern?
A. Model underfitting due to high bias
B. High variance caused by predictor instability
C. Violated normality of residuals
D. Excessive regularization penalties
Answer: B. High variance caused by predictor instability
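The scenario above is easy to reproduce. The sketch below (pure NumPy, fully synthetic data; all names illustrative) builds two nearly collinear predictors, refits OLS on repeated subsamples, and shows individual coefficients swinging while the predictive combination stays stable:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)     # near-duplicate of x1
y = x1 + x2 + rng.normal(scale=0.5, size=n)

coefs = []
for _ in range(5):                           # five random subsamples ("folds")
    idx = rng.choice(n, size=40, replace=False)
    X = np.column_stack([x1[idx], x2[idx]])
    beta, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
    coefs.append(beta)
coefs = np.array(coefs)

# Each coefficient swings wildly across subsamples, while their sum
# (the direction that actually drives predictions) barely moves.
coef_spread = coefs[:, 0].std()
sum_spread = coefs.sum(axis=1).std()
```

Because predictions depend only on the stable combined direction, validation RMSE can look excellent even as each individual coefficient is unreliable.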
Which assumption violation most directly impacts prediction intervals rather than point predictions?
A. Nonlinearity in predictor relationships
B. Heteroscedasticity of residual variance
C. Mild multicollinearity among predictors
D. Slight deviations from normality
Answer: B. Heteroscedasticity of residual variance
Why is cross-validation preferred over a single train-test split in predictive regression?
A. It increases coefficient significance
B. It stabilizes estimates of generalization error
C. It guarantees higher R² values
D. It eliminates model variance
Answer: B. It stabilizes estimates of generalization error
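To illustrate, here is a minimal k-fold cross-validation loop in pure NumPy (synthetic linear data, OLS via least squares; names are illustrative). Averaging RMSE over folds gives a steadier estimate of generalization error than any single train-test split:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=n)

k = 5
folds = np.array_split(rng.permutation(n), k)   # shuffled index folds
rmses = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    resid = y[test_idx] - X[test_idx] @ beta
    rmses.append(float(np.sqrt(np.mean(resid ** 2))))

cv_rmse = float(np.mean(rmses))   # pooled estimate of out-of-sample RMSE
```

Each fold's RMSE is noisy on its own; the average smooths that noise, which is exactly the "stabilization" the answer refers to.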
In predictive modeling, when might a regularized regression outperform a neural network?
A. When nonlinear structure dominates strongly
B. When data is small and signal-to-noise is moderate
C. When features are unstructured text
D. When interactions are highly complex
Answer: B. When data is small and signal-to-noise is moderate
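A hedged sketch of why shrinkage helps on small samples: closed-form ridge regression, where `lam=0` recovers OLS. The dataset size and penalty value below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 15, 10                        # few observations relative to predictors
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [1.0, -1.0, 0.5]     # sparse true signal
y = X @ true_beta + rng.normal(scale=1.0, size=n)

def ridge(X, y, lam):
    """Closed-form ridge solution: (X'X + lam*I)^-1 X'y; lam=0 gives OLS."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

ols_beta = ridge(X, y, 0.0)
ridge_beta = ridge(X, y, 5.0)        # shrunken coefficients
```

The ridge solution always has a smaller coefficient norm than OLS; on small, moderately noisy samples that variance reduction often outweighs the added bias, which is when a regularized linear model can beat a far more flexible network.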
A regression model excludes protected variables but includes correlated proxies. What is the ethical risk?
A. Reduced model predictive accuracy
B. Indirect algorithmic bias through proxy effects
C. Increased coefficient variance
D. Lower model interpretability
Answer: B. Indirect algorithmic bias through proxy effects
In a predictive context, why is extrapolation a major limitation of linear regression?
A. Predictions become biased within training range
B. Coefficients lose statistical significance
C. Relationships outside observed data may not hold linearly
D. Residual variance becomes normally distributed
Answer: C. Relationships outside observed data may not hold linearly
In large datasets, why is the independence assumption still critical for predictive deployment?
A. It affects coefficient interpretability only
B. It ensures unbiased residual standard errors
C. It prevents artificially inflated performance metrics
D. It guarantees homoscedastic residual patterns
Answer: C. It prevents artificially inflated performance metrics
A high adjusted R² but poor out-of-sample performance most strongly suggests:
A. Model underfitting due to simplicity
B. Data leakage during model training
C. Overfitting to training data patterns
D. Violated homoscedasticity assumption
Answer: C. Overfitting to training data patterns
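The pattern is easy to reproduce. The sketch below (synthetic data) fits a degree-15 polynomial to just 20 noisy points from a truly linear process: in-sample R² looks excellent, while R² on fresh data from the same process collapses:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
x = rng.uniform(-1, 1, size=n)
y = x + rng.normal(scale=0.3, size=n)      # true relationship is linear

def r2(coeffs, x_eval, y_eval):
    """Coefficient of determination for a polynomial fit."""
    pred = np.polyval(coeffs, x_eval)
    ss_res = np.sum((y_eval - pred) ** 2)
    ss_tot = np.sum((y_eval - y_eval.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

coeffs = np.polyfit(x, y, deg=15)          # far too flexible for 20 points
x_new = rng.uniform(-1, 1, size=200)       # fresh draw from the same process
y_new = x_new + rng.normal(scale=0.3, size=200)

train_r2 = r2(coeffs, x, y)                # near-perfect in sample
test_r2 = r2(coeffs, x_new, y_new)         # much worse out of sample
```

The high-order polynomial memorizes the noise in the training sample, which is exactly what a high adjusted R² paired with poor out-of-sample performance signals.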
Why might executives prefer regression over a black-box ensemble model?
A. Regression provides transparent and explainable outputs
B. Regression requires fewer observations
C. Regression always has higher accuracy
D. Regression eliminates ethical concerns
Answer: A. Regression provides transparent and explainable outputs
Why is fairness auditing important in predictive regression deployment?
A. It increases model complexity
B. It ensures coefficients remain stable
C. It improves residual normality
D. It identifies disparate impact across subgroups
Answer: D. It identifies disparate impact across subgroups
A model with many interaction terms performs well in training but poorly in testing. What is the key limitation illustrated?
A. Insufficient model flexibility
B. Excessive model complexity causing overfitting
C. Violated independence assumption
D. Incorrect response variable scaling
Answer: B. Excessive model complexity causing overfitting
If residuals show autocorrelation in time-based data, what is the primary predictive risk?
A. Overestimated generalization performance
B. Increased model interpretability
C. Reduced coefficient magnitude
D. Improved forecast stability
Answer: A. Overestimated generalization performance
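A sketch of the risk (synthetic AR(1) data; all names illustrative): a random train/test split interleaves test points with their temporal neighbors, so the error estimate looks far better than what a true forward-in-time split reveals:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
eps = np.zeros(n)
for t in range(1, n):                       # strongly autocorrelated residuals
    eps[t] = 0.95 * eps[t - 1] + rng.normal(scale=0.2)
t_frac = np.arange(n) / n
y = 2.0 * t_frac + eps                      # trend plus AR(1) noise

# Flexible polynomial-in-time features (illustrative stand-in for any model)
X = np.column_stack([t_frac ** d for d in range(9)])

def split_rmse(train_idx, test_idx):
    beta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    return float(np.sqrt(np.mean((y[test_idx] - X[test_idx] @ beta) ** 2)))

perm = rng.permutation(n)                   # interleaved random split
random_rmse = split_rmse(perm[:150], perm[150:])
# Temporal split: the model must forecast beyond its training window
temporal_rmse = split_rmse(np.arange(150), np.arange(150, n))
```

The random split lets the model interpolate between autocorrelated neighbors of every test point, so its RMSE understates true forecasting error.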
When comparing two regression models with similar RMSE, which metric best informs business deployment?
A. Residual normality test results
B. Coefficient p-values only
C. Total number of predictors included
D. Performance stability across validation folds
Answer: D. Performance stability across validation folds
In model comparison, what is the strongest justification for selecting regression over a more accurate model?
A. Lower computational requirements only
B. Higher coefficient statistical significance
C. Regulatory need for interpretability and auditability
D. Simpler data preprocessing requirements
Answer: C. Regulatory need for interpretability and auditability
In high-stakes domains (e.g., lending, healthcare), what is the primary ethical advantage of regression?
A. Higher predictive accuracy than all models
B. Ability to explain decisions and justify outcomes
C. Elimination of sampling bias
D. Automatic compliance with regulations
Answer: B. Ability to explain decisions and justify outcomes
Why can regression underperform compared to tree-based models in complex business environments?
A. Regression requires larger sample sizes
B. Regression cannot model categorical variables
C. Regression struggles with nonlinear and interaction effects
D. Regression eliminates multicollinearity automatically
Answer: C. Regression struggles with nonlinear and interaction effects
A model violates linearity but has superior cross-validated accuracy. What is the most defensible action?
A. Reject due to theoretical violation
B. Transform predictors regardless of performance
C. Deploy while documenting model limitations
D. Remove nonlinear predictors entirely
Answer: C. Deploy while documenting model limitations
A funnel-shaped residual plot indicates what predictive concern?
A. Multicollinearity among predictors
B. Non-constant error variance across predictions
C. Perfect model generalization
D. Excessively low model bias
Answer: B. Non-constant error variance across predictions
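The funnel can be quantified rather than just eyeballed. The sketch below (synthetic data whose noise grows with x) compares residual spread in the lower and upper halves of the fitted values; a ratio well above 1 signals non-constant variance:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 10, size=n)
y = 2.0 * x + rng.normal(scale=0.3 * x)    # noise grows with x: funnel shape

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

order = np.argsort(fitted)
low_spread = resid[order[: n // 2]].std()   # residual sd, small fitted values
high_spread = resid[order[n // 2 :]].std()  # residual sd, large fitted values
ratio = high_spread / low_spread            # >> 1 indicates heteroscedasticity
```

This is the predictive concern behind the funnel: point predictions stay roughly unbiased, but any single error bar or prediction interval built on a constant-variance assumption will be too wide at one end and too narrow at the other.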
When does model interpretability become a predictive risk rather than an advantage?
A. When stakeholders demand transparency
B. When simpler models miss critical nonlinear structure
C. When validation metrics are stable
D. When coefficients are statistically significant
Answer: B. When simpler models miss critical nonlinear structure
A highly accurate regression model systematically underpredicts for one demographic group. What is the most appropriate response?
A. Conduct bias analysis and recalibrate the model
B. Remove demographic variables entirely
C. Ignore due to overall accuracy
D. Increase model complexity immediately
Answer: A. Conduct bias analysis and recalibrate the model
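A minimal version of such a bias audit (entirely synthetic data; the group indicator and offsets are illustrative): compare mean residuals by group, then recalibrate with group-specific intercept adjustments:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
group = rng.integers(0, 2, size=n)          # synthetic 0/1 group indicator
x = rng.normal(size=n)
# Group 1 outcomes sit systematically higher than the fitted model can capture
y = 2.0 * x + 1.5 * group + rng.normal(scale=0.5, size=n)

# Model trained without the group signal underpredicts for group 1
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

bias_g0 = resid[group == 0].mean()
bias_g1 = resid[group == 1].mean()          # positive: systematic underprediction

# Simple recalibration sketch: group-specific intercept offsets
recal_resid = resid - np.where(group == 1, bias_g1, bias_g0)
```

Auditing mean residuals by subgroup surfaces the disparity that overall accuracy hides; intercept recalibration is only the simplest possible fix, and in practice the root cause (missing features, proxy effects, sampling) should be investigated as well.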