You are the lead data scientist at a financial services firm developing a regression model to predict customer lifetime value (CLV) for targeted marketing investments.
The model will determine which customers receive high-cost retention offers.
Business Context:
Dataset: 85,000 customers
Target: Continuous CLV (next 12 months)
Cost of overpredicting CLV: Overspending on low-value customers
Cost of underpredicting CLV: Losing high-value customers
Executives want a deployable model within 2 weeks.
Model: Multiple Linear Regression (with interactions)
R² (Training) = 0.82
R² (Validation) = 0.61
RMSE (Validation) = $420
Adjusted R² = 0.80
Cross-Validation RMSE Range = $390–$610
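As a rough aid for reading these figures, the sketch below restates the standard definitions of R² and RMSE and computes two quantities implied by the reported numbers: the drop in R² from training to validation and the spread of the cross-validation RMSE. The function and variable names are illustrative and not part of the case; inputs are assumed to be plain Python lists.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (dollars here)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Figures reported in the case.
r2_train, r2_valid = 0.82, 0.61
cv_rmse_low, cv_rmse_high = 390.0, 610.0

r2_gap = r2_train - r2_valid                 # loss of explained variance out of sample
cv_rmse_spread = cv_rmse_high - cv_rmse_low  # fold-to-fold variation in dollars

print(f"R^2 gap (train - validation): {r2_gap:.2f}")
print(f"CV RMSE spread: ${cv_rmse_spread:.0f}")
```

The R² gap of 0.21 points to overfitting, plausibly from the interaction terms, and the $220 spread across folds suggests that the validation RMSE of $420 may not be stable across customer subsets.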
Diagnostics:
Residual Plot: Funnel shape (increasing variance)
VIF Values: 6.5–9.8 for 3 predictors
Residuals: Slight right skew
Outliers: 2% high-leverage observations
Autocorrelation: Not present (Durbin-Watson ≈ 2.1)
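To make the VIF and Durbin-Watson figures easier to interpret, the sketch below converts them into more familiar quantities using the standard identities VIF = 1 / (1 - R²_j) and DW ≈ 2(1 - r), where R²_j is the R² from regressing predictor j on the other predictors and r is the lag-1 residual autocorrelation. The function names are illustrative, and the residual list in the last line is toy data, not data from the case.

```python
def vif_to_r2(vif):
    """Share of a predictor's variance explained by the other predictors."""
    return 1.0 - 1.0 / vif

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def dw_to_autocorr(dw):
    """Approximate lag-1 autocorrelation implied by a Durbin-Watson value."""
    return 1.0 - dw / 2.0

# VIF range and Durbin-Watson value reported in the case.
print(f"R^2_j at VIF 6.5: {vif_to_r2(6.5):.3f}")
print(f"R^2_j at VIF 9.8: {vif_to_r2(9.8):.3f}")
print(f"Implied lag-1 autocorrelation at DW 2.1: {dw_to_autocorr(2.1):.2f}")

# Toy residuals that alternate in sign (strong negative autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
```

Read this way, a VIF between 6.5 and 9.8 means that roughly 85% to 90% of each affected predictor's variance is explained by the other predictors, so the individual coefficients are imprecise even if overall predictions are not badly affected. A Durbin-Watson value of 2.1 corresponds to an autocorrelation of about -0.05, which is consistent with the "not present" reading.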
Ethical & Operational Notes:
Income variable is included (highly predictive)
ZIP code is also included
Model is highly interpretable
A random forest model has:
Lower RMSE ($360)
Low interpretability (no directly readable coefficients)
Longer deployment timeline
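The size of the random forest's advantage can be put in relative terms with a short calculation. This is a sketch under one assumption the case does not state: that the random forest's $360 RMSE was measured on the same validation data as the regression's $420.

```python
# RMSE figures reported in the case (dollars).
rmse_regression = 420.0
rmse_forest = 360.0  # assumed to come from the same validation set

absolute_gain = rmse_regression - rmse_forest    # dollars of error removed
relative_gain = absolute_gain / rmse_regression  # fraction of regression RMSE

print(f"Absolute RMSE reduction: ${absolute_gain:.0f}")
print(f"Relative RMSE reduction: {relative_gain:.1%}")
```

The reduction is $60, or about 14.3% of the regression's validation RMSE. Whether that gain justifies the loss of interpretability and the longer deployment timeline is the trade-off the question asks teams to weigh.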
Question:
Should the company deploy the regression model, switch to the random forest, or delay deployment?
Your response must justify the decision using:
Predictive performance
Assumption violations
Business risk
Ethical considerations
Model interpretability
Teams have 10 minutes to discuss and write their response on the board.
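Answer:
Deploy the regression model with mitigation steps and monitoring, while planning a future comparison with the random forest. The mitigation steps should address the diagnostics above: stabilize the funnel-shaped residual variance (for example, by log-transforming CLV or using robust standard errors), reduce multicollinearity among the three high-VIF predictors, and review income and ZIP code as possible proxies for protected characteristics.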
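One way the monitoring could look in practice is a periodic check of realized error against the error band seen during cross-validation. The sketch below is a minimal illustration only: the function names, the choice of the upper cross-validation RMSE ($610) as the alert threshold, and the sample values are all assumptions, not requirements from the case.

```python
import math

# Assumed alert threshold: upper end of the cross-validation RMSE range.
CV_RMSE_UPPER = 610.0

def rmse(actual, predicted):
    """Root mean squared error between realized and predicted CLV."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def needs_review(actual, predicted, threshold=CV_RMSE_UPPER):
    """Return True when realized error exceeds the assumed threshold."""
    return rmse(actual, predicted) > threshold

# Hypothetical batch of realized vs. predicted CLV values (dollars).
actual_clv = [1200.0, 800.0, 2500.0, 400.0]
predicted_clv = [1000.0, 900.0, 2100.0, 650.0]

print(f"Batch RMSE: ${rmse(actual_clv, predicted_clv):.0f}")
print("Review needed:", needs_review(actual_clv, predicted_clv))
```

A breach of the threshold would be a prompt to re-examine the model, and possibly to bring forward the comparison with the random forest, rather than an automatic rollback.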