Final Jeopardy
1000

You are the lead data scientist at a financial services firm developing a regression model to predict customer lifetime value (CLV) for targeted marketing investments.

The model will determine which customers receive high-cost retention offers.

Business Context:

  • Dataset: 85,000 customers

  • Target: Continuous CLV (next 12 months)

  • Cost of false high prediction: Overspending on low-value customers

  • Cost of false low prediction: Losing high-value customers

Executives want a deployable model within 2 weeks.

Model: Multiple Linear Regression (with interactions)

  • R² (Training) = 0.82

  • R² (Validation) = 0.61

  • RMSE (Validation) = $420

  • Adjusted R² (Training) = 0.80

  • Cross-Validation RMSE Range = $390–$610
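As a rough sketch, the performance figures above can be turned into a few quick checks in Python. The function names (adjusted_r2, generalization_gap, relative_spread) are illustrative assumptions, not part of the original exercise.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def generalization_gap(r2_train: float, r2_val: float) -> float:
    """Drop in R^2 from training to validation; a large gap suggests overfitting."""
    return r2_train - r2_val


def relative_spread(low: float, high: float) -> float:
    """Width of a cross-validation RMSE range relative to its lower end."""
    return (high - low) / low


# Figures taken from the slide
gap = generalization_gap(0.82, 0.61)
spread = relative_spread(390, 610)
print(f"R^2 gap: {gap:.2f}")
print(f"CV RMSE spread: {spread:.0%}")
```

With the slide's numbers, the R² gap is about 0.21 and the cross-validation RMSE range spans roughly 56% of its lower end, both signs that out-of-sample performance is less stable than the training fit suggests. With n in the tens of thousands, adjusted R² stays close to training R² unless the model has very many terms; for instance, adjusted_r2(0.82, 85_000, 10) is about 0.820.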

Diagnostics:

  • Residual Plot: Funnel shape (residual variance grows with the fitted values, i.e., heteroscedasticity)

  • VIF Values: 6.5–9.8 for 3 predictors

  • Residuals: Slight right skew

  • Outliers: 2% high-leverage observations

  • Autocorrelation: Not present (Durbin-Watson ≈ 2.1)
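The diagnostics listed above can be reproduced with a minimal NumPy sketch. It assumes a plain least-squares fit with an intercept and uses synthetic data (not the firm's 85,000 customers) built to show collinearity and a funnel-shaped residual spread. The heteroscedasticity check is Koenker's n·R² form of the Breusch-Pagan test, swapped in here as one common way to quantify the funnel; all function names are hypothetical.

```python
import numpy as np


def ols_fit(X, y):
    """Least squares with an intercept; returns (coefficients, residuals)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, y - A @ beta


def r_squared(y, resid):
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the rest."""
    out = []
    for j in range(X.shape[1]):
        _, resid = ols_fit(np.delete(X, j, axis=1), X[:, j])
        out.append(1.0 / (1.0 - r_squared(X[:, j], resid)))
    return np.array(out)


def durbin_watson(resid):
    """Values near 2 suggest no first-order autocorrelation in the residuals."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


def breusch_pagan_lm(X, resid):
    """Koenker's n * R^2 statistic from regressing squared residuals on X;
    compare with a chi-square distribution with X.shape[1] degrees of freedom."""
    e2 = resid ** 2
    _, aux_resid = ols_fit(X, e2)
    return len(e2) * r_squared(e2, aux_resid)


# Synthetic data (an assumption): x2 nearly duplicates x1, and noise grows with x3
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2 * x1 + 0.5 * x3 + rng.normal(scale=np.exp(0.5 * x3), size=n)

_, resid = ols_fit(X, y)
vifs = vif(X)
dw = durbin_watson(resid)
bp = breusch_pagan_lm(X, resid)
print("VIF:", np.round(vifs, 1))
print(f"Durbin-Watson: {dw:.2f}")
print(f"Breusch-Pagan LM: {bp:.1f} (5% critical value for 3 df is about 7.81)")
```

As a rule of thumb, VIF values between 5 and 10, like the slide's 6.5–9.8, are commonly read as problematic collinearity, and a Durbin-Watson statistic near 2 is consistent with the slide's "not present" finding.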

Ethical & Operational Notes:

  • Income variable is included (highly predictive)

  • ZIP code is also included

  • Model is highly interpretable

  • A random forest alternative has:

      • Lower RMSE ($360 vs. $420 for the regression)

      • Limited interpretability

      • A longer deployment timeline

Question:

Should the company deploy the regression model, switch to the random forest, or delay deployment?

Your response must justify the decision using:

  • Predictive performance

  • Assumption violations

  • Business risk

  • Ethical considerations

  • Model interpretability

Teams have 10 minutes to discuss and write their response on the board.

Suggested answer: Deploy the regression model with mitigation steps and monitoring, while planning a future comparison with the random forest.
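One way the mitigation and monitoring steps could look, sketched in Python under stated assumptions: refit on log(CLV) to tame the funnel-shaped, right-skewed residuals, and flag scoring batches whose RMSE drifts above the top of the cross-validation range. The data are synthetic, and both the log-transform remedy and the $610 alert threshold are illustrative choices, not steps prescribed by the exercise; funnel_ratio and needs_review are hypothetical helpers.

```python
import numpy as np


def ols_fit(X, y):
    """Least squares with an intercept; returns (fitted values, residuals)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    return fitted, y - fitted


def funnel_ratio(fitted, resid):
    """Residual SD in the top quartile of fitted values divided by the SD in the
    bottom quartile; values well above 1 match a funnel-shaped residual plot."""
    lo, hi = np.quantile(fitted, [0.25, 0.75])
    return resid[fitted >= hi].std() / resid[fitted <= lo].std()


def rmse(y_true, y_pred):
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def needs_review(batch_rmse, upper=610.0):
    """Hypothetical monitoring rule: flag a scoring batch whose RMSE exceeds the
    top of the $390-$610 cross-validation range."""
    return batch_rmse > upper


# Synthetic, right-skewed CLV with multiplicative noise (an assumption, not firm data)
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
clv = np.exp(5 + 0.6 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.4, size=n))

ratio_raw = funnel_ratio(*ols_fit(X, clv))
ratio_log = funnel_ratio(*ols_fit(X, np.log(clv)))
print(f"funnel ratio, raw CLV: {ratio_raw:.2f}, log CLV: {ratio_log:.2f}")
```

On this synthetic data the raw-scale funnel ratio sits well above 1 while the log-scale ratio is close to 1. Predictions from a log-scale model have to be transformed back to dollars (for example with Duan's smearing correction) before their RMSE is compared with the $610 threshold.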
