Causality
Regression Interpretation
Model Evaluation & The Bias-Variance Tradeoff
ML Models — Concepts & Mechanics
Targeting, CLV & Uplift
100

The term for a variable that drives both the factor of interest AND the outcome, creating a spurious association

What is a confounder?

100

What the coefficient on "advertising" of 28.312 means in the Click Pens regression

What is each additional advertising spot is associated with 28.312 more units sold, holding salesreps constant?

100

A model with AUC = 0.5 has this property when it tries to rank a random buyer against a random non-buyer

What is it performs no better than random — it has no discrimination power?
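
A minimal sketch of the ranking interpretation, using hypothetical scores: AUC is the probability that a randomly chosen buyer is scored above a randomly chosen non-buyer (ties count half). When scores carry no signal, that probability lands near 0.5.

```python
import itertools
import random

# Hypothetical no-skill model: buyer and non-buyer scores are drawn
# from the same distribution, so ranking is pure chance.
random.seed(0)
buyer_scores = [random.random() for _ in range(200)]
nonbuyer_scores = [random.random() for _ in range(200)]

pairs = itertools.product(buyer_scores, nonbuyer_scores)
wins = sum(1.0 if b > n else 0.5 if b == n else 0.0 for b, n in pairs)
auc = wins / (len(buyer_scores) * len(nonbuyer_scores))
print(round(auc, 2))  # close to 0.5: no discrimination power
```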

100

This is why you must standardize inputs (e.g., with StandardScaler) before fitting an MLP but do NOT need to for Random Forest or XGBoost

What is neural networks use gradient-based optimization sensitive to feature scale, while tree-based models split on rank order and are scale-invariant?
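
A quick stdlib illustration of the scale-invariance half of that answer, with hypothetical feature values: standardizing preserves the rank order of a feature, so tree splits (which depend only on ordering) are unchanged, while gradient-based learners see the raw magnitudes directly.

```python
# One wildly scaled hypothetical feature (e.g. income mixed in with ages).
values = [3.0, 150000.0, 42.0, 7.5]

mean = sum(values) / len(values)
std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
scaled = [(v - mean) / std for v in values]

def rank(xs):
    # Indices sorted by value: the only thing a tree split can "see".
    return sorted(range(len(xs)), key=lambda i: xs[i])

print(rank(values) == rank(scaled))  # True: identical splits for a tree
```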

100

The formula for breakeven response rate and what it represents in a targeting decision

What is cost per contact / margin per response — the minimum predicted probability at which contacting a customer is profitable?
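
The rule is a one-liner; here is a sketch with hypothetical economics ($1.50 cost per contact, $25 margin per response):

```python
cost_per_contact = 1.50
margin_per_response = 25.0

# Minimum predicted response probability at which contacting is profitable.
breakeven_rate = cost_per_contact / margin_per_response  # 0.06

def should_contact(pred_prob, breakeven=breakeven_rate):
    """Contact only customers whose predicted probability clears breakeven."""
    return pred_prob > breakeven

print(should_contact(0.10))  # True: expected margin 25 * 0.10 > 1.50 cost
print(should_contact(0.03))  # False: expected margin 0.75 < 1.50 cost
```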

200

In the Google Ads example, this was identified as the confounder that caused the analysis to fail the causality checklist

What is consumer interest in buying a car (Google search terms)?

200

Why Adjusted R² is preferred over R² when comparing models with different numbers of variables

What is Adjusted R² penalizes for adding extra predictors, preventing artificially inflated fit from irrelevant variables?
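
A small numeric sketch of that penalty, using the standard formula and a hypothetical n = 30: a tiny R² gain from an irrelevant extra predictor can still lower Adjusted R².

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding a predictor nudges R^2 from 0.900 to 0.901 (hypothetical values),
# yet Adjusted R^2 falls -- the fit gain doesn't justify the extra variable.
print(round(adjusted_r2(0.900, 30, 2), 4))
print(round(adjusted_r2(0.901, 30, 3), 4))
```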

200

The gains plot sits exactly on the diagonal for a model evaluated on test data. This is what you conclude about the model

What is it has no predictive power — it performs equivalently to randomly contacting customers?

200

In k-fold cross-validation with k=5, this is exactly what happens in each of the 5 rounds and why the final hyperparameters are then re-fit on all training data

What is the training data is split into 5 folds — in each round, 4 folds train the model and 1 fold validates it. After averaging performance across all 5 rounds per hyperparameter combination, the best params are selected and the model is re-fit on 100% of training data before evaluating once on the test set?
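
The fold mechanics can be sketched with stdlib index bookkeeping (hypothetical n = 20 training rows):

```python
n, k = 20, 5
fold_size = n // k
folds = [list(range(i * fold_size, (i + 1) * fold_size)) for i in range(k)]

for val_fold in folds:  # 5 rounds: each fold validates exactly once
    train_idx = [i for f in folds if f is not val_fold for i in f]
    # ... fit on train_idx, score on val_fold, record the score ...
    assert len(train_idx) == 16 and len(val_fold) == 4

# Average the 5 scores per hyperparameter combination, pick the winner,
# then re-fit on all n rows before the single test-set evaluation.
```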

200

In the Wave 2 Intuit scenario, you multiply Wave 1 predictions by 0.5 before applying the breakeven rule. This is the managerial reason why Wave 1 predictions overstate Wave 2 response rates

What is Wave 1 already contacted the most responsive customers — those remaining in Wave 2 are systematically less likely to respond, so raw Wave 1 predictions would lead you to over-target?
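
A sketch of why the 0.5 adjustment changes the decision, with illustrative (not the case's) economics and probabilities:

```python
cost_per_mail = 1.41            # hypothetical contact cost
margin_per_response = 60.0      # hypothetical margin
breakeven = cost_per_mail / margin_per_response   # ~0.0235

wave1_preds = [0.08, 0.04, 0.03, 0.01]            # hypothetical probabilities
wave2_preds = [p * 0.5 for p in wave1_preds]      # remaining pool responds less

naive_targets = [p > breakeven for p in wave1_preds]     # over-targets
adjusted_targets = [p > breakeven for p in wave2_preds]
print(naive_targets)     # [True, True, True, False]
print(adjusted_targets)  # [True, False, False, False]
```

Raw Wave 1 scores would mail three of the four customers; the halved Wave 2 scores mail only one.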

300

OVB can cause a coefficient to do this, which is exactly what happened to clarity in the diamonds dataset when carat was omitted

What is flip sign (change from negative to positive)?

300

In a log-log regression where ln(price) is regressed on ln(carat), the coefficient on ln(carat) has this specific economic interpretation

What is elasticity — a 1% increase in carat is associated with a β% increase in price?
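
A simulation sketch of the elasticity reading: generate price = a * carat^beta with noise (beta_true = 1.8, a = 50, and the noise level are all hypothetical), then OLS on the logs recovers beta.

```python
import math
import random

random.seed(0)
beta_true = 1.8
carats = [0.3 + 0.02 * i for i in range(100)]
ln_x = [math.log(c) for c in carats]
ln_y = [math.log(50.0) + beta_true * lx + random.gauss(0, 0.05) for lx in ln_x]

# OLS slope of ln(price) on ln(carat): cov(x, y) / var(x)
mx = sum(ln_x) / len(ln_x)
my = sum(ln_y) / len(ln_y)
slope = (sum((a - mx) * (b - my) for a, b in zip(ln_x, ln_y))
         / sum((a - mx) ** 2 for a in ln_x))
print(round(slope, 2))  # ~1.8: a 1% larger carat -> about a 1.8% higher price
```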

300

This is why we never tune hyperparameters by evaluating performance on the test set directly

What is it would leak information from the test set into the model selection process, making test performance an optimistic and unreliable estimate of true out-of-sample performance?

300

This is the key conceptual difference between how Random Forest and XGBoost each reduce prediction error, stated in terms of what each one targets

What is Random Forest reduces variance by averaging many large independently-built trees, while XGBoost reduces bias by sequentially fitting small trees to the residuals of prior trees?
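
The boosting half of that contrast can be sketched with toy numbers: each stage fits the residuals of the ensemble so far. Here the weakest possible "tree," a constant equal to the residual mean, steadily drives bias down.

```python
y = [3.0, 5.0, 8.0, 12.0]        # hypothetical targets, mean 7.0
pred = [0.0] * len(y)
learning_rate = 0.5

for stage in range(20):
    residuals = [t - p for t, p in zip(y, pred)]
    stump = sum(residuals) / len(residuals)           # fit the residuals
    pred = [p + learning_rate * stump for p in pred]  # shrunken update

print([round(p, 2) for p in pred])  # each prediction -> 7.0, the mean of y
```

Random Forest instead trains many deep trees independently on bootstrap samples and averages them, which attacks variance rather than bias.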

300

This is the fundamental reason why a propensity-to-buy model is the wrong tool for ad targeting, even if it has high AUC

What is a propensity model ranks customers by their likelihood of buying regardless of the ad — it will prioritize "Sure Things" who would buy anyway, wasting budget, and miss "Persuadables" who only buy because of the ad. Uplift modeling targets the incremental effect of treatment, not overall purchase likelihood?

400

Multicollinearity and OVB are often confused, but they differ in this critical way — one biases your coefficients, the other does not

What is OVB biases coefficients when the omitted variable is correlated with both X and Y, while multicollinearity (when all related variables ARE included) only inflates standard errors without biasing estimates?
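
A simulation sketch of the OVB half (all coefficients hypothetical): z drives both x and y, so omitting z biases the slope on x, while controlling for z recovers it even though x and z are correlated.

```python
import random

random.seed(1)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]           # x correlated with z
y = [2.0 * xi + 3.0 * zi + random.gauss(0, 1)       # true effect of x is 2.0
     for xi, zi in zip(x, z)]

def slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            / sum((a - mx) ** 2 for a in xs))

biased = slope(x, y)          # omits z -> biased toward ~3.5, not 2.0

# Frisch-Waugh: residualize x and y on z, then regress the residuals.
bx, by = slope(z, x), slope(z, y)
rx = [xi - bx * zi for xi, zi in zip(x, z)]
ry = [yi - by * zi for yi, zi in zip(y, z)]
unbiased = slope(rx, ry)      # ~2.0 despite the x-z correlation

print(round(biased, 1), round(unbiased, 1))
```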

400

A researcher runs two regressions: first regressing sales on advertising only (β=28.3), then adding salesreps (advertising β changes dramatically). The F-statistic is significant in both models. This is what you should conclude and why the significant F-test alone is not sufficient

What is the first model suffers from OVB — salesreps were correlated with advertising and had a direct effect on sales. A significant F-test only tells you the model as a whole is better than nothing, not that the individual coefficients are unbiased?

400

A random forest has near-perfect training AUC but mediocre test AUC. An unpruned decision tree has perfect training AUC but near-random test AUC. This is what distinguishes the two situations in terms of the bias-variance tradeoff

What is both are overfitting (high variance), but the random forest mitigates it through averaging many trees on random subsamples, while a single unpruned tree has no such correction — making the tree's test performance collapse far more severely?

400

In the Facebook task, NN(1,) and NN(2,) produce different permutation importance plots. This is the precise reason adding a second hidden node changes what the model can capture, explained in terms of what a single node can and cannot do

What is a single TanH node can only model a monotone transformation of one linear combination of inputs (essentially logistic regression), while a second node lets the network combine two non-linear transformations — enabling it to capture interaction effects between variables like age and ad type that a single node cannot represent?

400

In a CLV calculation, switching from the "optimistic" churn timing assumption to the "pessimistic" one changes the CLV in this direction and for this reason

What is CLV decreases under pessimistic timing because customers are assumed to churn at the start of each period (before paying) rather than the end, reducing the expected number of payments received and the present value of each cash flow?
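
A sketch with hypothetical inputs (margin m = 100 per period, retention r = 0.8, discount rate d = 0.10, horizon T = 10); one common convention for each timing assumption is shown, and conventions vary across texts.

```python
m, r, d, T = 100.0, 0.8, 0.10, 10

# Optimistic: churn at the END of each period, so the period-t payment
# requires t-1 survivals and t-1 periods of discounting.
clv_optimistic = sum(m * r ** (t - 1) / (1 + d) ** (t - 1)
                     for t in range(1, T + 1))

# Pessimistic: churn at the START of each period (before paying), adding
# one more survival hurdle and one more discount period per payment.
clv_pessimistic = sum(m * r ** t / (1 + d) ** t for t in range(1, T + 1))

print(round(clv_optimistic, 2), round(clv_pessimistic, 2))
print(clv_pessimistic < clv_optimistic)  # True: fewer expected payments
```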

500

A company observes that customers who receive more emails buy more. They conclude email frequency causes higher purchases. Even without an obvious confounder, this causal claim could still fail for this more subtle reason — and this is what the data pattern would look like

What is reverse causality — engaged buyers may prompt the firm's algorithm to send them more emails, so purchasing behavior drives email frequency rather than the other way around. The data would show high correlation but the direction of causation runs opposite to the claim?

500

Two models predict diamond prices. Model A uses only clarity (R²=0.031). Model B uses carat and clarity (R²=0.904). A student argues the large jump in R² proves carat causes higher prices. This is the flaw in that reasoning, and this is what the R² jump actually tells you

What is R² measures predictive fit, not causation — the jump tells you carat explains most of the variance in price and that clarity's earlier coefficient was severely biased by OVB, but none of it establishes a causal direction. Carat could itself be endogenous (e.g., people selectively certify larger, higher-quality stones)?

500

A model achieves high accuracy (e.g., 95%) on a classification task but is actually useless for targeting. This is the exact condition that makes accuracy a misleading metric here, and this is the better metric to use instead

What is severe class imbalance — if only 5% of customers buy, predicting "no" for everyone achieves 95% accuracy while identifying zero buyers. AUC (or profit/ROME at the breakeven threshold) is the appropriate metric because it evaluates discrimination across all thresholds regardless of class distribution?
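
The trivial-classifier trap in that answer takes four lines to demonstrate (hypothetical 5% response rate):

```python
y_true = [1] * 5 + [0] * 95   # 5 buyers among 100 customers
y_pred = [0] * 100            # "predict no for everyone"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
buyers_found = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
print(accuracy, buyers_found)  # 0.95 accuracy, zero buyers identified
```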

500

A student trains an XGBoost model and notices that training AUC is 0.97 but test AUC is 0.71. They try reducing n_estimators and the gap narrows slightly, but test AUC barely improves. This is the likely explanation and the hyperparameter(s) they should actually prioritize tuning

What is the model is overfitting primarily due to tree depth and minimum child weight — reducing n_estimators helps only marginally because the individual trees are still too complex. Reducing max_depth and increasing min_child_weight forces shallower, simpler trees which have a much larger effect on closing the train-test gap than cutting the number of trees alone?

500

A firm uses an uplift model and the incremental uplift plot peaks at 40% of customers then declines to zero at 80%. A colleague argues you should always target the top 40% since that's the peak. This is why that reasoning is incomplete, and this is the correct targeting rule

What is the peak represents the decile with the highest average incremental uplift, but every customer between 0% and the breakeven crossing point (where incremental uplift > cost/revenue) is still profitable to contact — stopping at the peak leaves profitable Persuadables untargeted. The correct rule is to contact all customers up to the point where the incremental uplift line crosses the breakeven threshold, not where it peaks?
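
The correct rule is a cutoff search, not a peak search. A sketch with a hypothetical per-decile incremental uplift curve (peaking in the 4th decile, i.e. at 40% depth) and an illustrative breakeven threshold:

```python
incremental_uplift = [0.020, 0.030, 0.035, 0.040, 0.030, 0.015,
                      0.008, 0.002, 0.000, 0.000]   # hypothetical curve
breakeven = 0.005                                   # hypothetical cost / margin

# Walk down the deciles until incremental uplift no longer clears breakeven.
depth = 0
for u in incremental_uplift:
    if u <= breakeven:
        break
    depth += 1

print(f"target the top {depth * 10}% of customers")  # 70%, past the 40% peak
```

Stopping at the 40% peak would abandon the still-profitable 5th through 7th deciles.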