The term for a variable that drives both the factor of interest AND the outcome, creating a spurious association
What is a confounder?
What the coefficient on "advertising" of 28.312 means in the Click Pens regression
What is each additional advertising spot is associated with 28.312 more units sold, holding salesreps constant?
A model with AUC = 0.5 has this property when it tries to rank a random buyer against a random non-buyer
What is it performs no better than random — it has no discrimination power?
This is why you must scale inputs (e.g., with StandardScaler) before fitting an MLP but do NOT need to for Random Forest or XGBoost
What is neural networks use gradient-based optimization sensitive to feature scale, while tree-based models split on rank order and are scale-invariant?
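The scaling distinction can be sketched with scikit-learn (a minimal illustration on simulated data; the features, scales, and model settings below are all hypothetical):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * [1, 100, 10000]   # wildly different feature scales
y = (X[:, 0] + X[:, 1] / 100 > 0).astype(int)

# MLP: gradient descent is scale-sensitive, so standardize inside a pipeline
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(2,), max_iter=2000, random_state=0),
)
mlp.fit(X, y)

# Tree ensemble: splits depend only on feature rank order, so no scaling needed
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
```

Wrapping the scaler and the MLP in one pipeline also guarantees the scaler is fit only on training folds during cross-validation, avoiding leakage.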
The formula for breakeven response rate and what it represents in a targeting decision
What is cost per contact / margin per response — the minimum predicted probability at which contacting a customer is profitable?
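With made-up numbers, the breakeven rule works out like this (the $1.41 cost and $60 margin are illustrative, not figures from a specific case):

```python
# Breakeven response rate = cost per contact / margin per response
cost_per_contact = 1.41
margin_per_response = 60.0
breakeven = cost_per_contact / margin_per_response   # 0.0235

# Contact a customer only if their predicted response probability exceeds breakeven
p_hat = 0.03
contact = p_hat > breakeven   # True: expected margin exceeds the contact cost
```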
In the Google Ads example, this was identified as the confound that caused the analysis to fail the causality checklist
What is consumer interest in buying a car (Google search terms)?
Why Adjusted R² is preferred over R² when comparing models with different numbers of variables
What is Adjusted R² penalizes for adding extra predictors, preventing artificially inflated fit from irrelevant variables?
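The penalty can be made concrete with the standard formula (n observations, k predictors; the R² values below are invented to show that a tiny fit gain from extra predictors can lower Adjusted R²):

```python
# Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Baseline: R2 = 0.900 with 2 predictors on n = 100 observations
base = adjusted_r2(0.900, 100, 2)    # ~0.8979

# Adding 3 more predictors buys only +0.001 R2 — Adjusted R2 goes DOWN
bloated = adjusted_r2(0.901, 100, 5) # ~0.8957
```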
The gains plot sits exactly on the diagonal for a model evaluated on test data. This is what you conclude about the model
What is it has no predictive power — it performs equivalently to randomly contacting customers?
In k-fold cross-validation with k=5, this is exactly what happens in each of the 5 rounds and why the final hyperparameters are then re-fit on all training data
What is the training data is split into 5 folds — in each round, 4 folds train the model and 1 fold validates it. After averaging performance across all 5 rounds per hyperparameter combination, the best params are selected and the model is re-fit on 100% of training data before evaluating once on the test set?
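The full loop described above maps directly onto scikit-learn's GridSearchCV (a sketch on simulated data; the estimator and parameter grid are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    cv=5,         # 5 rounds: 4 folds train, 1 fold validates, performance averaged
    refit=True,   # best params are then re-fit on 100% of the training data
)
grid.fit(X_tr, y_tr)
test_score = grid.score(X_te, y_te)   # evaluated once on the held-out test set
```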
In the Wave 2 Intuit scenario, you multiply Wave 1 predictions by 0.5 before applying the breakeven rule. This is the managerial reason why Wave 1 predictions overstate Wave 2 response rates
What is Wave 1 already contacted the most responsive customers — those remaining in Wave 2 are systematically less likely to respond, so raw Wave 1 predictions would lead you to over-target?
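A minimal sketch of the Wave 2 rule (all probabilities, costs, and margins below are hypothetical, not the actual Intuit figures):

```python
# Breakeven cutoff, as before
cost_per_contact = 1.41
margin_per_response = 60.0
breakeven = cost_per_contact / margin_per_response   # 0.0235

p_wave1 = [0.10, 0.06, 0.04, 0.02]        # Wave 1 predicted response rates
p_wave2 = [p * 0.5 for p in p_wave1]      # remaining customers respond less
target = [p > breakeven for p in p_wave2] # contact only if still above breakeven
```

Without the 0.5 adjustment, the third customer (0.04 > 0.0235) would be contacted unprofitably; after halving (0.02 < 0.0235), they are correctly excluded.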
Omitted variable bias (OVB) can cause a coefficient to do this, which is exactly what happened to clarity in the diamonds dataset when carat was omitted
What is flip sign (change from negative to positive)?
In a log-log regression where ln(price) is regressed on ln(carat), the coefficient on ln(carat) has this specific economic interpretation
What is elasticity — a 1% increase in carat is associated with a β% increase in price?
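The elasticity interpretation can be checked on simulated data (the true elasticity of 1.7 and the price constant are invented for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
carat = rng.uniform(0.3, 2.0, size=500)
# price = 4000 * carat^1.7 with multiplicative noise -> true elasticity 1.7
price = 4000 * carat ** 1.7 * np.exp(rng.normal(0, 0.05, size=500))

# Regress ln(price) on ln(carat): the slope recovers the elasticity
slope, intercept = np.polyfit(np.log(carat), np.log(price), 1)
# slope ~ 1.7: a 1% increase in carat -> ~1.7% increase in price
```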
This is why we never tune hyperparameters by evaluating performance on the test set directly
What is it would leak information from the test set into the model selection process, making test performance an optimistic and unreliable estimate of true out-of-sample performance?
This is the key conceptual difference between how Random Forest and XGBoost each reduce prediction error, stated in terms of what each one targets
What is Random Forest reduces variance by averaging many large independently-built trees, while XGBoost reduces bias by sequentially fitting small trees to the residuals of prior trees?
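The configuration contrast can be sketched with scikit-learn, using GradientBoostingRegressor as a stand-in for XGBoost (data and settings below are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10, random_state=0)

# Variance reduction: many deep trees grown independently, predictions averaged
rf = RandomForestRegressor(n_estimators=100, max_depth=None,
                           random_state=0).fit(X, y)

# Bias reduction: many SHALLOW trees fit sequentially to prior trees' residuals
gb = GradientBoostingRegressor(n_estimators=100, max_depth=2, learning_rate=0.1,
                               random_state=0).fit(X, y)
```

Note how the defaults encode the philosophy: random forests grow trees to full depth, while boosting keeps each tree small and relies on the sequence to build up fit.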
This is the fundamental reason why a propensity-to-buy model is the wrong tool for ad targeting, even if it has high AUC
What is a propensity model ranks customers by their likelihood of buying regardless of the ad — it will prioritize "Sure Things" who would buy anyway, wasting budget, and miss "Persuadables" who only buy because of the ad. Uplift modeling targets the incremental effect of treatment, not overall purchase likelihood?
Multicollinearity and OVB are often confused, but they differ in this critical way — one biases your coefficients, the other does not
What is OVB biases coefficients when the omitted variable is correlated with both X and Y, while multicollinearity (when all related variables ARE included) only inflates standard errors without biasing estimates?
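A quick simulation makes the contrast concrete (the coefficients 2.0 and 3.0 and the correlation structure are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n) * 0.6        # x2 correlated with x1
y = 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)    # both matter for y

# Omitting x2: the x1 coefficient absorbs x2's effect (biased, here ~4.4 not 2.0)
b_short = np.linalg.lstsq(np.c_[np.ones(n), x1], y, rcond=None)[0][1]

# Including both: despite the multicollinearity, coefficients are unbiased (~2, ~3)
b_full = np.linalg.lstsq(np.c_[np.ones(n), x1, x2], y, rcond=None)[0][1:]
```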
A researcher runs two regressions: first regressing sales on advertising only (β=28.3), then adding salesreps (advertising β changes dramatically). The F-statistic is significant in both models. This is what you should conclude and why the significant F-test alone is not sufficient
What is the first model suffers from OVB — salesreps were correlated with advertising and had a direct effect on sales. A significant F-test only tells you the model as a whole is better than nothing, not that the individual coefficients are unbiased?
A random forest has near-perfect training AUC but mediocre test AUC. An unpruned decision tree has perfect training AUC but near-random test AUC. This is what distinguishes the two situations in terms of the bias-variance tradeoff
What is both are overfitting (high variance), but the random forest mitigates it through averaging many trees on random subsamples, while a single unpruned tree has no such correction — making the tree's test performance collapse far more severely?
In the Facebook task, NN(1,) and NN(2,) produce different permutation importance plots. This is the precise reason adding a second hidden node changes what the model can capture, explained in terms of what a single node can and cannot do
What is a single TanH node can only apply a monotone squashing to one linear combination of inputs — giving a linear decision boundary, essentially logistic regression — while a second node lets the network combine two non-linear transformations, enabling it to capture interaction effects between variables like age and ad type that a single node cannot represent?
In a CLV calculation, switching from the "optimistic" churn timing assumption to the "pessimistic" one changes the CLV in this direction and for this reason
What is CLV decreases under pessimistic timing because customers are assumed to churn at the start of each period (before paying) rather than the end, reducing the expected number of payments received and the present value of each cash flow?
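Under one common convention this looks as follows (margin, retention, discount rate, and horizon are all hypothetical; your course materials may place the discount exponent differently):

```python
m, r, d, T = 50.0, 0.8, 0.10, 5   # margin/period, retention, discount rate, periods

# Optimistic: churn at END of each period -> period-t payment arrives with prob r**(t-1)
clv_opt = sum(m * r**(t - 1) / (1 + d)**t for t in range(1, T + 1))

# Pessimistic: churn at START of each period (before paying) -> prob r**t
clv_pess = sum(m * r**t / (1 + d)**t for t in range(1, T + 1))

# clv_pess < clv_opt: each payment is one survival step less likely
```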
A company observes that customers who receive more emails buy more. They conclude email frequency causes higher purchases. Even without an obvious confounder, this causal claim could still fail for this more subtle reason — and this is what the data pattern would look like
What is reverse causality — engaged buyers may prompt the firm's algorithm to send them more emails, so purchasing behavior drives email frequency rather than the other way around. The data would show high correlation but the direction of causation runs opposite to the claim?
Two models predict diamond prices. Model A uses only clarity (R²=0.031). Model B uses carat and clarity (R²=0.904). A student argues the large jump in R² proves carat causes higher prices. This is the flaw in that reasoning, and this is what the R² jump actually tells you
What is R² measures predictive fit, not causation — the jump tells you carat explains most of the variance in price and that clarity's earlier coefficient was severely biased by OVB, but none of it establishes a causal direction. Carat could itself be endogenous (e.g., people selectively certify larger, higher-quality stones)?
A model achieves high accuracy (e.g., 95%) on a classification task but is actually useless for targeting. This is the exact condition that makes accuracy a misleading metric here, and this is the better metric to use instead
What is severe class imbalance — if only 5% of customers buy, predicting "no" for everyone achieves 95% accuracy while identifying zero buyers. AUC (or profit/ROME at the breakeven threshold) is the appropriate metric because it evaluates discrimination across all thresholds regardless of class distribution?
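The imbalance trap is easy to reproduce (a toy example with a 5% positive rate):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

y_true = np.array([1] * 5 + [0] * 95)   # only 5% of customers buy
y_pred = np.zeros(100, dtype=int)       # "model" that always predicts no-buy

acc = accuracy_score(y_true, y_pred)    # 0.95 — looks great
auc = roc_auc_score(y_true, np.zeros(100))  # 0.5 — zero discrimination power
```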
A student trains an XGBoost model and notices that training AUC is 0.97 but test AUC is 0.71. They try reducing n_estimators and the gap narrows slightly, but test AUC barely improves. This is the likely explanation and the hyperparameter(s) they should actually prioritize tuning
What is the model is overfitting primarily due to tree depth and minimum child weight — reducing n_estimators helps only marginally because the individual trees are still too complex. Reducing max_depth and increasing min_child_weight forces shallower, simpler trees which have a much larger effect on closing the train-test gap than cutting the number of trees alone?
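A tuning grid reflecting that priority might look like this (the values are illustrative; the parameter names follow XGBoost's API):

```python
# Prioritize tree-complexity parameters when the train-test AUC gap is large
param_grid = {
    "max_depth": [2, 3, 4],            # shallower trees -> less overfitting
    "min_child_weight": [5, 10, 20],   # larger -> splits need more support
    "learning_rate": [0.05, 0.1],      # smaller steps pair well with simpler trees
    "n_estimators": [200],             # ensemble size is the secondary lever here
}
```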
A firm uses an uplift model and the incremental uplift plot peaks at 40% of customers then declines to zero at 80%. A colleague argues you should always target the top 40% since that's the peak. This is why that reasoning is incomplete, and this is the correct targeting rule
What is the peak represents the decile with the highest average incremental uplift, but every customer between 0% and the breakeven crossing point (where incremental uplift > cost/revenue) is still profitable to contact — stopping at the peak leaves profitable Persuadables untargeted. The correct rule is to contact all customers up to the point where the incremental uplift line crosses the breakeven threshold, not where it peaks?