LOGISTIC REGRESSION
KNN & MISCLASSIFICATION
CLUSTERING & PCA
MODEL EVALUATION
TREE MODELS & ENSEMBLES
600

Logistic regression uses this link function to connect linear predictors to probabilities.

What is the logit function (log-odds)?
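A minimal NumPy sketch of the link and its inverse (the probability 0.8 is just an illustrative value):

    import numpy as np

    def logit(p):
        # Log-odds: maps a probability in (0, 1) onto the whole real line.
        return np.log(p / (1 - p))

    def sigmoid(z):
        # Inverse of the logit: maps a linear predictor back into (0, 1).
        return 1 / (1 + np.exp(-z))

    p = 0.8
    z = logit(p)          # log(0.8 / 0.2) = log(4) ≈ 1.386
    print(sigmoid(z))     # recovers 0.8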


600

KNN has what computational drawback on large datasets, and how can it be mitigated?

It is computationally expensive at prediction time (O(n) per query); mitigate with KD-trees, ball trees, or dimensionality reduction.
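A short scikit-learn sketch on synthetic data; the algorithm parameter selects the index structure:

    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

    # Brute force scans all n training points per query (O(n)); a KD-tree
    # or ball tree index cuts the average query cost in low dimensions.
    knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)
    print(knn.predict(X[:3]))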

600

In K-means, how are centroids recomputed after re-assignment?

By taking the mean vector of all points assigned to each cluster, recomputed at every iteration until assignments stop changing.
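One update step sketched in plain NumPy (assumes every cluster is non-empty; the points are made up):

    import numpy as np

    def update_centroids(X, labels, k):
        # M-step of K-means: each centroid becomes the mean of its cluster.
        return np.array([X[labels == j].mean(axis=0) for j in range(k)])

    X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 9.0]])
    labels = np.array([0, 0, 1, 1])
    print(update_centroids(X, labels, 2))  # [[0.5, 0.0], [10.5, 9.5]]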

600

What is the main advantage of k-fold cross-validation over a simple train-test split?

Lower variance estimate of test error by averaging over k folds, using all data for both training and testing.
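For instance, with scikit-learn (a built-in dataset stands in for real data):

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    # Each of the 5 folds serves once as test data and 4 times as training
    # data, so the averaged score is less sensitive to one lucky or unlucky split.
    scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
    print(scores.mean(), scores.std())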

600

How does bagging reduce variance?

By averaging predictions from multiple independent bootstrap trees, smoothing out their random fluctuations.
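A sketch with scikit-learn's BaggingClassifier on synthetic data:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    # 100 trees, each grown on an independent bootstrap resample; the
    # ensemble vote averages away much of the single-tree variance.
    bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            random_state=0).fit(X, y)
    print(bag.predict(X[:5]))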

700

Why is logistic regression preferred over linear regression for classification tasks?

Because it constrains outputs to [0, 1] and models the log-odds of categorical outcomes, whereas linear regression can predict impossible probabilities (below 0 or above 1).
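A quick demonstration on made-up data (the exact numbers are illustrative):

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    X = np.linspace(-4, 4, 200).reshape(-1, 1)
    y = (X.ravel() + rng.normal(0, 1, 200) > 0).astype(int)

    lin = LinearRegression().fit(X, y)
    clf = LogisticRegression().fit(X, y)
    print(lin.predict([[4.0]]))              # typically > 1 here: not a probability
    print(clf.predict_proba([[4.0]])[:, 1])  # always inside [0, 1]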

700

When features are on different scales, KNN performance suffers. What is the fix and why?

Standardize or normalize features so one feature doesn’t dominate the distance metric.
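A common fix in scikit-learn is to pipeline the scaler with the model (synthetic data; the rescaled feature is contrived):

    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, n_features=6, random_state=0)
    X[:, 0] *= 1000  # one feature on a vastly larger scale

    # Without scaling, feature 0 dominates every Euclidean distance;
    # standardizing first puts all features on a comparable footing.
    knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
    knn.fit(X, y)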

700

Which clustering algorithm builds clusters in a “bottom-up” approach?

Agglomerative hierarchical clustering.
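A SciPy sketch on a handful of made-up points:

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]])
    # Bottom-up: every point starts as its own cluster, and the two closest
    # clusters are merged repeatedly (Ward linkage minimizes within-cluster
    # variance at each merge).
    Z = linkage(X, method="ward")
    print(Z)  # each row records one merge: (cluster i, cluster j, distance, size)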

700

In bootstrap sampling, about what proportion of the original data appears in each resample on average?

About 63.2% of the unique observations; each point is omitted with probability (1 − 1/n)^n ≈ e⁻¹ ≈ 36.8%.
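A quick NumPy simulation confirms the limit 1 − e⁻¹ ≈ 0.632 (seed and sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 1000, 200
    # Draw n indices with replacement and count the fraction that are unique.
    frac = np.mean([np.unique(rng.integers(0, n, n)).size / n
                    for _ in range(trials)])
    print(frac)               # ≈ 0.632
    print(1 - (1 - 1/n)**n)   # ≈ 0.6323, the exact value for n = 1000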

700

Why do random forests add randomness when choosing features at each split?

To de-correlate trees, making the averaged ensemble more robust.
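In scikit-learn this is the max_features parameter (synthetic data):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    # max_features="sqrt": each split considers a random subset of ~sqrt(p)
    # features, so the trees make different mistakes and average out better.
    rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0).fit(X, y)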

800

What problem occurs when classes are perfectly separable?

Complete separation → coefficients diverge to ±∞ and the maximum-likelihood estimates become unstable.
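A sketch of the symptom (penalty=None needs scikit-learn ≥ 1.2; older versions spell it "none"):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
    y = np.array([0, 0, 1, 1])  # perfectly separable at x = 0

    # The unpenalized likelihood keeps improving as |beta| grows, so the
    # optimizer only stops at its iteration limit with a huge coefficient.
    clf = LogisticRegression(penalty=None, max_iter=10_000).fit(X, y)
    print(clf.coef_)  # very large; an L1/L2 penalty would keep it finite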

800

How can KNN handle categorical predictors?

Use Hamming distance on integer- or one-hot-encoded categories, or Gower's distance for mixed numeric/categorical data.
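For example, scikit-learn's KNN accepts metric="hamming" on integer-encoded categories (Gower's distance requires a third-party package; the toy data is made up):

    from sklearn.neighbors import KNeighborsClassifier

    # Hamming distance = fraction of positions where two vectors differ,
    # a natural metric for categorical codes.
    X = [[0, 1, 2], [0, 1, 0], [2, 2, 2], [2, 2, 0]]  # integer-encoded categories
    y = [0, 0, 1, 1]
    knn = KNeighborsClassifier(n_neighbors=1, metric="hamming").fit(X, y)
    print(knn.predict([[2, 2, 1]]))  # nearest neighbors are the class-1 points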

800

Why is standardization critical before performing K-means on variables like “Age” and “Income”?

Because features with larger ranges dominate Euclidean distance, distorting cluster boundaries.
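A two-customer example with made-up numbers shows the distortion:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    a = np.array([25.0, 50_000.0])   # [age, income]
    b = np.array([60.0, 51_000.0])
    print(np.linalg.norm(a - b))     # ≈ 1000.6: income dwarfs a 35-year age gap

    X = np.array([[25, 50_000], [60, 51_000], [40, 90_000]], dtype=float)
    Xs = StandardScaler().fit_transform(X)
    print(np.linalg.norm(Xs[0] - Xs[1]))  # both features now contribute comparably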

800

A lift of 4 means what in business terms?

The response rate in the targeted segment is 4× the overall rate: the model is 4× better than random selection at finding positive cases.
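With hypothetical campaign numbers:

    # Hypothetical: 10,000 customers, 2% respond overall (200 responders),
    # and the model's top decile (1,000 customers) contains 80 of them.
    overall_rate = 200 / 10_000          # 0.02
    targeted_rate = 80 / 1_000           # 0.08
    print(targeted_rate / overall_rate)  # lift = 4.0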


800

List two hyperparameters of Gradient Boosting that control bias–variance trade-off.

Number of trees (B) and learning rate (λ); a small λ reduces variance but needs a larger B to compensate.
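In scikit-learn terms (synthetic data; the parameter values are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = make_classification(n_samples=1000, random_state=0)
    # A small learning rate shrinks each tree's contribution (variance down),
    # but more trees are then needed to drive the bias down.
    gbm = GradientBoostingClassifier(n_estimators=500,    # B
                                     learning_rate=0.05,  # lambda
                                     random_state=0).fit(X, y)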

900

What is the interpretation of the exponential of a logistic coefficient?

It is the odds ratio — the multiplicative change in odds for a one-unit increase in x.
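For example, after fitting on standardized features of a built-in dataset:

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler

    X, y = load_breast_cancer(return_X_y=True)
    clf = LogisticRegression(max_iter=5000).fit(StandardScaler().fit_transform(X), y)
    # exp(beta): the factor by which the odds of y = 1 are multiplied per
    # one-unit (here, one standard deviation) increase in each feature.
    print(np.exp(clf.coef_[0][:5]))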


900

Why does a low K reduce bias but increase variance?

Each prediction depends on fewer samples → more sensitivity to noise (variance↑) but closer fit to training data (bias↓).

900

How is the total variance in PCA partitioned among the principal components?

Each component captures a portion of total variance proportional to its eigenvalue; components are orthogonal.
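Checking this with scikit-learn on the iris data:

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X, _ = load_iris(return_X_y=True)
    pca = PCA().fit(StandardScaler().fit_transform(X))
    # explained_variance_ holds each component's eigenvalue; the ratios sum
    # to 1 because the orthogonal components partition the total variance.
    print(pca.explained_variance_)
    print(pca.explained_variance_ratio_.sum())  # 1.0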


900

What does a Gini coefficient of 0.5 mean compared to 1 or 0?

Moderate rank-ordering ability (0 = none, 1 = perfect).
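Using the common credit-scoring relation Gini = 2·AUC − 1 (synthetic data):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    proba = LogisticRegression(max_iter=1000).fit(Xtr, ytr).predict_proba(Xte)[:, 1]
    print(2 * roc_auc_score(yte, proba) - 1)  # 0 = random ranking, 1 = perfect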

900

What happens when the learning rate is too high in Gradient Boosting?

Overfitting and divergence: each tree's contribution is too large, so the model overshoots the minimum and fits noise.

1000

Which two regularization techniques are often used to stabilize logistic regression and what do they penalize?

L1 (Lasso) penalizes |β| to induce sparsity; L2 (Ridge) penalizes β² to reduce variance and multicollinearity.
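Side by side in scikit-learn (C is the inverse penalty strength; data is synthetic):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                               random_state=0)
    # L1 drives uninformative coefficients exactly to zero (sparsity)...
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
    # ...while L2 shrinks all coefficients toward zero without zeroing them.
    ridge = LogisticRegression(penalty="l2", C=0.1).fit(X, y)
    print((lasso.coef_ == 0).sum(), (ridge.coef_ == 0).sum())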

1000

What metric should replace accuracy for highly imbalanced binary classification problems?

Precision, recall, F1-score, or AUC — accuracy can be misleading when one class dominates.
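A degenerate classifier makes the point (made-up 95/5 class split):

    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score

    y_true = np.array([0] * 95 + [1] * 5)
    y_pred = np.zeros(100, dtype=int)  # always predicts the majority class
    print(accuracy_score(y_true, y_pred))             # 0.95: looks strong
    print(f1_score(y_true, y_pred, zero_division=0))  # 0.0: exposes the failure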

1000

Explain one limitation of PCA for classification tasks.

PCA maximizes variance, not class separation, so principal components may not align with discriminative features.

1000

Why can the χ² goodness-of-fit test become unreliable in large datasets?

Because tiny deviations become statistically significant even if practically negligible → p-values misleading.
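A SciPy illustration: the same 1% deviation from uniform, tested at two sample sizes:

    import numpy as np
    from scipy.stats import chisquare

    expected = np.array([0.25, 0.25, 0.25, 0.25])
    observed = np.array([0.26, 0.24, 0.25, 0.25])  # practically negligible shift

    for n in (1_000, 1_000_000):
        stat, p = chisquare(observed * n, expected * n)
        print(n, p)  # p ≈ 0.85 at n = 1e3, but ≈ 0 at n = 1e6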

1000

Conceptually, how do bagging and boosting differ in how they build models?

Bagging builds independent models in parallel to reduce variance; boosting builds sequential models that focus on previous errors to reduce bias.