This classification method estimates the probability of an observation belonging to a specific category
What is logistic regression?
K-nearest neighbors estimates the label of a point based on what?
What are the labels of nearby points?
PCA is used for what purpose?
What is dimensionality reduction?
Cross-validation is used to estimate what?
What is how well a model performs on unseen data?
Decision trees use this type of splitting method.
What is recursive binary splitting?
Logistic regression maximizes this to determine its coefficients.
What is likelihood?
If K is too small, the model tends to do what?
What is underfit (jagged boundaries)?
In PCA, the new components are ___ with one another.
What is decorrelated (orthogonal)?
How many folds is typically a good balance between bias and computation in cross-validation?
What is 5 folds?
Bagging combines many trees trained on bootstrap samples to do what?
What is reduce variance and improve stability?
Logistic regression predicts this type of variable
What is a categorical (binary) variable?
The misclassification rate is calculated as what?
What is (# Incorrectly Classified) / (Total Records)?
K-means clustering minimizes this measure.
What is within-cluster variation?
Bootstrapping involves sampling with or without replacement?
What is with replacement?
Random Forest differs from bagging by doing what at each split?
What is selecting a random subset of predictors?
The output of logistic regression is converted using this mathematical function to keep values between 0 and 1.
What is the sigmoid (logistic) function?
In a confusion matrix, the false positive rate is computed as?
What is FP / (FP + TN) or # True negatives classified as positive / total actual negatives?
Hierarchical clustering can be visualized using this type of chart.
What is a dendrogram?
Lift compares a model’s performance to random chance as a ratio of what?
What is (Response rate with model) / (Response rate at random)?
For classification in a Random Forest, how many predictors are typically considered at each split?
What is √p (square root of total predictors)?
Given coefficients Intercept = -8.7421 and Balance = 0.0042, which customer (A with $4000 or B with $1200) has a higher approval probability?
Who is Customer A? (≈ 0.9997 vs. 0.0243)
Given 40 true positives, 10 false negatives, 15 false positives, and 35 true negatives, what is the misclassification rate?
What is 0.25 (25%)?
Why is standardization important before clustering?
What is to prevent bias from features with larger numerical ranges dominating the distance calculation?
If a model captures 8 of 20 buyers in the top 10% of customers, what is the lift?
What is 4?
Gradient Boosting differs from bagging because it focuses on reducing what?
What is bias (and variance) by sequentially improving the model?