Ensemble Methods
10 Points: The process of training multiple models sequentially to correct the mistakes of previous models is called what?
20 Points: A common ensemble method that uses multiple Decision Trees is what?
30 Points: The primary goal of using ensemble methods is to reduce what kind of error?
40 Points: This ensemble method's name is short for "bootstrap aggregating."
50 Points: Boosting models are particularly vulnerable to this type of data.
Decision Trees & Random Forests
10 Points: The technique to avoid overfitting a Decision Tree by removing branches is called what?
20 Points: The measure of uncertainty or impurity in a dataset is known as what?
30 Points: Decision Trees use two measures to choose splits: entropy and what else?
40 Points: What is the term for a subset of a Decision Tree?
50 Points: Random Forest is an extension of what ensemble method?
Neural Networks
10 Points: The "deep" in deep learning refers to the number of what?
20 Points: A common type of neural network used for facial recognition and image processing is what?
30 Points: What is the term for the function that determines the output of a node?
40 Points: The difference between a single-layer and multilayer perceptron is the presence of what kind of layers?
50 Points: What is the term for the measure of how wrong a model's prediction is?
Model Evaluation Metrics
10 Points: What metric is the harmonic mean of precision and recall?
20 Points: What is the term for the ratio of correctly predicted positive observations to the total predicted positives?
30 Points: The area under an ROC curve is called what?
40 Points: What does an AUC of 1.0 indicate?
50 Points: The F1-score is a good metric to use when the classes are what?
Naive Bayes Classifier
10 Points: The "naive" part of the name comes from the assumption that features are what?
20 Points: What is a common application of Naive Bayes?
30 Points: Naive Bayes is considered what type of classifier?
40 Points: What is the name of the smoothing technique used to solve the "Zero Frequency" issue?
50 Points: A disadvantage of the Naive Bayes algorithm is its strong assumption of what?
K-Nearest Neighbor (KNN)
10 Points: The "K" in KNN represents the number of what?
20 Points: The most common distance measure used in KNN is what?
30 Points: KNN can be used for both classification and what else?
40 Points: What is a key disadvantage of KNN when dealing with large datasets?
50 Points: A disadvantage of Euclidean distance is that it doesn't work well with what kind of data?
Regression & Classification
10 Points: A model that predicts a numerical value is solving what kind of problem?
20 Points: A classifier that models the decision boundary between classes is called what?
30 Points: A Regression Tree is a Decision Tree that is used for what kind of problems?
40 Points: Instead of entropy, what measure is used to choose splits when building a regression tree?
50 Points: This type of regression is used for classification problems.
Clustering Algorithms
10 Points: This clustering algorithm, unlike K-Means, does not require a predefined number of clusters.
20 Points: This metric is commonly used to determine the optimal number of clusters for K-Means.
30 Points: In K-Means clustering, what are the central points of each cluster called?
40 Points: This type of clustering algorithm builds a hierarchy of clusters.
50 Points: In clustering, the goal is to maximize the distance between clusters and minimize the distance between data points within what?
Support Vector Machines (SVM)
10 Points: A Support Vector Machine's goal is to find a line that separates the data points. What is this line called?
20 Points: The data points closest to the separating line that influence its position are called what?
30 Points: This is the distance between the hyperplane and the closest data point from either class.
40 Points: When data is not linearly separable, an SVM uses what to map the data into a higher-dimensional space?
50 Points: The primary goal of an SVM is to maximize the what?