This is the full name of the tree-based model known as CART
Classification and Regression Trees
What is the default base estimator in Adaboost and Gradient Boost?
AdaBoost: DecisionTreeClassifier(max_depth=1), i.e. a decision stump. Gradient Boosting in sklearn defaults to regression trees with max_depth=3.
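A minimal pure-Python sketch of what a depth-1 stump (AdaBoost's default weak learner) does. This is an illustration only, not sklearn's implementation, and fit_stump is a hypothetical helper:

```python
# Hypothetical sketch of a depth-1 "decision stump" on a single feature:
# try every threshold and both label orientations, keep the lowest error.

def fit_stump(X, y):
    """Return (threshold, left_label, right_label) minimizing
    misclassification error for a single <= threshold split."""
    best = None
    for t in sorted(set(X)):
        for left, right in ((0, 1), (1, 0)):
            preds = [left if x <= t else right for x in X]
            err = sum(p != yi for p, yi in zip(preds, y))
            if best is None or err < best[0]:
                best = (err, t, left, right)
    return best[1:]

t, left, right = fit_stump([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
# the stump splits cleanly between 3 and 10
```

A stump this weak is useless alone, which is exactly the point: boosting combines many of them.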
In this 1996 action-thriller, terrorists hijack a plane and actors Kurt Russell, Steven Seagal, and Halle Berry must work together to stop them. With so much at stake, using Gini Impurity would have made choices easier.
Executive Decision Tree
(Executive Decision + Decision Tree)
What is the best kernel to start off using and why?
RBF, because it's the most flexible kernel and tends to perform the best across a wide range of problems
Give an example of a variation of gradient descent that dynamically updates the learning rate (alpha)
AdaGrad (RMSProp and Adam are other common examples)
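A small sketch of the idea, assuming AdaGrad as the example: the effective learning rate shrinks as squared gradients accumulate. Here it minimizes f(x) = x**2 (gradient 2x); this is illustrative, not a production optimizer:

```python
import math

# AdaGrad sketch: the step size is base_lr divided by the root of the
# accumulated squared gradients, so the learning rate adapts over time.

def adagrad(x0, base_lr=1.0, steps=200, eps=1e-8):
    x, g_sq = x0, 0.0
    for _ in range(steps):
        grad = 2 * x                 # gradient of f(x) = x**2
        g_sq += grad ** 2            # accumulate squared gradients
        x -= base_lr / (math.sqrt(g_sq) + eps) * grad
    return x

x = adagrad(5.0)   # converges toward the minimum at 0
```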
Explain what bootstrap aggregating means
Fitting models on subsets of the data sampled with replacement (bootstrap samples) and aggregating the results
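A stdlib-only sketch of the two halves of the name, bootstrapping and aggregating. Each "model" here is deliberately trivial (it just predicts its sample's mean):

```python
import random
import statistics

random.seed(0)

data = [2, 4, 4, 4, 5, 5, 7, 9]

def bootstrap_sample(data):
    # random.choices samples WITH replacement, the defining property
    # of a bootstrap sample
    return random.choices(data, k=len(data))

# "Train" one trivial model per bootstrap sample, then aggregate.
predictions = [statistics.mean(bootstrap_sample(data)) for _ in range(500)]
bagged_estimate = statistics.mean(predictions)
```

Averaging over many resampled fits is what smooths out the variance of any single fit.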
What is the difference between Bagging and a RandomForest Classifier?
Bagging - the default base estimator is a decision tree, but you can bag other model types. A BaggingClassifier fits each base classifier on a random subset of the original dataset (drawn with replacement) and then aggregates their individual predictions
RandomForest - only uses decision trees; combines bootstrap aggregating with considering only a random subset of the features at each split
In this movie a mild-mannered chemist and an ex-con must lead the counterstrike when a rogue group of military men, led by a renegade general, threaten a nerve gas attack from Alcatraz against San Francisco. It is also visualization of the trade-off between sensitivity and specificity.
The ROCk-Curve
(The Rock + ROC Curve)
What is the difference between maximal margin and soft margin?
Maximal margin - the ‘best’ separating hyperplane is the one that creates the widest margin: the most space around it with no points inside.
Soft margin - allows the SVM to make a certain number of mistakes (points inside the margin or misclassified) while keeping the margin as wide as possible, so that the remaining points can still be classified correctly.
Name 3 hyperparameters you would want to tune with a RandomForest Classifier
n_estimators: the number of trees in the forest
max_depth: the maximum depth of each tree
min_samples_split: the minimum number of samples required to split an internal node
min_samples_leaf: the minimum number of samples required to be at a leaf node
max_features: the number of features to consider when looking for the best split
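A grid over hyperparameters like these can be enumerated with itertools.product. This is a sketch: evaluate() is a hypothetical placeholder where cross-validation scoring of a RandomForest would go, not a real model:

```python
from itertools import product

grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}

def evaluate(params):
    # Hypothetical placeholder score; in practice you would cross-validate
    # a model built with these params and return its mean score.
    return -sum(v for v in params.values() if isinstance(v, int))

# Expand the grid into one dict per hyperparameter combination.
names = list(grid)
candidates = [dict(zip(names, combo)) for combo in product(*grid.values())]
best = max(candidates, key=evaluate)   # 2 * 3 * 2 = 12 candidates
```

This brute-force loop is essentially what sklearn's GridSearchCV automates (plus the cross-validation).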
What do we call it when we reduce the number of splits a decision tree makes?
Pruning
In terms of the bias/variance tradeoff - what is the difference between bagging and boosting?
Bagging aims to reduce variance
Boosting aims to reduce bias (and can reduce variance a bit as well)!
In this movie, the presidencies of Kennedy and Johnson, the events of Vietnam, Watergate and other historical events unfold through the perspective of an Alabama man (played by Tom Hanks) with an IQ of 75, whose only desire is to be reunited with his childhood sweetheart.
It is also a CART model that learned on bootstrapped data and considered a random subset of the features at each split in the learning process.
Random Forrest Gump
(Forrest Gump + Random Forest)
What is the kernel trick and why would we want to use it?
The kernel trick applies a function to our data to increase its dimensionality, effectively bending the axes the model is fit on, while computing the higher-dimensional dot products implicitly (without ever materializing the new features). We use it to create linear separability between data that cannot be separated by a straight line.
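A toy sketch of the underlying idea: 1-D points that no single cut on x can separate become separable after an explicit polynomial feature map x -> (x, x**2). A kernel achieves the same geometry without building the lifted features:

```python
# class 0 sits inside class 1 on the number line, so no single
# threshold on x separates them
inner = [-1.0, 0.0, 1.0]          # class 0
outer = [-3.0, -2.5, 2.5, 3.0]    # class 1

def lift(x):
    # explicit feature map; an RBF/polynomial kernel does this implicitly
    return (x, x * x)

# In the lifted space the second coordinate alone separates the classes
# with the linear boundary x**2 = 4:
assert all(lift(x)[1] < 4.0 for x in inner)
assert all(lift(x)[1] > 4.0 for x in outer)
```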
Describe the difference between an estimator and a transformer and give an example of each
An estimator is a model - this is something that can generate predictions. (pick any model we've learned so far)
A transformer is something that transforms our data (StandardScaler, CountVectorizer, TfidfVectorizer, PolynomialFeatures, etc)
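The convention the transformers above follow is the fit/transform API: fit() learns statistics from the data, transform() applies them. A toy stand-in for StandardScaler (not sklearn's actual class) makes the pattern concrete:

```python
import statistics

class ToyStandardScaler:
    """Simplified stand-in for sklearn's StandardScaler."""

    def fit(self, X):
        # learn the statistics from the data
        self.mean_ = statistics.mean(X)
        self.std_ = statistics.pstdev(X)
        return self

    def transform(self, X):
        # apply the learned statistics
        return [(x - self.mean_) / self.std_ for x in X]

scaler = ToyStandardScaler().fit([1, 2, 3, 4, 5])
scaled = scaler.transform([1, 2, 3, 4, 5])
# scaled now has mean 0 and standard deviation 1
```

An estimator follows the same pattern but exposes predict() instead of transform().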
Explain, in basic terms, what AdaBoost does
The core principle of AdaBoost is to fit a sequence of weak learners on repeatedly modified versions of the data.
After each fit, the weights on the training observations are updated: misclassified observations gain weight so the next weak learner focuses on them.
The predictions are then combined through a weighted majority vote to produce the final prediction.
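One reweighting step can be sketched in a few lines (an illustrative SAMME-style update for binary labels in {-1, +1}, not sklearn's code):

```python
import math

y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]            # the weak learner misses example 1
w = [1 / len(y_true)] * len(y_true)    # start with uniform weights

# weighted error of this learner, then its vote weight alpha
err = sum(wi for wi, t, p in zip(w, y_true, y_pred) if t != p)
alpha = 0.5 * math.log((1 - err) / err)

# upweight misclassified points, downweight correct ones, renormalize
w = [wi * math.exp(-alpha * t * p) for wi, t, p in zip(w, y_true, y_pred)]
total = sum(w)
w = [wi / total for wi in w]
# the misclassified example now carries half the total weight
```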
Name 2 pros and 2 cons of boosting
Pros:
1. Achieves higher performance than bagging when the hyperparameters are properly tuned
2. Works equally well for classification and regression
3. Can use robust loss functions that make the model resistant to outliers
Cons:
4. Difficult and time-consuming to tune hyperparameters
5. Cannot be parallelized (models are built sequentially)
6. Higher risk of overfitting compared to bagging
In this movie, a World War II American Army Medic who served during the Battle of Okinawa, refuses to kill people, and becomes the first man in American history to receive the Medal of Honor without firing a shot. It is also a regularization type...
Hacksaw Ridge Regularization
(Hacksaw Ridge + Ridge Regularization)
What is C and what effect does it have in SVMs?
C controls how heavily we penalize margin violations; it acts as the inverse of regularization strength on the boundary fit between classes.
If C is small: we regularize substantially, leading to a wider margin and a less perfect classification of the training data.
If C is large: we regularize very little, leading to a boundary that fits the training data more closely (and may overfit).
Name one pro and one con of using decision trees:
Pros:
Not influenced by scale
They don't care about the distribution of the data
Reasonably interpretable
Don't require special encoding
Con:
High-variance models - tend to overfit
Explain what Gradient Boost does
Fits a weak learner (in sklearn, by default a regression tree with max_depth=3), computes the residuals, fits a second model to those residuals (or misclassifications), and aggregates the first and second models together. Gradient boosting is about repeatedly fitting the next model to the residuals/misclassifications of the current ensemble.
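The loop above can be sketched for regression with squared error, using a tiny hand-rolled regression stump as the weak learner. This is an illustration of the idea, not sklearn's implementation:

```python
import statistics

X = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 8.0, 12.0]
learning_rate = 0.5

def fit_regression_stump(X, r):
    """One-split regression stump: pick the threshold minimizing squared
    error, predicting the residual mean on each side."""
    best = None
    for t in X[:-1]:
        left = [ri for xi, ri in zip(X, r) if xi <= t]
        right = [ri for xi, ri in zip(X, r) if xi > t]
        lm, rm = statistics.mean(left), statistics.mean(right)
        sse = sum((ri - (lm if xi <= t else rm)) ** 2
                  for xi, ri in zip(X, r))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

pred = [statistics.mean(y)] * len(y)       # stage 0: predict the overall mean
for _ in range(50):
    residuals = [yi - pi for yi, pi in zip(y, pred)]
    stump = fit_regression_stump(X, residuals)   # fit to current residuals
    pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, X)]
# after enough stages, pred is very close to y on the training data
```

Each stage only has to correct what the ensemble so far got wrong, which is why even trivial learners add up to a strong model.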
Briefly describe the difference between an ExtraTrees Classifier and a RandomForest Classifier
An ExtraTrees classifier, by default in sklearn, does not bootstrap the data. Each tree sees the full dataset, but a random subset of features is considered at each split, and the split thresholds are drawn at random rather than greedily optimized (it does not necessarily choose the split that best reduces Gini impurity).
This is the name of a movie in which young F.B.I. cadet must receive the help of an incarcerated and manipulative cannibal killer to help catch another serial killer, a madman who skins his victims. It is also the term for an anonymous function.
Silence of the Lambda Functions
(Silence of the Lambs + Lambda Functions)
What do we need to do to our data before using an SVM?
SCALE it (e.g. with StandardScaler) - SVMs are distance-based, so features on larger scales would dominate the margin
What is the hyperparameter ccp_alpha?
A cost-complexity pruning parameter, analogous to alpha in Lasso/Ridge regularization. As ccp_alpha increases, more of the tree is pruned away, i.e. we regularize more.