This is the full name of the tree-based model known as CART
Classification and Regression Trees
What is the default base estimator in Adaboost and Gradient Boost?
AdaBoost: DecisionTreeClassifier(max_depth=1), i.e. a decision stump. Gradient Boosting in sklearn defaults to regression trees with max_depth=3.
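A minimal pure-Python sketch of what a depth-1 stump (AdaBoost's default weak learner) does. This is an illustration only, not sklearn's implementation, and fit_stump is a hypothetical helper:

```python
# Hypothetical sketch of a depth-1 "decision stump" on a single feature:
# try every threshold and both label orientations, keep the lowest error.

def fit_stump(X, y):
    """Return (threshold, left_label, right_label) minimizing
    misclassification error for a single <= threshold split."""
    best = None
    for t in sorted(set(X)):
        for left, right in ((0, 1), (1, 0)):
            preds = [left if x <= t else right for x in X]
            err = sum(p != yi for p, yi in zip(preds, y))
            if best is None or err < best[0]:
                best = (err, t, left, right)
    return best[1:]

t, left, right = fit_stump([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
# the stump splits cleanly between 3 and 10
```

A stump this weak is useless alone, which is exactly the point: boosting combines many of them.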
In this 1996 action-thriller, terrorists hijack a plane and actors Kurt Russell, Steven Seagal, and Halle Berry must work together to stop them. With so much at stake, using Gini Impurity would have made choices easier.
Executive Decision Tree
(Executive Decision + Decision Tree)
What is the best kernel to start off using and why?
RBF, because it's the most flexible kernel and tends to perform the best across a wide range of problems
Give an example of a variation of gradient descent that dynamically updates the learning rate (alpha)
AdaGrad (RMSProp and Adam are other common examples)
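A small sketch of the idea, assuming AdaGrad as the example: the effective learning rate shrinks as squared gradients accumulate. Here it minimizes f(x) = x**2 (gradient 2x); this is illustrative, not a production optimizer:

```python
import math

# AdaGrad sketch: the step size is base_lr divided by the root of the
# accumulated squared gradients, so the learning rate adapts over time.

def adagrad(x0, base_lr=1.0, steps=200, eps=1e-8):
    x, g_sq = x0, 0.0
    for _ in range(steps):
        grad = 2 * x                 # gradient of f(x) = x**2
        g_sq += grad ** 2            # accumulate squared gradients
        x -= base_lr / (math.sqrt(g_sq) + eps) * grad
    return x

x = adagrad(5.0)   # converges toward the minimum at 0
```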
Explain what bootstrap aggregating means
Fitting models on subsets of the data sampled with replacement (bootstrap samples) and aggregating the results
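A stdlib-only sketch of the two halves of the name, bootstrapping and aggregating. Each "model" here is deliberately trivial (it just predicts its sample's mean):

```python
import random
import statistics

random.seed(0)

data = [2, 4, 4, 4, 5, 5, 7, 9]

def bootstrap_sample(data):
    # random.choices samples WITH replacement, the defining property
    # of a bootstrap sample
    return random.choices(data, k=len(data))

# "Train" one trivial model per bootstrap sample, then aggregate.
predictions = [statistics.mean(bootstrap_sample(data)) for _ in range(500)]
bagged_estimate = statistics.mean(predictions)
```

Averaging over many resampled fits is what smooths out the variance of any single fit.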
What is the difference between Bagging and a RandomForest Classifier?
Bagging - the default base estimator is a decision tree, but you can bag other model types. A BaggingClassifier fits each base classifier on a random subset of the original dataset (drawn with replacement) and then aggregates their individual predictions
RandomForest - only uses decision trees; combines bootstrap aggregating with considering only a random subset of the features at each split
In this movie a mild-mannered chemist and an ex-con must lead the counterstrike when a rogue group of military men, led by a renegade general, threaten a nerve gas attack from Alcatraz against San Francisco. It is also visualization of the trade-off between sensitivity and specificity.
The ROCk-Curve
(The Rock + ROC Curve)
What is the difference between maximal margin and soft margin?
Maximal margin - the ‘best’ separating hyperplane is the one that creates the widest margin: the most space around it with no points inside.
Soft margin - allows the SVM to make a certain number of mistakes (points inside the margin or misclassified) while keeping the margin as wide as possible, so that the remaining points can still be classified correctly.
Name 3 hyperparameters you would want to tune with a RandomForest Classifier
n_estimators: the number of trees in the forest
max_depth: the maximum depth of each tree
min_samples_split: the minimum number of samples required to split an internal node
min_samples_leaf: the minimum number of samples required to be at a leaf node
max_features: the number of features to consider when looking for the best split
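A grid over hyperparameters like these can be enumerated with itertools.product. This is a sketch: evaluate() is a hypothetical placeholder where cross-validation scoring of a RandomForest would go, not a real model:

```python
from itertools import product

grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 5],
}

def evaluate(params):
    # Hypothetical placeholder score; in practice you would cross-validate
    # a model built with these params and return its mean score.
    return -sum(v for v in params.values() if isinstance(v, int))

# Expand the grid into one dict per hyperparameter combination.
names = list(grid)
candidates = [dict(zip(names, combo)) for combo in product(*grid.values())]
best = max(candidates, key=evaluate)   # 2 * 3 * 2 = 12 candidates
```

This brute-force loop is essentially what sklearn's GridSearchCV automates (plus the cross-validation).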
What do we call it when we reduce the number of splits a decision tree makes?
Pruning
In terms of the bias/variance tradeoff - what is the difference between bagging and boosting?
Bagging aims to reduce variance
Boosting aims to reduce bias (and can reduce variance a bit as well)!
In this movie, the presidencies of Kennedy and Johnson, the events of Vietnam, Watergate and other historical events unfold through the perspective of an Alabama man (played by Tom Hanks) with an IQ of 75, whose only desire is to be reunited with his childhood sweetheart.
It is also a CART model that learned on bootstrapped data and considered a random subset of the features at each split in the learning process.
Random Forrest Gump
(Forrest Gump + Random Forest)
What is the kernel trick and why would we want to use it?
The kernel trick applies a function to our data to increase its dimensionality, effectively bending the axes the model is fit on, while computing the higher-dimensional dot products implicitly (without ever materializing the new features). We use it to create linear separability between data that cannot be separated by a straight line.
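A toy sketch of the underlying idea: 1-D points that no single cut on x can separate become separable after an explicit polynomial feature map x -> (x, x**2). A kernel achieves the same geometry without building the lifted features:

```python
# class 0 sits inside class 1 on the number line, so no single
# threshold on x separates them
inner = [-1.0, 0.0, 1.0]          # class 0
outer = [-3.0, -2.5, 2.5, 3.0]    # class 1

def lift(x):
    # explicit feature map; an RBF/polynomial kernel does this implicitly
    return (x, x * x)

# In the lifted space the second coordinate alone separates the classes
# with the linear boundary x**2 = 4:
assert all(lift(x)[1] < 4.0 for x in inner)
assert all(lift(x)[1] > 4.0 for x in outer)
```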
Describe the difference between an estimator and a transformer and give an example of each
An estimator is a model - this is something that can generate predictions. (pick any model we've learned so far)
A transformer is something that transforms our data (StandardScaler, CountVectorizer, TfidfVectorizer, PolynomialFeatures, etc)
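The convention the transformers above follow is the fit/transform API: fit() learns statistics from the data, transform() applies them. A toy stand-in for StandardScaler (not sklearn's actual class) makes the pattern concrete:

```python
import statistics

class ToyStandardScaler:
    """Simplified stand-in for sklearn's StandardScaler."""

    def fit(self, X):
        # learn the statistics from the data
        self.mean_ = statistics.mean(X)
        self.std_ = statistics.pstdev(X)
        return self

    def transform(self, X):
        # apply the learned statistics
        return [(x - self.mean_) / self.std_ for x in X]

scaler = ToyStandardScaler().fit([1, 2, 3, 4, 5])
scaled = scaler.transform([1, 2, 3, 4, 5])
# scaled now has mean 0 and standard deviation 1
```

An estimator follows the same pattern but exposes predict() instead of transform().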
Explain, in basic terms, what AdaBoost does
The core principle of AdaBoost is to fit a sequence of weak learners on repeatedly modified versions of the data.
After each fit, the weights on the training observations are updated: misclassified observations gain weight so the next weak learner focuses on them.
The predictions are then combined through a weighted majority vote to produce the final prediction.
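One reweighting step can be sketched in a few lines (an illustrative SAMME-style update for binary labels in {-1, +1}, not sklearn's code):

```python
import math

y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, -1, 1]            # the weak learner misses example 1
w = [1 / len(y_true)] * len(y_true)    # start with uniform weights

# weighted error of this learner, then its vote weight alpha
err = sum(wi for wi, t, p in zip(w, y_true, y_pred) if t != p)
alpha = 0.5 * math.log((1 - err) / err)

# upweight misclassified points, downweight correct ones, renormalize
w = [wi * math.exp(-alpha * t * p) for wi, t, p in zip(w, y_true, y_pred)]
total = sum(w)
w = [wi / total for wi in w]
# the misclassified example now carries half the total weight
```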
Name 2 pros and 2 cons of boosting
Pros:
1. Achieves higher performance than bagging when the hyperparameters are properly tuned
2. Works equally well for classification and regression
3. Can use robust loss functions that make the model resistant to outliers
Cons:
4. Difficult and time-consuming to tune hyperparameters
5. Cannot be parallelized (models are built sequentially)
6. Higher risk of overfitting compared to bagging
In this movie, a World War II American Army Medic who served during the Battle of Okinawa, refuses to kill people, and becomes the first man in American history to receive the Medal of Honor without firing a shot. It is also a regularization type...
Hacksaw Ridge Regularization
(Hacksaw Ridge + Ridge Regularization)
What is C and what effect does it have in SVMs?
C controls how heavily we penalize margin violations; it acts as the inverse of regularization strength on the boundary fit between classes.
If C is small: we regularize substantially, leading to a wider margin and a less perfect classification of the training data.
If C is large: we regularize very little, leading to a boundary that fits the training data more closely (and may overfit).
Name one pro and one con of using decision trees:
Pros:
Not influenced by scale
They don't care about the distribution of the data
Reasonably interpretable
Don't require special encoding
Con:
High-variance models - tend to overfit
Explain what Gradient Boost does
Fits a weak learner (in sklearn, by default a regression tree with max_depth=3), computes the residuals, fits a second model to those residuals (or misclassifications), and aggregates the first and second models together. Gradient boosting is about repeatedly fitting the next model to the residuals/misclassifications of the current ensemble.
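The loop above can be sketched for regression with squared error, using a tiny hand-rolled regression stump as the weak learner. This is an illustration of the idea, not sklearn's implementation:

```python
import statistics

X = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 8.0, 12.0]
learning_rate = 0.5

def fit_regression_stump(X, r):
    """One-split regression stump: pick the threshold minimizing squared
    error, predicting the residual mean on each side."""
    best = None
    for t in X[:-1]:
        left = [ri for xi, ri in zip(X, r) if xi <= t]
        right = [ri for xi, ri in zip(X, r) if xi > t]
        lm, rm = statistics.mean(left), statistics.mean(right)
        sse = sum((ri - (lm if xi <= t else rm)) ** 2
                  for xi, ri in zip(X, r))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

pred = [statistics.mean(y)] * len(y)       # stage 0: predict the overall mean
for _ in range(50):
    residuals = [yi - pi for yi, pi in zip(y, pred)]
    stump = fit_regression_stump(X, residuals)   # fit to current residuals
    pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, X)]
# after enough stages, pred is very close to y on the training data
```

Each stage only has to correct what the ensemble so far got wrong, which is why even trivial learners add up to a strong model.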
Briefly describe the difference between an ExtraTrees Classifier and a RandomForest Classifier
An ExtraTrees classifier, by default in sklearn, does not bootstrap the data. Each tree sees the full dataset, but a random subset of features is considered at each split, and the split thresholds are drawn at random rather than greedily optimized (it does not necessarily choose the split that best reduces Gini impurity).
This is the name of a movie in which young F.B.I. cadet must receive the help of an incarcerated and manipulative cannibal killer to help catch another serial killer, a madman who skins his victims. It is also the term for an anonymous function.
Silence of the Lambda Functions
(Silence of the Lambs + Lambda Functions)
What do we need to do to our data before using an SVM?
SCALE it (e.g. with StandardScaler) - SVMs are distance-based, so features on larger scales would dominate the margin
What is the hyperparameter ccp_alpha?
A cost-complexity pruning parameter, analogous to alpha in Lasso/Ridge regularization. As ccp_alpha increases, more of the tree is pruned away, i.e. we regularize more.