CART I
CART II
GLMs
SVMs
Potpourri
100

This is the full name of the tree-based model known as CART


Classification and Regression Trees

100

What is the default base estimator in AdaBoost and Gradient Boosting?

A decision tree. AdaBoostClassifier defaults to DecisionTreeClassifier(max_depth=1) (a stump); sklearn's GradientBoostingClassifier grows regression trees with max_depth=3 by default.
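A quick sketch (scikit-learn assumed) confirming that AdaBoost's default weak learner is a depth-1 stump:

```python
# Fit AdaBoost with its defaults and inspect the fitted weak learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, random_state=0)
ada = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)

first = ada.estimators_[0]
print(type(first).__name__, first.max_depth)  # DecisionTreeClassifier 1
```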

100

What do you always have to add to your X when using statsmodels?

Add a constant! (statsmodels does not include an intercept automatically.)

100

What is the best kernel to start off using and why? 

RBF, because it's the most flexible and tends to perform the best.

100

Assume you’ve built a Poisson regression model to predict the number of loaves of sourdough bread that have been baked this month. One of the variables in your model is the number of bread machines purchased from Amazon. Assume you exponentiate your coefficient for that variable and get 3.529. How do you interpret that coefficient?

All else held equal, for a one-unit increase in bread machines I expect to get 3.529 times as many loaves of bread.
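To see why the exponentiated coefficient reads as "times as many," here is a sketch with made-up numbers (`b0` is an arbitrary intercept; `b1` is chosen so that exp(b1) matches the card):

```python
# In a Poisson GLM with a log link, mu = exp(b0 + b1 * x), so a one-unit
# increase in x multiplies the expected count by exp(b1).
import numpy as np

b0 = 0.4                      # made-up intercept
b1 = np.log(3.529)            # so exp(b1) = 3.529, as on the card

def expected_loaves(x):
    return np.exp(b0 + b1 * x)

ratio = expected_loaves(5) / expected_loaves(4)   # one more bread machine
print(round(ratio, 3))                            # 3.529
```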

200

Explain what bootstrap aggregating means

Fitting models on bootstrapped subsets of the data (sampled with replacement) and aggregating their results.
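A hand-rolled sketch of the idea, assuming scikit-learn; `bagged_predict` and the loop are my own illustration, not a library API:

```python
# Fit trees on bootstrap resamples (rows drawn with replacement),
# then combine their predictions by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

def bagged_predict(X_new):
    votes = np.mean([t.predict(X_new) for t in trees], axis=0)
    return (votes >= 0.5).astype(int)            # majority vote

acc = (bagged_predict(X) == y).mean()
print(acc)                                       # training accuracy
```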

200

Name 2 pros and 2 cons of boosting

Pros:

1. Achieves higher performance than bagging when the hyperparameters are properly tuned

2. Works equally well for classification and regression

3. Can use robust loss functions that make the model resistant to outliers

Cons:

1. Difficult and time-consuming to tune hyperparameters

2. Cannot be parallelized (models are fit sequentially)

3. Higher risk of overfitting compared to bagging

200

What are the three components of GLM Anatomy?

1. Systematic component - the linear component

2. Random component - the distributional assumption about our target

3. Link function - so we don’t predict anything that’s impossible

200

What is the difference between maximal margin and soft margin?


  1. Maximal margin: the ‘best’ separating hyperplane is the one that creates the widest margin. That is, it has the widest space around it with no points inside.

  2. Soft margin: allows the SVM to make a certain number of mistakes while keeping the margin as wide as possible, so that other points can still be classified correctly.

200

Name 3 hyperparameters you would want to tune with a RandomForest Classifier

n_estimators: the number of trees in the forest

max_depth: the maximum depth of each tree

min_samples_split: the minimum number of samples required to split an internal node

min_samples_leaf: the minimum number of samples required to be at a leaf node

max_features: the number of features to consider when looking for the best split
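A sketch passing those hyperparameters to scikit-learn's RandomForestClassifier (the specific values here are arbitrary examples):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, random_state=0)
rf = RandomForestClassifier(
    n_estimators=200,       # number of trees in the forest
    max_depth=5,            # cap on each tree's depth
    min_samples_split=4,    # samples needed to split an internal node
    min_samples_leaf=2,     # samples required at each leaf
    max_features="sqrt",    # features considered per split
    random_state=0,
).fit(X, y)
print(len(rf.estimators_))  # 200 fitted trees
```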

300

In this movie, the presidencies of Kennedy and Johnson, the events of Vietnam, Watergate and other historical events unfold through the perspective of an Alabama man (played by Tom Hanks) with an IQ of 75, whose only desire is to be reunited with his childhood sweetheart.

It is also a CART model that learned on bootstrapped data and, at each split in the learning process, considered only a random subset of the features.

Random Forrest Gump

300

In terms of the bias/variance tradeoff - what is the difference between bagging and boosting?

Bagging aims to reduce variance

Boosting aims to reduce bias (and can reduce variance a bit as well)!

300

Give an example of the kind of question a Poisson regression could answer/predict?


  1. How many people are going to be in the next DSI cohort?

  2. How many loaves of sourdough bread have been baked in the last 4 months?

  3. How many cat videos are there on the internet?

300

What is the kernel trick and why would we want to use it?

The kernel trick applies a function to our data to (implicitly) increase its dimensionality - that is, it bends the axes our model is fit on. We would want to use it to create linear separability in data that cannot be separated by a straight line.
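A sketch of the payoff, assuming scikit-learn: concentric circles defeat a linear kernel but not RBF:

```python
# Concentric circles are not linearly separable in 2D, but the RBF
# kernel effectively lifts them into a space where they are.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)
print(linear.score(X, y), rbf.score(X, y))   # linear near 0.5, rbf near 1.0
```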

300

What is the difference between Bagging and a RandomForest Classifier?

  • Bagging - the default base estimator is a decision tree, but you can bag other model types. A BaggingClassifier fits each base classifier on a random subset of the original dataset and then aggregates their individual predictions.

  • RandomForest - only uses decision trees; it combines bootstrap aggregating with showing each tree only a random subset of the features at each split.
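A sketch of the distinction (scikit-learn assumed): BaggingClassifier can wrap any base model, here KNN, while RandomForest is trees only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Bagging 10 KNN models - any estimator works as the base.
bag_knn = BaggingClassifier(KNeighborsClassifier(), n_estimators=10,
                            random_state=0).fit(X, y)
# RandomForest: always decision trees, plus per-split feature subsampling.
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(bag_knn.score(X, y), rf.score(X, y))
```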

400

Explain, in basic terms, what AdaBoost does

The core principle of AdaBoost is to fit a sequence of weak learners on repeatedly modified versions of the data.

After each fit, the weight on each observation is updated - misclassified observations gain weight, so the next learner focuses on them.

The predictions are then combined through a weighted majority vote to produce the final prediction.
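A sketch (scikit-learn assumed) showing the staged ensemble usually beating its first stump on the training data:

```python
# staged_score yields the ensemble's accuracy after 1, 2, ... stages.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=400, flip_y=0.05, random_state=1)
ada = AdaBoostClassifier(n_estimators=50, random_state=1).fit(X, y)

scores = list(ada.staged_score(X, y))
print(scores[0], scores[-1])   # the full ensemble vs. a single stump
```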

400

In this 1996 action-thriller, terrorists hijack a plane and actors Kurt Russell, Steven Seagal, and Halle Berry must work together to stop them. With so much at stake, using Gini Impurity would have made choices easier.

Executive Decision Tree

(Executive Decision + Decision Tree)

400

Give an example of the kind of data science question a gamma regression could address/predict?


  1. How long until it rains?

  2. How long until I get a puppy

400

What is C and what  effect does it have in SVMs?

C controls how much we regularize the boundary that is fit between classes.

If C is small: we regularize substantially, allowing a wider margin and a less perfect classification of our training data.

If C is large: we do not regularize much, forcing a closer (possibly overfit) classification of our training data.
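A sketch of the effect (scikit-learn assumed; the C values are arbitrary). Scaling first, per the SVM prep note elsewhere on this board:

```python
# Small C tolerates training mistakes; large C chases the training data.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, flip_y=0.1, random_state=0)

soft = make_pipeline(StandardScaler(), SVC(C=0.01)).fit(X, y)
hard = make_pipeline(StandardScaler(), SVC(C=100.0)).fit(X, y)
print(soft.score(X, y), hard.score(X, y))  # large C fits training data tighter
```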

400

Name one pro and one con of using decision trees:


  • Not influenced by scale

  • They don’t care about the distribution of the data

  • Reasonably interpretable

  • Don’t require special encoding

  • High variance models - tend to overfit

500

Explain what Gradient Boost does

Fits a weak learner (in sklearn, by default a decision tree with max depth of 3), computes the residuals, fits a second model to those residuals, and aggregates the first and second models together. Gradient boosting is about fitting each subsequent model to the residuals of the model so far.
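The loop above can be hand-rolled as a sketch (scikit-learn assumed; the depth and learning rate are arbitrary choices of mine):

```python
# Fit a shallow tree, take residuals, fit the next tree to them,
# and accumulate the predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

pred = np.zeros_like(y)
lr = 0.5
for _ in range(20):
    resid = y - pred                          # what is still unexplained
    tree = DecisionTreeRegressor(max_depth=1).fit(X, resid)
    pred += lr * tree.predict(X)              # add a small correction

mse = np.mean((y - pred) ** 2)
print(mse)                                    # shrinks as stages accumulate
```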

500

Briefly describe the difference between an ExtraTrees Classifier and a RandomForest Classifier

An ExtraTrees classifier, by default in sklearn, does not bootstrap the data. Instead, each tree sees the same data; a random subset of features is still chosen for each tree, but the splits are randomized rather than greedy (it does not necessarily choose the best split for reducing Gini impurity).
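A quick check of those defaults (scikit-learn assumed):

```python
# ExtraTrees does not bootstrap by default; RandomForest does.
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

print(ExtraTreesClassifier().bootstrap)    # False
print(RandomForestClassifier().bootstrap)  # True
```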

500

What is the overall point of using a GLM instead of, say, a regular old linear regression?

It takes into consideration the distribution of your target, based on the type of question being asked. Because of this it will not create predictions that are impossible (e.g., negative counts).

500

What do we need to do to our data before using an SVM? 

SCALE

500

What is the hyperparameter ccp_alpha?

A cost-complexity pruning parameter, similar to α in regularization. As ccp_alpha increases, we regularize (prune) more.

  • By default, this value is 0.
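A sketch of the effect (scikit-learn assumed; the alpha value is an arbitrary example):

```python
# Larger ccp_alpha prunes more aggressively, so the tree gets smaller.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, flip_y=0.2, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)
print(full.tree_.node_count, pruned.tree_.node_count)  # pruned tree is smaller
```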