Advanced Supervised Learning
Deep Learning
Clustering
Convolutional Neural Networks
More
Unsupervised Learning
100

What does bootstrapping mean?

Bootstrapping is random resampling with replacement: we draw a new sample of the same size as the original data, so the same observation can appear more than once.
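A minimal NumPy sketch (the tiny array here is made up purely for illustration):

    import numpy as np

    data = np.array([3, 7, 2, 9, 5])

    # Draw a bootstrap sample: same size as the original, sampled WITH replacement,
    # so some values may repeat and others may be left out entirely.
    bootstrap_sample = np.random.choice(data, size=len(data), replace=True)
    print(bootstrap_sample)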

100

What is the loss function for multiclass classification problems?

Categorical Cross Entropy
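As a hedged illustration, categorical cross-entropy can be computed by hand with NumPy (the labels and predicted probabilities below are made up); in Keras the same loss is requested with loss='categorical_crossentropy'.

    import numpy as np

    # One-hot true labels and softmax-style predicted probabilities for 3 samples, 3 classes.
    y_true = np.array([[1, 0, 0],
                       [0, 1, 0],
                       [0, 0, 1]])
    y_pred = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.2, 0.2, 0.6]])

    # Categorical cross-entropy: average over samples of -sum(y_true * log(y_pred)).
    loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
    print(loss)  # about 0.364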

100

In this movie, Cady Heron is a hit with The Plastics, the A-list girl clique at her new school, until she makes the mistake of falling for Aaron Samuels, the ex-boyfriend of alpha Plastic Regina George.

It is also a clustering technique that clusters all data points and requires you to specify the number of clusters.

KMean Girls

KMeans + Mean Girls

100

When we reshape an image to be, say, (28, 28, 1) or (28, 28, 3), what does that third number represent?

The third number is the number of color channels.

1: greyscale (a single channel)

3: RGB (three color channels: red, green, blue)

We include this dimension so the network can take color into consideration.
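A minimal NumPy sketch (a fake batch of ten 28x28 greyscale images, made up for illustration):

    import numpy as np

    X = np.random.rand(10, 28, 28)      # ten 28x28 greyscale images, no channel axis yet

    # Add an explicit channel dimension: 1 channel for greyscale.
    X_grey = X.reshape(10, 28, 28, 1)
    print(X_grey.shape)                 # (10, 28, 28, 1); a color batch would end in 3 (R, G, B)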

100

What is the difference between supervised and unsupervised learning?

Supervised learning predicts a labeled target; there's no target in unsupervised learning, which instead looks for structure (e.g. clusters) in the features alone.

200

What is the difference between AdaBoost and GradientBoost?

  • AdaBoost reweights the training data in each iteration, upweighting the observations that the preceding model got wrong.
  • Gradient boosting fits each subsequent model to the residuals (errors) of the model built so far (a scikit-learn sketch of both follows below).
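This sketch uses a synthetic dataset and arbitrary hyperparameter values, purely for illustration:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # AdaBoost: each new weak learner upweights the samples the previous ones got wrong.
    ada = AdaBoostClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

    # Gradient boosting: each new tree is fit to the residual errors of the ensemble so far.
    gb = GradientBoostingClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

    print(ada.score(X_test, y_test), gb.score(X_test, y_test))
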
200

What Is the Difference Between Epoch and Batch in Deep Learning?

Epoch - Represents one iteration over the entire dataset (everything put into the training model).

Batch - A subset of the training data. Rather than passing the entire dataset through the network at once, we divide it into several batches and update the weights after each batch.
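A minimal Keras sketch (the network, fake data, and settings are assumptions for illustration): with 1,000 training samples and batch_size=100, each epoch consists of 10 batches, i.e. 10 weight updates.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    X = np.random.rand(1000, 20)
    y = np.random.randint(0, 2, size=1000)

    model = Sequential([Dense(16, activation='relu', input_shape=(20,)),
                        Dense(1, activation='sigmoid')])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    # epochs=5       -> the model sees the full dataset 5 times
    # batch_size=100 -> weights are updated after every 100-sample batch
    model.fit(X, y, epochs=5, batch_size=100, verbose=0)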

200

What are the two hyperparameters to tune in DBSCAN? Explain what they mean

epsilon: the “searching” distance when attempting to build a cluster; by default this is a Euclidean distance.

min_samples: the minimum number of points needed to form a cluster. For example, if we set the min_samples parameter as 5, then we need at least 5 points to form a dense region.
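A minimal scikit-learn sketch (the moons dataset and the eps / min_samples values here are arbitrary illustrations):

    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_moons

    X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

    # eps: the search radius (Euclidean by default); min_samples: how many points must
    # fall within eps of a point for it to anchor a dense region.
    db = DBSCAN(eps=0.3, min_samples=5).fit(X)

    # Points labeled -1 (if any) were not assigned to a cluster, i.e. noise/outliers.
    print(set(db.labels_))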

200

This is how we scale image data when working with Convolutional Neural Nets.

We want to scale the data to be between 0 and 1, so we divide by 255, the maximum pixel value in an 8-bit image.
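A minimal NumPy sketch (the random 8-bit images are made up for illustration):

    import numpy as np

    X = np.random.randint(0, 256, size=(10, 28, 28), dtype=np.uint8)  # pixel values 0-255
    X_scaled = X / 255.0                                              # now every value lies in [0, 1]

    print(X_scaled.min(), X_scaled.max())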

200

What is the difference between feature selection and feature extraction?

  • Feature Selection
    • We drop variables from our model.
  • Feature Extraction
    • We take our existing features and combine them in a particular way (e.g. as weighted sums). We can then drop some of these "new" variables, but the variables we keep are still combinations of the old variables!
    • This still lets us reduce the number of features in our model while keeping the most important pieces of all the original features (a sketch with PCA follows below).
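This hedged scikit-learn sketch contrasts the two on the iris dataset; SelectKBest and PCA are just one example of each, and k=2 components is arbitrary.

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.decomposition import PCA

    X, y = load_iris(return_X_y=True)

    # Feature selection: keep the 2 original columns most associated with y, drop the rest.
    X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

    # Feature extraction: build 2 new features, each a weighted combination of all 4 originals.
    X_extracted = PCA(n_components=2).fit_transform(X)

    print(X_selected.shape, X_extracted.shape)  # both (150, 2)
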
300

What are 3 ways to reduce overfitting when working with ensemble models?

reduce max_depth

increase min_samples_split

increase min_samples_leaf
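A minimal scikit-learn sketch with a random forest; the specific values are just examples of nudging each hyperparameter in the less-overfit direction.

    from sklearn.ensemble import RandomForestClassifier

    rf = RandomForestClassifier(
        max_depth=5,           # shallower trees -> lower variance
        min_samples_split=10,  # require more samples before a node may split
        min_samples_leaf=5,    # require more samples in every leaf
        random_state=42,
    )
    # rf.fit(X_train, y_train)  # fit as usual once data is available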

300

What does it mean to be a feedforward fully-connected neural network?

Feedforward - calculations are passed down the network in one direction

Fully-connected - each node is connected to every node in the previous layer and in the next layer.
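A minimal Keras sketch (layer sizes and the 10-feature input are assumptions): Dense layers are fully connected, and Sequential passes activations forward one layer at a time.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential([
        Dense(32, activation='relu', input_shape=(10,)),  # every input feeds every one of these 32 nodes
        Dense(16, activation='relu'),                     # each of these 16 nodes connects to all 32 above
        Dense(1, activation='sigmoid'),                   # single output node
    ])
    model.summary()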

300

What are the pros and cons of KMeans?

Pros: Fast, simple, easy to understand

Cons: Sensitive to outliers, clusters every data point, requires that you specify the number of clusters, influenced by centroid initialization

300

Describe MaxPooling in a CNN. Why do we want to do this?

In Max Pooling, we pass a filter over an image. At each step, we take the maximum value and record it as part of the output.

Why use max pooling?

  1. Reduces the data dimensionality.
  2. Protects against overfitting by creating a more abstract representation.
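A minimal NumPy sketch of 2x2 max pooling by hand (the 4x4 "image" is made up): each 2x2 patch is summarized by its largest value, shrinking the output to 2x2.

    import numpy as np

    image = np.array([[1, 3, 2, 0],
                      [4, 6, 1, 1],
                      [0, 2, 5, 7],
                      [1, 1, 3, 2]])

    pooled = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            pooled[i, j] = image[2*i:2*i+2, 2*j:2*j+2].max()  # keep only the max of each patch

    print(pooled)  # [[6. 2.]
                   #  [2. 7.]]
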
300

What is the critical difference between KMeans and KNN?

The critical difference here is that KNN needs labeled points and is thus supervised learning, while KMeans does not need labels and is thus unsupervised learning.

400

 How does a decision tree use Gini to decide which variable to split on?

  • At any node, the tree considers the subset of our dataframe that exists at that node.
  • It iterates through each variable that could potentially split the data.
  • It calculates the Gini impurity for every possible split.
  • It selects the variable (and split point) that decreases Gini impurity the most from the parent node to the child nodes (the quantity being compared is sketched below).
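A hedged NumPy sketch of the quantity the tree compares at each candidate split: the Gini impurity of a node's labels, which is 0 for a pure node and grows as classes mix.

    import numpy as np

    def gini_impurity(labels):
        """Gini impurity = 1 - sum of squared class proportions."""
        _, counts = np.unique(labels, return_counts=True)
        proportions = counts / counts.sum()
        return 1 - np.sum(proportions ** 2)

    print(gini_impurity([0, 0, 0, 0]))  # 0.0 (pure node)
    print(gini_impurity([0, 0, 1, 1]))  # 0.5 (maximally mixed for two classes)
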
400

Why do we need to use 'to_categorical' on the target for multiclass classification problems?

We need our target to be in distinct categories (basically dummied out) so our output layer, softmax, can work properly. Recall that our output layer needs as many nodes as we have classes because softmax outputs the probability of our sample falling into each class.
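A minimal Keras sketch (the toy labels are made up): integer class labels become one-hot rows, so a 3-node softmax output layer lines up with a 3-column target.

    from tensorflow.keras.utils import to_categorical

    y = [0, 2, 1, 2]
    y_onehot = to_categorical(y, num_classes=3)
    print(y_onehot)
    # [[1. 0. 0.]
    #  [0. 0. 1.]
    #  [0. 1. 0.]
    #  [0. 0. 1.]]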

400

What is 1 pro and 1 con of DBSCAN?

Pros:

  • DBSCAN allows us to detect some cluster patterns that k-Means might not be able to detect.
  • We don’t need to pre-specify the number of clusters; the algorithm will determine how many clusters are appropriate given fixed min_samples and epsilon values. This is particularly valuable when we are clustering data in more than two or three dimensions.
  • Not every point is clustered! Good for identifying outliers.

Cons:

  • DBSCAN requires us to tune two parameters.
  • DBSCAN works well when clusters are of a different density than the overall data, but does not work well when the clusters themselves are of varying density.
  • Epsilon is fixed, so a single search radius has to work for the entire dataset (closely related to the varying-density issue above).
400

What is padding?

We can use padding to add a border of extra cells (typically zero-valued pixels) around the edge of the image. This allows pixels on the edge and in the corners to be included in more filter positions.
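A minimal Keras sketch (filter count and image size are assumptions): padding='same' pads the border so the output keeps the input's height and width, while padding='valid' does not pad.

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Conv2D

    model = Sequential([
        Conv2D(8, kernel_size=(3, 3), padding='same', input_shape=(28, 28, 1)),  # output stays 28x28
        Conv2D(8, kernel_size=(3, 3), padding='valid'),                          # output shrinks to 26x26
    ])
    model.summary()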

400

What is transfer learning?

Using the output of one model (either supervised or unsupervised), or the features it has learned, as an input to another model.
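One common form of this, sketched here with Keras as an assumed setup, is reusing a network pretrained on ImageNet as a frozen feature extractor whose learned features feed a new classifier:

    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.layers import Dense, Flatten
    from tensorflow.keras.models import Sequential

    base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
    base.trainable = False  # keep the pretrained weights fixed

    model = Sequential([
        base,                             # the pretrained model's outputs become inputs to the new layers
        Flatten(),
        Dense(10, activation='softmax'),  # new head for the new task (10 classes assumed)
    ])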

500

What are the advantages of a random forest model?

- By adding additional randomization (using bootstrapped samples of the data and seeing only a subset of features at each split point), the trees in the forest are less correlated, which results in lower variance and a more robust model.

- By "averaging" predictions from multiple models, we can often cancel errors out and get closer to the true values.


500

Name and briefly describe 3 regularization techniques in deep learning

L1/L2 Regularization - add a penalty on the size of the network's weights to the loss function, discouraging overly large weights.

Dropout - randomly "turn off" a fraction of nodes during each training step so the network cannot rely too heavily on any single node.

Early stopping - stop training once performance on a validation set stops improving (a Keras sketch of all three follows below).
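This sketch uses arbitrary layer sizes and settings, purely for illustration:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Dropout
    from tensorflow.keras.regularizers import l2
    from tensorflow.keras.callbacks import EarlyStopping

    model = Sequential([
        Dense(64, activation='relu', input_shape=(20,),
              kernel_regularizer=l2(0.01)),   # L2: penalize large weights
        Dropout(0.5),                         # randomly drop half the nodes on each update
        Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy')

    # Early stopping: halt training when validation loss stops improving.
    early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
    # model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])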

500

What does the silhouette score tell us?

The silhouette score tells us how well each point fits its assigned cluster by comparing cohesion and separation: a point's score is (separation - cohesion) / max(cohesion, separation), so values near 1 indicate tight, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest a point may be in the wrong cluster. The average silhouette score is the average of each point's score (a scikit-learn sketch follows the definitions below).

  • cohesion = average distance from a point to the other points in its own cluster
  • separation = average distance from a point to the points in the nearest other cluster
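The blobs dataset and k=3 here are arbitrary choices for illustration:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
    labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

    # Near 1: tight, well-separated clusters; near 0: overlapping; negative: likely misassigned points.
    print(silhouette_score(X, labels))
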
500

What is the purpose of the convolutional layer in a convolutional neural network?

The convolution layer is where we pass a filter over an image and do some calculation at each step. Specifically, we take pixels that are close to one another, then summarize them with one number. The goal of the convolution layer is to identify important features in our images, like edges.
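A hedged, from-scratch NumPy sketch of a single convolution with a 3x3 vertical-edge filter (the image and filter values are made up for illustration):

    import numpy as np

    image = np.array([[0, 0, 0, 10, 10],
                      [0, 0, 0, 10, 10],
                      [0, 0, 0, 10, 10],
                      [0, 0, 0, 10, 10],
                      [0, 0, 0, 10, 10]], dtype=float)

    vertical_edge_filter = np.array([[1, 0, -1],
                                     [1, 0, -1],
                                     [1, 0, -1]], dtype=float)

    output = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            patch = image[i:i+3, j:j+3]
            output[i, j] = np.sum(patch * vertical_edge_filter)  # summarize each patch with one number

    print(output)  # first column is 0 (flat region); the rest are -30 where the patch crosses the edge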

500

Name the pros and cons of dimensionality reduction.

Pros: 

  1. With less misleading data, model accuracy often improves.
  2. Fewer dimensions mean less computing. Less data means that algorithms train faster.
  3. Less data means less storage space required.
  4. Removes redundant features and noise.
  5. Dimensionality reduction helps us visualize the data.

Cons:

  1. Some information is lost, possibly degrading the performance of subsequent training algorithms.
  2. It can be computationally intensive.
  3. Transformed features are often hard to interpret.
  4. It makes the independent variables less interpretable.