What does bootstrapping mean?
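Bootstrapping is random resampling with replacement: we draw a sample (usually the same size as the original dataset) from the data, so some rows appear more than once and others not at all.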
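To make the idea concrete, here is a minimal sketch using Python's standard `random` module. It assumes we draw a sample the same size as the original data (the usual choice). `bootstrap_sample` is a hypothetical helper name, not a library function.

```python
import random

def bootstrap_sample(data, seed=None):
    """Draw len(data) items from data uniformly at random, with replacement (hypothetical helper)."""
    rng = random.Random(seed)
    # random.choices samples with replacement, so an item can be drawn more than once
    return rng.choices(data, k=len(data))

data = [10, 20, 30, 40, 50]
sample = bootstrap_sample(data, seed=42)
print(sample)

# Items never drawn are "out-of-bag"; on large datasets about 37% (roughly 1/e) of rows end up here
out_of_bag = [x for x in data if x not in sample]
print(out_of_bag)
```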
What is the loss function for multiclass classification problems?
Categorical Cross Entropy
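For a one-hot target, categorical cross-entropy is minus the sum over classes of y_c * log(p_c), so only the true class's term survives: the loss is minus the log of the probability the model gave the correct class. Below is a from-scratch sketch averaged over samples; the function name is hypothetical, and in Keras you would normally just pass `loss="categorical_crossentropy"` to `compile`.

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over samples (hypothetical helper).

    y_true: list of one-hot target vectors
    y_pred: list of predicted probability vectors (e.g. softmax outputs)
    """
    total = 0.0
    for target, probs in zip(y_true, y_pred):
        # Clip probabilities at eps to avoid log(0)
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
    return total / len(y_true)

y_true = [[0, 1, 0], [1, 0, 0]]
y_pred = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
loss = categorical_cross_entropy(y_true, y_pred)
print(loss)  # equals -(log(0.8) + log(0.7)) / 2
```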
In this movie, Cady Heron is a hit with The Plastics, the A-list girl clique at her new school, until she makes the mistake of falling for Aaron Samuels, the ex-boyfriend of alpha Plastic Regina George.
It is also a clustering technique that clusters all data points and requires you to specify the number of clusters.
KMean Girls
KMeans + Mean Girls
When we reshape an image to be, say, (28, 28, 1) or (28, 28, 3), what does that third number represent?
1: greyscale
3: RGB
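The third number is the number of color channels. We add this dimension to take color into consideration; even greyscale images need it, because convolutional layers expect a channel axis.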
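A short NumPy sketch of adding the channel dimension. It assumes channels-last ordering (Keras's default) and uses a hypothetical batch of 100 random 28x28 greyscale images as a stand-in for real data such as MNIST.

```python
import numpy as np

# Hypothetical batch of 100 greyscale 28x28 images (random pixel values for illustration)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 28, 28))

# Add a trailing channel dimension: (n, 28, 28) -> (n, 28, 28, 1)
images_with_channel = images.reshape(images.shape[0], 28, 28, 1)
print(images_with_channel.shape)  # (100, 28, 28, 1)

# A color image already carries three channels (red, green, blue) in the last axis
rgb_image = rng.integers(0, 256, size=(28, 28, 3))
print(rgb_image.shape)  # (28, 28, 3)
```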
What is the difference between supervised and unsupervised learning?
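Supervised learning has a target (labels) that the model learns to predict. There's no target in unsupervised learning, so the model looks for structure, such as clusters, in the features alone.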
What is the difference between AdaBoost and GradientBoost?
What is the difference between epoch and batch in deep learning?
Epoch - one full pass over the entire training dataset (every sample has been fed through the model once).
Batch - a subset of the training data. Instead of passing the entire dataset into the neural network at once, we divide it into several batches, and the model updates its weights after each batch.
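A sketch of how epochs and batches relate in a training loop, assuming a toy dataset of 10 samples and a batch size of 4 (both hypothetical). Real training loops usually also shuffle the data at the start of every epoch; that step is left out here.

```python
import math

def iterate_minibatches(data, batch_size):
    """Yield consecutive batches of data; the last batch may be smaller."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

data = list(range(10))  # stand-in for 10 training samples
batch_size = 4
epochs = 2

# Each epoch is one full pass over the data; each batch is one weight update.
batches_per_epoch = math.ceil(len(data) / batch_size)
updates = 0
for epoch in range(epochs):
    for batch in iterate_minibatches(data, batch_size):
        updates += 1  # a real loop would compute gradients and update weights here

print(batches_per_epoch, updates)  # 3 6
```

With 10 samples and a batch size of 4, each epoch has 3 batches (the last holds only 2 samples), so 2 epochs mean 6 weight updates.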
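What are the two hyperparameters to tune in DBSCAN? Explain what they mean.
epsilon: the “searching” distance used when building a cluster; two points within epsilon of each other count as neighbors. This is Euclidean distance by default (e.g., in scikit-learn), though other distance metrics can be used.
min_samples: the minimum number of points that must lie within epsilon of a point for it to be a core point of a dense region. For example, if we set min_samples to 5, a point needs at least 5 points in its neighborhood (scikit-learn counts the point itself) to form a dense region.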
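To show how the two hyperparameters interact, here is a small from-scratch DBSCAN in plain Python. It is an illustrative sketch (in practice you would use scikit-learn's `DBSCAN(eps=..., min_samples=...)`), and, following scikit-learn's convention, a point's neighborhood includes the point itself. The function name and toy points are hypothetical.

```python
import math

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN sketch. Returns one label per point; -1 marks noise."""
    def neighbors(i):
        # All points within eps of point i (Euclidean), including i itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels

points = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1), (5, 5), (5, 5.1), (5.1, 5), (10, 10)]
print(dbscan(points, eps=0.5, min_samples=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
print(dbscan(points, eps=0.5, min_samples=4))  # [0, 0, 0, 0, -1, -1, -1, -1]
```

Raising `min_samples` from 3 to 4 turns the three-point group into noise, because none of its points has 4 neighbors within `eps`.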
This is how we scale image data when working with Convolutional Neural Nets.
We want to scale the data to be between 0 and 1, so we divide by 255, the maximum value for a pixel in an 8-bit image.
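A minimal NumPy sketch, assuming a hypothetical 8-bit greyscale image with pixel values from 0 to 255.

```python
import numpy as np

# Hypothetical 8-bit greyscale image: integer pixel values from 0 to 255
image = np.array([[0, 128], [64, 255]], dtype=np.uint8)

# Cast to float32 (a common dtype for neural-network inputs), then divide by the maximum pixel value
scaled = image.astype("float32") / 255.0
print(scaled)
```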
What is the difference between feature selection and feature extraction?
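What are 3 ways to reduce overfitting when working with tree-based ensemble models?
reduce max_depth (shallower trees)
increase min_samples_split (a node needs more samples before it can be split)
increase min_samples_leaf (every leaf must contain more samples)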
What does it mean to be a feedforward fully-connected neural network?
Feedforward - information flows through the network in one direction, from the input layer through the hidden layers to the output layer, with no loops back.
Fully-connected - each node is connected to every node in the previous layer and in the next layer.
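A pure-Python sketch of one forward pass through a tiny feedforward, fully-connected network. The layer sizes, weights, and activation choices (ReLU hidden layer, sigmoid output) are hypothetical, picked only to keep the arithmetic small.

```python
import math

def dense_layer(inputs, weights, biases, activation):
    """Fully-connected layer: every output node uses every input value."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 1 output node
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[1.0, -1.0, 0.5]]
output_b = [0.2]

x = [1.0, 2.0]
# Feedforward: values flow input -> hidden -> output, never backwards
hidden = dense_layer(x, hidden_w, hidden_b, relu)
output = dense_layer(hidden, output_w, output_b, sigmoid)
print(hidden, output)
```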
What are the pros and cons of KMeans?
Pros: Fast, simple, easy to understand
Cons: Sensitive to outliers, clusters every data point, requires that you specify the number of clusters, influenced by centroid initialization
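A minimal from-scratch sketch of k-means (Lloyd's algorithm) in plain Python, written to make the listed cons visible: the result depends on the random initial centroids (`seed`), every point is assigned to some cluster, and `k` must be given up front. The function name and toy points are hypothetical; scikit-learn's `KMeans` is the usual tool in practice.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assigning points and moving centroids."""
    rng = random.Random(seed)
    # Initialize centroids by picking k distinct data points at random;
    # different seeds can lead to different final clusters.
    centroids = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: every point joins its nearest centroid (no point is left out)
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```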
Describe max pooling in a CNN. Why do we want to do this?
In max pooling, we slide a window (filter) over an image. At each step, we take the maximum value inside the window and record it as part of the output.
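A plain-Python sketch of max pooling, assuming a 2x2 window moved with a stride of 2 (a common choice) and a made-up 4x4 input. The function name is hypothetical.

```python
def max_pool_2d(image, size=2, stride=2):
    """Slide a size x size window over the image and keep the maximum in each window."""
    rows = (len(image) - size) // stride + 1
    cols = (len(image[0]) - size) // stride + 1
    return [
        [
            max(image[r * stride + i][c * stride + j] for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

image = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(max_pool_2d(image))  # [[6, 5], [7, 9]]
```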
Why use max pooling?
What is the critical difference between KMeans and KNN?
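The critical difference is that KNN needs labeled points and is thus supervised learning, while k-means does not, and is thus unsupervised learning. KNN predicts the label of a new point from the labels of its k nearest neighbors; k-means groups unlabeled points into k clusters.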
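A minimal KNN classifier in plain Python; the function name and data are hypothetical. Note that `knn_predict` cannot run without `train_labels`, which is exactly what makes KNN supervised.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest labeled neighbors."""
    distances = sorted(
        (math.dist(point, query), label) for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_points, train_labels, (1.5, 1.5)))  # a
print(knn_predict(train_points, train_labels, (8.5, 8.5)))  # b
```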
How does a decision tree use Gini to decide which variable to split on?
Why do we need to use 'to_categorical' on the target for multiclass classification problems?
to_categorical one-hot encodes the integer class labels, turning the target into distinct categories (basically dummied out) so it matches our softmax output layer. Recall that our output layer needs as many nodes as we have classes, because softmax outputs the probability of our sample falling into each class; categorical cross-entropy then compares those probabilities with the one-hot target.
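A pure-Python sketch of both halves of this: a hypothetical `to_one_hot` helper that mimics what Keras's `to_categorical` produces, and a softmax that turns one raw score per output node into class probabilities.

```python
import math

def to_one_hot(labels, num_classes):
    """Integer class labels -> one-hot vectors (hypothetical stand-in for to_categorical)."""
    return [[1.0 if c == label else 0.0 for c in range(num_classes)] for label in labels]

def softmax(logits):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

y = [0, 2, 1]
print(to_one_hot(y, num_classes=3))
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

probs = softmax([2.0, 1.0, 0.1])  # one probability per class / output node
print(probs, sum(probs))
```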
What is 1 pro and 1 con of DBSCAN?
Pros:
Cons:
What is padding?
We can use padding to add a border of extra cells (usually zeros) around the edge of the image. This lets the filter pass over pixels on the edges and in the corners more often, and lets the output keep the same width and height as the input.
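A plain-Python sketch of zero padding, assuming a pad width of 1 and a stride of 1. The helper names are hypothetical; in Keras, `padding="same"` on a `Conv2D` layer adds this kind of padding for you.

```python
def zero_pad(image, pad=1):
    """Surround a 2-D image with `pad` rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded

def conv_output_size(n, filter_size, pad=0, stride=1):
    """Output length along one dimension of a convolution."""
    return (n + 2 * pad - filter_size) // stride + 1

image = [[1, 2], [3, 4]]
print(zero_pad(image))
# [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]

# A 3x3 filter on a 28x28 image: no padding shrinks it to 26x26, padding of 1 keeps 28x28
print(conv_output_size(28, 3), conv_output_size(28, 3, pad=1))  # 26 28
```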
What is transfer learning?
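Reusing a model trained on one task as the starting point for another task. A common pattern is to use the output of a pretrained model (either supervised or unsupervised) as the input for a new model trained on our problem.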
What are the advantages of a random forest model?
- By adding randomization (training each tree on a bootstrapped sample of the data, and letting it see only a random subset of features at each split point), the trees in the forest are less correlated, which results in lower variance and a more robust model.
- By "averaging" the predictions of many trees, the individual trees' errors tend to cancel out, so the forest's predictions are often closer to the true values.
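A small simulation of the "errors cancel out" idea, under an idealized assumption that each model's error is independent Gaussian noise. Real trees are only partly decorrelated, so the gain is smaller in practice. The numbers are simulated, not results from a real forest.

```python
import random
import statistics

rng = random.Random(0)
true_value = 10.0
n_models, n_trials = 25, 2000

single_errors, averaged_errors = [], []
for _ in range(n_trials):
    # Idealized assumption: each model's prediction = truth + independent noise
    predictions = [true_value + rng.gauss(0, 1) for _ in range(n_models)]
    single_errors.append((predictions[0] - true_value) ** 2)
    averaged_errors.append((statistics.mean(predictions) - true_value) ** 2)

print(statistics.mean(single_errors), statistics.mean(averaged_errors))
```

With 25 independent models, the averaged prediction's mean squared error should come out around 1/25 of a single model's.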
Name and briefly describe 3 regularization techniques in deep learning
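L1/L2 regularization - adds a penalty on the size of the weights (sum of absolute values for L1, sum of squares for L2) to the loss function, discouraging large weights.
Dropout - during training, randomly sets a fraction of a layer's nodes to zero at each step, so the network can't rely too heavily on any single node.
Early stopping - stops training once performance on the validation set stops improving.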
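A sketch of early stopping's core logic on a hypothetical list of per-epoch validation losses: stop once the loss has gone `patience` epochs without improving on its best value. Keras's `EarlyStopping(monitor="val_loss", patience=...)` callback works along these lines, with extra options such as `min_delta` and `restore_best_weights`. The function name is hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the (0-based) epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: all epochs ran

losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.50]
print(early_stopping_epoch(losses, patience=2))  # 4
print(early_stopping_epoch(losses, patience=3))  # 5
```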
What does the silhouette score tell us?
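The silhouette score measures how well each point fits its own cluster compared with the nearest other cluster: s = (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points in the nearest other cluster. Scores range from -1 to 1: near 1 means dense, well-separated clusters; near 0 means overlapping clusters; negative means points may be in the wrong cluster. The average silhouette score is the average of each point's score.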
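A from-scratch sketch of per-point silhouette scores in plain Python, using s = (b - a) / max(a, b) with Euclidean distance. The function name and toy data are hypothetical; scikit-learn's `silhouette_samples` and `silhouette_score` compute the same quantities.

```python
import math
import statistics

def silhouette_scores(points, labels):
    """Silhouette score s = (b - a) / max(a, b) for each point.

    a: mean distance from the point to the other members of its own cluster
    b: mean distance from the point to the members of the nearest other cluster
    Assumes at least two clusters.
    """
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [
            math.dist(p, q)
            for j, (q, lab) in enumerate(zip(points, labels))
            if lab == own and j != i
        ]
        if not same:
            scores.append(0.0)  # convention for a single-point cluster (as in scikit-learn)
            continue
        a = statistics.mean(same)
        b = min(
            statistics.mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            for other in set(labels)
            if other != own
        )
        scores.append((b - a) / max(a, b))
    return scores

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_scores(points, [0, 0, 1, 1])  # tight, well-separated clusters
bad = silhouette_scores(points, [0, 1, 0, 1])   # clusters that mix far-apart points
print(statistics.mean(good), statistics.mean(bad))
```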
What is the purpose of the convolutional layer in a convolutional neural network?
The convolution layer is where we pass a filter over an image and do a calculation at each step: we multiply the filter's weights by the pixels underneath it and add up the results. In other words, we take pixels that are close to one another and summarize them with one number. The goal of the convolution layer is to identify important features in our images, like edges.
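A plain-Python sketch of the convolution step, assuming a square filter, no padding, and a stride of 1. The function name is hypothetical, and the vertical-edge filter is hand-picked for illustration; in a CNN the filter weights are learned during training.

```python
def convolve2d(image, kernel):
    """Slide a square `kernel` over `image` (no padding, stride 1) and sum elementwise products.

    Like CNN layers, this does not flip the kernel (strictly, it is cross-correlation).
    """
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Dark left half, bright right half: a vertical edge down the middle
image = [
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
    [0, 0, 0, 10, 10, 10],
]
# Hand-picked filter that responds to vertical edges
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
print(convolve2d(image, vertical_edge))  # [[0, 30, 30, 0], [0, 30, 30, 0]]
```

The output is nonzero only where the filter straddles the edge, which is the sense in which a filter "detects" a feature.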
Name the pros and cons of dimensionality reduction.
Pros:
Cons: