DSI 822 - v1

Acronyms

Advanced Supervised Learning

Clustering

Unsupervised Learning

Dealer's Choice

100

This is the full name of the basic linear regression known as OLS

Ordinary Least Squares

100

What does bootstrapping mean?

Random sampling with replacement
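
As a minimal sketch of the idea, the snippet below draws bootstrap resamples from a small list and records the mean of each one. The data values and the helper names (`bootstrap_sample`, `bootstrap_means`) are made up for illustration.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items from data at random, with replacement."""
    return [rng.choice(data) for _ in data]


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Return the mean of each bootstrap resample."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = bootstrap_sample(data, rng)
        means.append(sum(sample) / len(sample))
    return means


data = [2, 4, 4, 5, 7, 9]  # made-up values for illustration
means = bootstrap_means(data)
```

Because sampling is with replacement, a resample can contain the same observation several times and miss others entirely. The spread of `means` gives a rough picture of how variable the sample mean is.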

100

In this movie, Cady Heron is a hit with The Plastics, the A-list girl clique at her new school, until she makes the mistake of falling for Aaron Samuels, the ex-boyfriend of alpha Plastic Regina George.

It is also a clustering technique that clusters all data points and requires you to specify the number of clusters.

KMean Girls

KMeans + Mean Girls

100

What is the difference between supervised and unsupervised learning?

There's no target in unsupervised learning

100

In a K Nearest Neighbors classifier, this distance metric is the straight-line (shortest) distance between two points in vector space

Euclidean
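
For reference, here is a small sketch of the Euclidean distance next to the Manhattan distance, another metric KNN can use. The function names are illustrative only.

```python
import math


def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Grid distance: sum of the absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(p, q))


print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```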

200

This is the full name of the tree-based model known as CART

Classification and Regression Trees

200

What is the difference between AdaBoost and GradientBoost?

AdaBoost reweights the training observations after each round, so the points the preceding model got wrong count for more in the next model.
Gradient boosting fits each subsequent model to the residuals of the ensemble built so far.
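
The residual-fitting idea can be sketched in a few lines. This is a toy version, assuming squared-error loss, one-dimensional inputs, and one-split "stumps" as the weak learners. The data, the learning rate, and every function name are invented for illustration, and AdaBoost's reweighting is not shown.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump on 1-D inputs by minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 3.1, 3.0, 5.2, 5.1]  # made-up values for illustration
mean_y = sum(ys) / len(ys)
baseline_mse = mse(lambda x: mean_y, xs, ys)
boosted_mse = mse(gradient_boost(xs, ys), xs, ys)
```

Each round shrinks the training error a little, which is also why boosting can overfit if it runs for too many rounds.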

200

What are the two hyperparameters to tune in DBSCAN? Explain what they mean

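epsilon: the “searching” distance used when attempting to build a cluster. Points within this distance of each other are treated as neighbors. This is a Euclidean distance by default.

min_samples: the minimum number of points that must fall within epsilon of a point for it to seed a cluster. For example, if we set the min_samples parameter as 5, then we need at least 5 points within epsilon to form a dense region.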
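
To make the two hyperparameters concrete, here is a compact pure-Python sketch of DBSCAN. It assumes Euclidean distance and counts a point as one of its own neighbors. The function names and the toy points are illustrative, and in practice a library implementation would be used instead.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a dense point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is dense too, keep expanding
    return labels


# Two tight groups plus one far-away point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is never specified. It falls out of `eps` and `min_samples`, and the isolated point is left unclustered with the noise label `-1`.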

200

What is the difference between feature selection and feature extraction?

Feature Selection
- We drop variables from our model.
Feature Extraction
- In feature extraction, we take our existing features and combine them together in a particular way. We can then drop some of these "new" variables, but the variables we keep are still a combination of the old variables!
- This allows us to still reduce the number of features in our model but we can keep all of the most important pieces of the original features!

200

This is the link function that bends a Linear Regression into a Logistic Regression

Logit function
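
As a quick illustration, the logit maps a probability to log-odds, and its inverse (the sigmoid) maps a linear predictor back to a probability. The function names here are just for the sketch.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of the logit: maps any real number to (0, 1)."""
    return 1 / (1 + math.exp(-z))


print(logit(0.5))    # 0.0
print(sigmoid(0.0))  # 0.5
```

Logistic regression fits a straight line to the log-odds, so the predicted probability is `sigmoid(b0 + b1 * x)`.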

300

This is the full name of the graphical plot known as ROC curve

Receiver Operating Characteristic

300

What are 3 ways to reduce overfitting when working with tree-based ensemble models?

reduce max_depth

increase min_samples_split

increase min_samples_leaf

300

What are the pros and cons of KMeans?

Pros: Fast, simple, easy to understand

Cons: Sensitive to outliers, clusters every data point, requires that you specify the number of clusters, influenced by centroid initialization

300

What is the critical difference between KMeans and KNN?

The critical difference is that KNN needs labeled points and is thus supervised learning, while KMeans does not need labels and is thus unsupervised learning.
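
The difference shows up directly in the function signatures. In the sketch below, `knn_predict` needs `train_labels`, while `kmeans` only receives the points. Both are simplified toy versions with invented names and made-up data, not library code.

```python
import math
import random
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: vote among the labels of the k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, k, n_iter=10, seed=0):
    """Unsupervised: no labels, only points and the number of clusters k."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids
    for _ in range(n_iter):
        assignments = assign(points, centroids)
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assign(points, centroids), centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", "a", "a", "b", "b", "b"]

predicted = knn_predict(points, labels, query=(0.5, 0.5), k=3)
assignments, centroids = kmeans(points, k=2)
```

KNN returns one of the known labels for a new point. KMeans returns arbitrary cluster indices whose meaning we have to interpret ourselves.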

300

This model features a combination of both ridge and LASSO regularization models

Elastic Net

400

This is the full name for the transformer that tells us the relative importance of a word (TFIDF)

Term frequency-inverse document frequency
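
Here is a small sketch of the textbook calculation, assuming tf = term count / document length and idf = log(N / document frequency). Library implementations often use smoothed variants, so their numbers will differ. The documents and the function name are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document using unsmoothed tf * idf."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


docs = ["the cat sat", "the dog sat", "the bird flew"]
scores = tf_idf(docs)
```

A word that appears in every document ("the") scores zero, while a word unique to one document ("cat") scores highest in that document.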

400

How does a decision tree use Gini to decide which variable to split on?

At any node, the tree considers the subset of our dataframe that exists at that node.
It iterates through each variable that could potentially split the data.
It calculates the Gini impurity for every possible split.
It selects the variable (and split point) that decreases Gini impurity the most from the parent node to the child nodes, weighting each child node by its share of the observations.
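
The steps above can be sketched as follows. This is a simplified illustration assuming numeric features and binary "less than or equal to threshold" splits. The rows, labels, and function names are invented for the example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, labels):
    """Try every feature and threshold; keep the largest impurity decrease."""
    parent = gini(labels)
    best = None
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # not a real split
            decrease = parent - weighted_gini(left, right)
            if best is None or decrease > best[0]:
                best = (decrease, feature, threshold)
    return best


rows = [
    {"age": 20, "income": 30},
    {"age": 25, "income": 80},
    {"age": 40, "income": 35},
    {"age": 45, "income": 90},
]
labels = ["no", "no", "yes", "yes"]

decrease, feature, threshold = best_split(rows, labels)
print(feature, threshold, decrease)  # age 25 0.5
```

Splitting on `age <= 25` produces two pure child nodes, so the impurity drops from 0.5 at the parent to 0 in the children.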

400

What is 1 pro and 1 con of DBSCAN?

Pros:

DBSCAN allows us to detect some cluster patterns that k-Means might not be able to detect.
We don’t need to pre-specify the number of clusters; the algorithm will determine how many clusters are appropriate given fixed min_samples and epsilon values. This is particularly valuable when we are clustering data in more than two or three dimensions.
Not every point is clustered! Good for identifying outliers.

Cons:

DBSCAN requires us to tune two parameters.
DBSCAN works well when clusters are of a different density than the overall data, but does not work well when the clusters themselves are of varying density.
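Epsilon is fixed: a single search distance is applied to the whole dataset, so it cannot adapt to regions of different density.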

400

In PCA, this is the term that refers to breaking down a covariance matrix into Eigenvalues and Eigenvectors

Spectral decomposition or eigendecomposition
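
For two features the decomposition can be written out by hand, which makes a handy worked example. The sketch below builds the 2x2 sample covariance matrix and uses the closed-form eigenvalues of a symmetric 2x2 matrix. The function names and data are illustrative, and a general-purpose linear algebra routine would be used for more than two features.

```python
import math


def covariance_matrix_2d(xs, ys):
    """Entries (var_x, cov_xy, var_y) of the 2x2 sample covariance matrix."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return var_x, cov_xy, var_y


def eigen_2x2_symmetric(a, b, c):
    """Eigenpairs of [[a, b], [b, c]] as (value, unit vector), largest value first."""
    if b == 0:
        # Already diagonal: the axes themselves are the eigenvectors.
        pairs = [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
        return sorted(pairs, key=lambda pair: pair[0], reverse=True)
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    pairs = []
    for value in (mean + radius, mean - radius):
        vector = (b, value - a)  # solves (A - value * I) v = 0
        norm = math.hypot(*vector)
        pairs.append((value, (vector[0] / norm, vector[1] / norm)))
    return pairs


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # made-up data where y is exactly 2 * x
(value_1, vector_1), (value_2, vector_2) = eigen_2x2_symmetric(*covariance_matrix_2d(xs, ys))
explained = value_1 / (value_1 + value_2)
```

Each eigenvector is a principal component direction and its eigenvalue is the variance along that direction. Here the first component points along (1, 2) and carries essentially all of the variance, because the two features are perfectly correlated.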

400

This is the term describing when a set of random variables has constant variance

Homoskedasticity

500

This is the full name of the clustering method known as DBSCAN

Density-Based Spatial Clustering of Applications with Noise

500

What are the advantages of a random forest model?

By adding additional randomization, the trees in the forest are less correlated, which results in lower variance and a more robust model.

By "averaging" predictions from multiple models, we'll see that we can often cancel our errors out and get closer to the true values

500

What does the silhouette score tell us?

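The Silhouette Score tells us how tightly packed and how well separated our clusters are. It ranges from -1 to 1, and higher is better.

For each point, score = (separation - cohesion) / max(separation, cohesion). The average Silhouette Score is the average of each point's score.

cohesion = Average distance from a point to the other points in its own cluster
separation = Average distance from a point to the points in the nearest other cluster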
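
A small worked sketch of the calculation follows. It assumes Euclidean distance, takes separation as the mean distance to the nearest other cluster, and scores a single-point cluster as 0 by convention. The points, labelings, and function names are made up for illustration.

```python
import math


def mean_distance(p, others):
    return sum(math.dist(p, q) for q in others) / len(others)


def silhouette_scores(points, labels):
    """Per-point score: (separation - cohesion) / max(separation, cohesion)."""
    scores = []
    for i, p in enumerate(points):
        same = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention for a cluster with one point
            continue
        cohesion = mean_distance(p, same)
        separation = min(
            mean_distance(p, [q for q, label in zip(points, labels) if label == other])
            for other in set(labels) if other != labels[i]
        )
        scores.append((separation - cohesion) / max(separation, cohesion))
    return scores


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the groups together

good_scores = silhouette_scores(points, good_labels)
bad_scores = silhouette_scores(points, bad_labels)
good_avg = sum(good_scores) / len(good_scores)
bad_avg = sum(bad_scores) / len(bad_scores)
```

The sensible labeling scores close to 1, while the mixed-up labeling scores far lower, which is how the average can be used to compare candidate clusterings.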

500

Name the pros and cons of dimensionality reduction.

Pros:

Less misleading data means model accuracy can improve.
Fewer dimensions mean less computing. Less data means that algorithms train faster.
Less data means less storage space required.
Removes redundant features and noise.
Dimensionality Reduction helps us visualize the data

Cons:

Some information is lost, possibly degrading the performance of subsequent training algorithms.
It can be computationally intensive.
Transformed features are combinations of the original independent variables, so they are often hard to interpret.

500

What are the three types of missingness? Give examples

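MCAR (Missing Completely at Random): no pattern to missingness. Example: a survey form is lost in the mail.

MAR (Missing at Random): missingness is conditional on other observed values. Example: older respondents are less likely to report their income, and age is recorded for everyone.

NMAR (Not Missing at Random): missingness depends on the unobserved value itself, so there is a systematic difference between observed and missing data. Example: people with very high incomes decline to report their income.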