Unsupervised learning
PCA/Missing data
Recommender Systems
Time series
Bayes bois

100

This parameter in K-means clustering decides how many centroids to randomly place over the dataset 

What is K?
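For a concrete picture, here is a minimal scikit-learn sketch on made-up toy data; K shows up as the n_clusters argument.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: two obvious blobs (values are made up for illustration)
X = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1],
              [8, 8], [8.1, 7.9], [7.8, 8.2]])

# K (n_clusters) is the number of centroids K-means places and then refines
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)   # one centroid per cluster
print(kmeans.labels_)            # cluster assignment for each point
```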

100

Using this technique to fill in missing data artificially inflates the peak of our histogram and underestimates the variance of our dataset.

What is imputing the mean (or median)?
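A small pandas/NumPy sketch (synthetic, made-up data) of why mean imputation shrinks the spread: every filled-in value sits exactly at the mean, so the standard deviation drops.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
s = pd.Series(rng.normal(loc=50, scale=10, size=1_000))

# Knock out roughly 30% of the values, then fill the holes with the mean
s_missing = s.mask(rng.random(len(s)) < 0.3)
imputed = s_missing.fillna(s_missing.mean())

print(round(s.std(), 2), round(imputed.std(), 2))  # imputed std is noticeably smaller
```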

100

This type of recommender system tries to match a user to similar users.

What is user-based?
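A minimal sketch of the user-based idea, assuming a tiny made-up ratings matrix and scikit-learn's cosine_similarity: each user's rating vector is compared to every other user's.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = items, values = ratings (0 = not rated)
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 2],   # tastes much like user 0
    [1, 0, 5, 4],
])

# User-based: similarity between users' rating vectors
sims = cosine_similarity(ratings)
print(sims[0])   # user 0 is far more similar to user 1 than to user 2
```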

100

True or False: when forecasting further out from our dataset, the accuracy increases.

What is False? The accuracy generally decreases as we move further away from our dataset.

100

Rather than calculating how likely an event is to happen, we calculate how likely an event is to happen given THIS

What is a PRIOR?

200

This preprocessing step is required for any distance-based clustering model.

What is Scaling?
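A short sketch, with made-up features on very different scales, of why scaling matters before clustering: without it, the large-scale feature dominates every distance calculation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Features on wildly different scales (e.g., age vs. income)
X = np.array([[25, 40_000], [30, 42_000], [45, 90_000], [50, 95_000]])

# Standardize so each feature contributes comparably to distances
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print(labels)
```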

200

This distance metric is best described as following a grid pattern of right-angle turns.

What is Manhattan / taxicab / L1 distance?
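A tiny NumPy sketch comparing Manhattan (L1) distance to Euclidean (L2) on two made-up points.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([4, 6])

manhattan = np.abs(a - b).sum()            # |1-4| + |2-6| = 7  (grid / taxicab path)
euclidean = np.sqrt(((a - b) ** 2).sum())  # straight-line distance = 5
print(manhattan, euclidean)
```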

200

This is the term for fluctuations that repeat over set intervals of time.

What is seasonality?
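A minimal sketch using statsmodels' seasonal_decompose on a synthetic monthly series; the series and the 12-month period are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series: upward trend plus a repeating 12-month cycle
idx = pd.date_range("2015-01-01", periods=60, freq="MS")
y = pd.Series(np.arange(60) * 0.5 + 10 * np.sin(2 * np.pi * np.arange(60) / 12),
              index=idx)

result = seasonal_decompose(y, model="additive", period=12)
print(result.seasonal.head(12))   # the repeating within-year pattern
```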

200

The result of our Bayesian inference.

What is a Posterior?

300

This algorithm works well on oddly-shaped clusters

What is DBSCAN?
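A short scikit-learn sketch on the classic two-moons toy dataset, where DBSCAN's density-based grouping handles odd shapes that K-means cannot; the eps and min_samples values are illustrative.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: oddly shaped, non-spherical clusters
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps (epsilon) is the neighborhood radius; min_samples is the density threshold
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))   # labels 0 and 1 for the two moons; -1 marks any noise points
```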

300

When using PCA we make these two assumptions about our data.

What are:

  • Large variance defines importance
  • Linear relationships between features

300

This describes a vector whose angle to another vector is at or near 90 degrees (cosine near 0).

What is orthogonal?
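A tiny NumPy check on two made-up vectors: at 90 degrees the cosine is 0.

```python
import numpy as np

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])   # at 90 degrees to a

cos_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)   # 0.0 -> orthogonal (cos 90 degrees = 0)
```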

300

The amount of correlation between a variable and a lag of itself that is not explained by correlations at shorter lags.

What is Partial Autocorrelation?
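A small sketch, assuming statsmodels' acf/pacf helpers, on a simulated AR(1) series: the ACF decays slowly, while the PACF cuts off after lag 1 once the shorter lags are accounted for.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

# AR(1) series: each value depends directly on only the previous one
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.8 * y[t - 1] + rng.normal()

print(acf(y, nlags=3))    # decays slowly: lag-2 and lag-3 correlation is indirect
print(pacf(y, nlags=3))   # only lag 1 stays large after removing shorter-lag effects
```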

300

True or False. Giving drug A to all weekend appointments and drug B to all weekday appointments is a true experiment.

What is False?

400

This K-means evaluation metric is the sum of squared distances from each point to its assigned centroid.

What is inertia?
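A short scikit-learn sketch of the elbow method, reading the inertia_ attribute for a range of K values on made-up blob data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Inertia = sum of squared distances from each point to its assigned centroid
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))   # look for the "elbow" where the drop flattens
```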

400

This is the difference between feature extraction and feature elimination.

What is:

  • Feature Elimination = dropping features
  • Feature Extraction = combining existing features into new ones to reduce the number of features

400

A cosine similarity of .99 and a pairwise distance of .01 indicates this.

What is: the two items are VERY similar?

400

This algorithm uses lagged time series values in a regression equation.

What is ARIMA? (or SARIMA or SARIMAX or AR)
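A minimal sketch, assuming statsmodels' ARIMA class, with an illustrative (1, 1, 1) order on a synthetic series.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic series: a random walk with drift
rng = np.random.default_rng(0)
y = pd.Series(np.cumsum(rng.normal(0.5, 1.0, size=200)))

# order=(p, d, q): p lagged values, d differences, q lagged errors
model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.forecast(steps=5))   # forecasts further out carry more uncertainty
```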

400

Our prior and posterior distributions belong to the same family, which makes the Bayesian update easy.

What is Conjugacy?
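A tiny Beta-Binomial sketch (made-up coin-flip counts, SciPy for the distribution): with a conjugate Beta prior, the posterior is again a Beta, found by simply adding the observed counts.

```python
from scipy import stats

# Beta prior on a coin's probability of heads: Beta(2, 2)
prior_a, prior_b = 2, 2

# Observe 7 heads and 3 tails (binomial likelihood)
heads, tails = 7, 3

# Conjugacy: Beta prior + binomial likelihood -> Beta posterior, just add the counts
posterior = stats.beta(prior_a + heads, prior_b + tails)
print(round(posterior.mean(), 3))   # posterior mean ~ 0.643
```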

500

In DBSCAN, this defines the neighborhood radius within which data points are grouped as neighbors.

What is Epsilon?

500

The error in this process:

  1. Import PCA from sklearn.decomposition
  2. Scale features (if necessary)
  3. Instantiate PCA
  4. Fit & transform X and y
  5. Review explained variance

What is fitting on y? (PCA is unsupervised, so only X gets fit and transformed.)
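A minimal scikit-learn sketch of the corrected workflow on the iris dataset: scale, instantiate, fit and transform X only, then review the explained variance.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

X_scaled = StandardScaler().fit_transform(X)   # 2. scale features
pca = PCA(n_components=2)                      # 3. instantiate PCA
X_pca = pca.fit_transform(X_scaled)            # 4. fit & transform X only -- never y
print(pca.explained_variance_ratio_)           # 5. review explained variance
```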

500

Early iterations of a user-based recommender have too few ratings to recommend effectively.

What is the cold start problem?

500

DAILY DOUBLE! Wager up to 1k! 

Describe Benford's Law.

What is: in many real-world numeric datasets, smaller leading digits occur more often, with 1 appearing as the first digit about 30% of the time (P(d) = log10(1 + 1/d))?

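A tiny sketch of the first-digit probabilities Benford's Law predicts.

```python
import math

# Benford's Law: P(first digit = d) = log10(1 + 1/d)
for d in range(1, 10):
    print(d, round(math.log10(1 + 1 / d), 3))
# 1 -> 0.301, 2 -> 0.176, ..., 9 -> 0.046
```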
500

This word vectorizer finds similarities between words by mapping each word to a dense vector, often using a pre-trained model.

What is Word2Vec?
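A minimal sketch, assuming gensim's Word2Vec with 4.x parameter names (older versions use size instead of vector_size) and a made-up toy corpus; a useful model needs far more text or pre-trained vectors.

```python
from gensim.models import Word2Vec

# Tiny toy corpus; real training data would be much larger
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"],
             ["cats", "and", "dogs", "are", "pets"]]

# Each word gets a dense vector; words in similar contexts end up with similar vectors
model = Word2Vec(sentences=sentences, vector_size=50, window=2, min_count=1, seed=42)
print(model.wv.most_similar("cat", topn=2))
```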