Unsupervised learning
PCA/Missing data
Recommender Systems
Time series
Bayes bois

100

This parameter in K-means clustering decides how many centroids to randomly place over the dataset 

What is K?
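
A toy k-means sketch (pure Python, hypothetical one-dimensional data) showing that K is simply the number of centroids we place and then refine:

```python
import random

def kmeans(points, k, iters=10, seed=0):
    """Toy k-means: K sets how many centroids we place and refine."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # K random initial centroids
    for _ in range(iters):
        # assign each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[i].append(p)
        # move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]   # two obvious groups
print(kmeans(data, k=2))                 # roughly [1.0, 9.0]
```

With k=2 the centroids settle on the two group means; a different K would force a different partition of the same data.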

100

Using this technique to fill in missing data artificially inflates the center of our histogram, and undervalues the variance of our dataset.

What is imputing the mean?
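
A tiny illustration (made-up numbers) of why mean imputation piles mass at the center of the histogram and shrinks the variance:

```python
from statistics import mean, pvariance

observed = [2.0, 6.0, 10.0]                 # two values went missing

# impute the mean for each missing entry: both gaps become 6.0
imputed = observed + [mean(observed)] * 2

# extra copies of the center lower the spread
print(pvariance(observed), pvariance(imputed))
```

Every imputed value lands exactly on the mean, so the center of the distribution is over-represented and the variance is understated.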

100

This type of system tries to match a consumer to consumers who like similar things to make recommendations.

What is user-based?
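
A minimal user-based sketch with hypothetical ratings: find the most similar other user by cosine similarity, then suggest what they liked that you haven't seen:

```python
from math import sqrt

ratings = {                      # hypothetical user -> item ratings
    "alice":   {"A": 5, "B": 4, "C": 1},
    "bob":     {"A": 5, "B": 5, "C": 1, "D": 4},
    "charlie": {"A": 1, "B": 1, "C": 5},
}

def cosine(u, v):
    shared = set(u) & set(v)
    dot = sum(u[i] * v[i] for i in shared)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

def recommend(user):
    # most similar other user, by cosine over rating vectors
    others = [u for u in ratings if u != user]
    peer = max(others, key=lambda u: cosine(ratings[user], ratings[u]))
    # suggest items the peer rated that `user` has not seen
    return [i for i in ratings[peer] if i not in ratings[user]]

print(recommend("alice"))   # bob is alice's closest peer, so suggest "D"
```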

100

True or False: when forecasting further out from our dataset, the accuracy increases.

What is False? The accuracy decreases as we move further away from our dataset.

100

Rather than calculating how likely an event is to happen, we calculate how likely an event is to happen given THIS

What is a PRIOR?
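
Bayes' rule as arithmetic, with made-up screening-test numbers, showing where the prior enters:

```python
# Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)
# The PRIOR P(H) is our belief before seeing evidence E.

prior = 0.01          # hypothetical: 1% of patients have the condition
sensitivity = 0.95    # P(positive | condition)
false_pos = 0.05      # P(positive | no condition)

evidence = sensitivity * prior + false_pos * (1 - prior)   # P(positive)
posterior = sensitivity * prior / evidence                 # P(condition | positive)

print(round(posterior, 3))   # about 0.161
```

Even with a good test, the small prior keeps the posterior modest; ignoring the prior is the classic base-rate fallacy.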

200

This preprocessing step is required for distance-based clustering models.

What is Scaling?
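
A quick sketch of z-score scaling (hypothetical features) showing why it matters for distance-based clustering: after standardizing, neither feature's raw scale dominates the distance:

```python
from statistics import mean, pstdev

def standardize(xs):
    """z-score scaling: zero mean, unit standard deviation."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

# hypothetical features on wildly different scales
income = [30_000, 60_000, 90_000]
age = [25, 40, 55]

print(standardize(income))   # roughly [-1.22, 0.0, 1.22]
print(standardize(age))      # same z-scores: scale no longer dominates
```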

200

The data of interest is not systematically different between respondents and nonrespondents.

What is Missing Completely At Random?

200

This distance metric is best described as following a grid pattern of right-angle moves.

What is Manhattan/taxicab/city block distance, etc.?
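
The grid-walking idea in one line of Python: sum the absolute per-axis moves rather than taking the straight-line hypotenuse:

```python
def manhattan(p, q):
    """City-block distance: sum of axis-aligned (right-angle) moves."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(manhattan((0, 0), (3, 4)))   # 7 blocks, vs. 5.0 straight-line (Euclidean)
```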

200

This is the term for fluctuations that recur over set intervals of time.

What is seasonality?

200

The result of our Bayesian inference.

What is a Posterior?

300

This model is best used on odd-shaped clusters

What is DBSCAN?
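
A tiny pure-Python DBSCAN sketch (toy data) showing why it handles odd-shaped clusters: it grows clusters through chains of dense neighborhoods, so an elongated "snake" of points stays one cluster while an isolated point is flagged as noise:

```python
def dbscan(points, eps, min_samples):
    """Tiny DBSCAN sketch: grow clusters from dense cores; -1 = noise."""
    def neighbors(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1                  # noise (may be claimed later)
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster         # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:    # j is a core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels

# an elongated curve plus an outlier: density, not shape, defines the cluster
pts = [(x / 2, (x / 2) ** 2 / 5) for x in range(10)] + [(10.0, 0.0)]
print(dbscan(pts, eps=1.0, min_samples=2))   # one snake cluster, one noise point
```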

300

When using PCA we make these two assumptions about our data.

What are:

  • Large variance defines importance
  • Linear relationships

300

These vectors have a cosine angle at or near 90 degrees.

What is orthogonal?
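
Checking orthogonality numerically with hypothetical vectors: a zero dot product gives cosine similarity 0, i.e. a 90-degree angle:

```python
from math import sqrt, acos, degrees

def cosine_sim(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

u, v = (1, 0, 2), (0, 3, 0)          # dot product is 0
c = cosine_sim(u, v)
print(c, degrees(acos(c)))           # 0.0 and 90.0: orthogonal vectors
```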

300

The amount of correlation between a variable and a lag of itself that is not explained by correlations at shorter lags.

What is Partial Autocorrelation?

300

True or False. Giving drug A to all weekend appointments and drug B to all weekday appointments is a true experiment.

What is False?

400

This evaluation metric uses the sum of squared errors

What is inertia?
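
Inertia computed by hand on toy points: the sum of squared distances from each point to its assigned centroid (what scikit-learn reports as `inertia_`):

```python
def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[l]))
        for p, l in zip(points, labels)
    )

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0)]
centroids = [(1.0, 1.5), (8.0, 8.0)]
labels = [0, 0, 1]
print(inertia(points, centroids, labels))   # 0.25 + 0.25 + 0.0 = 0.5
```

Lower inertia means tighter clusters, which is why it is used to evaluate (and elbow-plot) K-means fits.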

400

This is the difference between feature extraction and feature elimination.

What is:

  • Feature Elimination = dropping features outright
  • Feature Extraction = combining existing features into new ones to reduce the number of features

400

A cosine similarity of .99 and a pairwise distance of .01 indicates this.

What is these two things are VERY similar?

400

This time-series algorithm is useful when we want to model longer-term data with sudden fluctuations.

What is ARIMA? (or SARIMA or SARIMAX)
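
One piece of ARIMA is easy to show without a stats library: the "I" (integrated) term differences the series, which removes a steady trend before the AR and MA parts model what remains (toy numbers below):

```python
def difference(series, d=1):
    """Apply first differencing d times (the 'I' in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

trend = [10, 13, 16, 19, 22, 25]          # steady upward trend
print(difference(trend))                  # [3, 3, 3, 3, 3]: trend removed
```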

400

Our prior and posterior distributions come from the same family, so they can be combined easily.

What is Conjugacy?
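
The classic conjugate pair, with made-up numbers: a Beta prior with a Binomial likelihood yields a Beta posterior, so the update is just parameter arithmetic:

```python
# Conjugacy: with a Beta prior and a Binomial likelihood, the posterior is
# Beta again, so updating is just arithmetic on the two parameters.

a, b = 2, 2                 # hypothetical Beta(2, 2) prior on a success rate
successes, failures = 7, 3  # observed data

a_post, b_post = a + successes, b + failures   # posterior: Beta(9, 5)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, round(posterior_mean, 3))
```

No integration needed; that closed-form update is exactly why conjugate priors are convenient.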

500

In DBScan, this defines the distance from which to group neighboring data points. 

What is Epsilon?

500

The error in this process:

  1. Import PCA from sklearn.decomposition
  2. Scale features (if necessary)
  3. Instantiate PCA
  4. Fit & transform X and y
  5. Review explained variance

What is fit on y? (PCA is unsupervised) 

500

This describes the cold start problem.

What is when early iterations of a user-based recommender have too few ratings or users to recommend effectively?

500

DAILY DOUBLE! Wager up to 1k! 

Describe Benford's Law.
What is the observation that in many naturally occurring datasets, smaller leading digits appear more often, with 1 leading about 30% of the time and 9 under 5%?

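Benford's Law can be checked empirically; powers of 2 are a classic example of a set whose leading digits follow it, with 1 far more common than 9:

```python
from collections import Counter

# leading digits of the first 100 powers of 2
leading = [int(str(2 ** n)[0]) for n in range(1, 101)]
counts = Counter(leading)

print(counts[1], counts[9])   # digit 1 dominates, digit 9 is rare
```
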
500

This word vectorizer finds similarities between words from a pre-trained model by mapping each word to a dense vector.

What is Word2Vec?