Ace of Database
Probably Probability
Super-Dupervised Learning
When Jon goes on Vaycay
NLP, yeah you know me!
100
The kind of databases banks and e-commerce companies typically use
What are relational (SQL) databases?
100
What specifies the probability distribution of a discrete/continuous random variable.
What is the PMF (Probability Mass Function) for discrete, or the PDF (Probability Density Function) for continuous?
100
What you optimize in every supervised learning algorithm.
What is the cost/error function (which you minimize)?
100
A technique to reduce dimensionality.
What is PCA (Principal Component Analysis) or SVD (Singular Value Decomposition)?
100
The Python library used to scrape websites.
What is Beautiful Soup?
200
Two examples of continuous data and two examples of categorical data.
What are CONTINUOUS: Price, Weight; CATEGORICAL: Race, Age Bracket?
200
A distribution that might model how many pieces of mail a person should expect on a given day (given that they typically receive 4).
What is a Poisson distribution?
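To make the mail example concrete, here is a minimal sketch that evaluates the Poisson probability mass function with a mean of 4 pieces of mail per day. The function name `poisson_pmf` is just an illustrative choice, not part of any library.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Probability of receiving 0..8 pieces of mail when the daily average is 4.
for k in range(9):
    print(k, round(poisson_pmf(k, 4.0), 4))
```

The probabilities peak around the mean and sum to 1 over all non-negative counts.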
200
Random Forests are very cheap to _____, but very expensive to _____.
What is Predict and Train?
200
A technique to pick the ideal number of clusters for K-means.
What is the elbow method?
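A rough sketch of the elbow method on toy data: run K-means for several values of k, record the within-cluster sum of squares (inertia), and look for the k after which the improvement flattens out. The 1-D data, the deterministic starting centers, and the function name `kmeans_1d_inertia` are all assumptions made to keep the example small and repeatable.

```python
def kmeans_1d_inertia(points, k, n_iter=100):
    """Run a simple 1-D K-means and return the within-cluster sum of squares."""
    data = sorted(points)
    n = len(data)
    # Deterministic start (an assumption): centers evenly spaced in sorted data.
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min((x - c) ** 2 for c in centers) for x in data)


# Toy data with three obvious groups (around 1, 5, and 9).
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
inertia = {k: kmeans_1d_inertia(data, k) for k in range(1, 6)}
for k, value in inertia.items():
    print(k, round(value, 2))
```

On this data the inertia falls sharply up to k = 3 and barely moves afterwards, which is the "elbow".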
200
** CA CHOO ZZZ BAZOOOOO CA CHOOO *** !!!!!!!!!!!!!!!!! DOUBLE JEOPARDY !!!!!!!!!!!!!!!!
The number of rows and columns in your bag of words matrix if you have 500 documents, with a total vocabulary of 10,000 words.
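A small sketch of how a bag-of-words matrix is usually laid out, assuming the common document-term convention (one row per document, one column per vocabulary word). The helper `bag_of_words` and the three toy documents are made up for illustration.

```python
def bag_of_words(docs):
    """Return (vocabulary, count matrix) with one row per document."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: j for j, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[i][index[word]] += 1
    return vocab, matrix


docs = ["the cat sat", "the dog sat", "the cat saw the dog"]
vocab, matrix = bag_of_words(docs)
print(len(matrix), "rows x", len(matrix[0]), "columns")
```

The shape depends only on the number of documents and the vocabulary size, not on how long each document is.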
300
The type of database of MongoDB.
What is document-based, or NoSQL?
300
The probability that a hypothesis test correctly rejects the null hypothesis when the null hypothesis is false.
What is the statistical power?
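As an illustration, power can be computed in closed form for a simple case. The sketch below assumes a one-sided z-test with known standard deviation; the function name and parameters (`effect`, `sigma`, `n`, `alpha`) are illustrative choices.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 against a true mean `effect` > 0."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection threshold under H0
    shift = effect * n**0.5 / sigma         # mean of the test statistic under H1
    return 1 - std_normal.cdf(z_crit - shift)


for n in (10, 25, 50):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

Power grows with sample size and effect size; with no true effect it collapses to the significance level.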
300
The biggest problem in using Decision Trees to fit your data.
What is overfitting your data?
300
A big advantage that hierarchical clustering has over K-Means clustering.
What is not having to pick 'k' in advance?
300
The long form of 'TF-IDF' and what it's used for.
What is Term Frequency–Inverse Document Frequency, a numerical statistic intended to reflect how important a word is to a document in a collection or corpus?
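A minimal sketch of one common TF-IDF variant, assuming term frequency is the count divided by document length and inverse document frequency is log(N / df). Libraries often use smoothed versions, so exact numbers will differ; the function name `tf_idf` and the toy documents are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


docs = ["the cat sat", "the dog sat", "the cat saw the dog"]
scores = tf_idf(docs)
for doc, row in zip(docs, scores):
    print(doc, {word: round(score, 3) for word, score in row.items()})
```

A word that appears in every document ("the") scores zero, while a word unique to one document ("saw") scores highest there.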
400
*** CACHOO CACHOO CHOO CHOO *** DOUBLE JEOPARDY QUESTION place your wager
A system for describing instants in time, defined as the number of seconds that have elapsed since 00:00:00 UTC on Thursday, 1 January 1970.
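This seconds-since-1970 count (commonly called Unix time) is easy to poke at from Python's standard library. The helper name `to_utc` below is an illustrative choice.

```python
import time
from datetime import datetime, timezone


def to_utc(seconds):
    """Convert a seconds-since-epoch count to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


epoch = to_utc(0)
print(epoch.isoformat())     # the reference instant
print(epoch.strftime("%A"))  # day of the week of the reference instant

now_seconds = time.time()    # seconds elapsed since the reference instant
print(int(now_seconds), to_utc(now_seconds).isoformat())
```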
400
If you have a fair coin and a two-headed coin, pick one randomly and flip heads with it... the probability you picked the two-headed coin.
What is 2/3?
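The posterior can be checked with Bayes' rule. A minimal sketch using exact fractions, assuming each coin is picked with probability 1/2 (the function name is illustrative):

```python
from fractions import Fraction


def posterior_two_headed():
    """P(two-headed coin | observed heads) via Bayes' rule."""
    prior = Fraction(1, 2)                   # each coin equally likely to be picked
    p_heads_given_two_headed = Fraction(1)   # a two-headed coin always lands heads
    p_heads_given_fair = Fraction(1, 2)
    evidence = prior * p_heads_given_two_headed + prior * p_heads_given_fair
    return prior * p_heads_given_two_headed / evidence


print(posterior_two_headed())
```

Seeing heads is twice as likely with the two-headed coin, so the evidence shifts the odds from 1:1 to 2:1 in its favor.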
400
The type of regularization you would use when you have many regressors but are unsure about which are the important ones.
What is L1 regularization, or Lasso regression?
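Why L1 helps pick out the important regressors: it can push small coefficients to exactly zero, while L2 (Ridge) only shrinks them. The sketch below assumes the simplified textbook case of an orthonormal design, where each penalized coefficient is a closed-form function of its least-squares estimate. The coefficient values and function names are made up for illustration.

```python
def lasso_shrink(z, lam):
    """L1 solution for one coefficient (soft-thresholding), orthonormal design."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """L2 solution for one coefficient, orthonormal design."""
    return z / (1.0 + lam)


# Hypothetical least-squares coefficients: two strong regressors, three weak ones.
ols = [3.0, -0.2, 0.05, -1.5, 0.4]
lam = 0.5
lasso = [lasso_shrink(z, lam) for z in ols]
ridge = [ridge_shrink(z, lam) for z in ols]
print("lasso:", lasso)
print("ridge:", [round(r, 3) for r in ridge])
```

The L1 result keeps only the two strong regressors; the L2 result keeps all five, just smaller.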
400
Compression, determining the rank of a matrix, and matrix approximation are applications of this technique.
What is Singular Value Decomposition or SVD?
400
What K-means, NMF, PCA, and LDA are used for on bag-of-words documents.
What is topic modeling?
500
The color of Jon's underpants.
What is dark navy?
500
The mathematical formulation for the expected value of the bias of a coin p given the coin has an initially unknown bias and has been flipped 3 times with the result {H, H, T}.
What is $\int_0^1 p \cdot \frac{\binom{3}{2}\, p^2 (1-p)}{\int_0^1 \binom{3}{2}\, q^2 (1-q)\, dq}\, dp$?
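The formula can be evaluated exactly, since both integrals are of the form p^a (1-p)^b over [0, 1], which equals a! b! / (a + b + 1)!. A minimal sketch, assuming (as the formula implies) a uniform prior on p; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral(a, b):
    """Exact integral of p^a * (1 - p)^b over [0, 1] for non-negative integers."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def expected_bias(heads, tails):
    """Posterior mean of p under a uniform prior after the observed flips."""
    c = comb(heads + tails, heads)                   # cancels, kept to mirror the formula
    numerator = c * beta_integral(heads + 1, tails)  # integral of p * likelihood
    denominator = c * beta_integral(heads, tails)    # integral of the likelihood
    return numerator / denominator


print(expected_bias(heads=2, tails=1))
```

The binomial coefficient cancels between numerator and denominator, so only the counts of heads and tails matter.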
500
The trade-offs/differences between Item vs. User vs. Content based recommenders.
What are the cold-start problem (content-based avoids it for new items), scalability (item-based if users >> products or the data is sparse), and simplicity (user-based is simpler than item-based and just as accurate for a small/dense dataset)?
500
The process of grouping together the different inflected forms of a word so they can be analyzed as a single item. For example, 'walked', 'walks', 'walking', will all be grouped together as 'walk'.
What is lemmatization?