Acronyms
Names in Data Science
Data Science Movies
Potent Parameters
SQL
100

GLM

Generalized Linear Model

100

This test checks a time series for stationarity

Augmented Dickey-Fuller Hypothesis Test

100

By making decisions from a subset of features from a bootstrapped dataframe, one will discover that life really is like a box of chocolates.

Random Forest Gump

(Random Forest + Forrest Gump)

100

This hyperparameter sets the regularization strength for the SVM model.

C (the inverse of the regularization strength, alpha)

100

This is the term for a SQL query within another SQL query

Subquery
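
As an illustration, here is a minimal sketch of a subquery run through Python's built-in sqlite3 module. The `pokemon` table and its rows are made up for the example; the inner SELECT computes an average that the outer query filters on.

```python
import sqlite3

# In-memory database with a small, hypothetical pokemon table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon (name TEXT, type TEXT, attack INTEGER)")
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?)",
    [("Bulbasaur", "grass", 49), ("Charmander", "fire", 52), ("Machop", "fighting", 80)],
)

# The inner SELECT (the subquery) computes the average attack;
# the outer query keeps only the rows above that average.
rows = conn.execute(
    "SELECT name FROM pokemon "
    "WHERE attack > (SELECT AVG(attack) FROM pokemon) "
    "ORDER BY name"
).fetchall()
above_average = [name for (name,) in rows]
print(above_average)
```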

200

MMC

Maximum Margin Classifier

200

This famous Swiss mathematician developed a well-known distribution to describe large numbers of trials (each individual trial having some probability of being a failure or success).

Jacob (or Jacques) Bernoulli

200

Whoopi Goldberg is placed in a witness protection program as a nun. Like a hidden layer, she's forced to change her perspective into a new range of outputs.

Sister Activation Function

(Sister Act + Activation Function)

200

This is the total number of hyperparameters used in a Simple Linear Regression or OLS model.

0
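
One way to see why the answer is 0: a simple linear regression can be fit in closed form, so nothing has to be tuned before fitting. The sketch below uses plain Python and made-up data; `fit_simple_ols` is a hypothetical helper name, not a library function.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points lying exactly on y = 2x + 1.
slope, intercept = fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

The function takes only the data: the slope and intercept are learned parameters, not hyperparameters.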

200

This handy SQL function allows us to fill in NULLs

COALESCE
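
A small sketch of COALESCE in action, again through sqlite3. The table, the `secondary_type` column, and the fallback string 'none' are assumptions made for the example.

```python
import sqlite3

# Hypothetical table where one row has a NULL secondary type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon (name TEXT, secondary_type TEXT)")
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?)",
    [("Bulbasaur", "poison"), ("Charmander", None)],
)

# COALESCE returns its first non-NULL argument, so NULLs become 'none'.
rows = conn.execute(
    "SELECT name, COALESCE(secondary_type, 'none') FROM pokemon ORDER BY name"
).fetchall()
print(rows)
```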

300

BIC

Bayesian Information Criterion

300

This Japanese statistician worked on information theory and developed a way to use parsimony to describe the relative quality of a model given a set of data. 

Hirotugu Akaike (Akaike Information Criterion, or AIC)

300

As a part of this secret agency, Will Smith must embrace the unknown. There is only one term for this type of model that is devoid of any interpretability.

Men in Black Box Model

(Men in Black + Black Box Model)

300

A Logistic Regression is made from 3 components: A Linear Component, a Random error component, and this component, which bends the linear input into a range between 0 and 1

Logit Link Function
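
For reference, a minimal sketch of the logit link and its inverse in plain Python. Strictly speaking, the logit maps a probability to the whole real line, and it is the inverse (the sigmoid) that squeezes a linear predictor into the range between 0 and 1. The function names are illustrative.

```python
import math

def logit(p):
    """Link function: maps a probability in (0, 1) to the real line (log-odds)."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta):
    """Inverse link (sigmoid): maps any real linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# A linear predictor of 0 corresponds to a probability of one half.
midpoint = inverse_logit(0.0)
# Applying the link to the inverse link recovers the linear predictor.
round_trip = logit(inverse_logit(2.0))
print(midpoint, round_trip)
```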

300

This is the result from the following SQL query:

SELECT type, AVG(attack), AVG(defense)
FROM pokemon
GROUP BY type
ORDER BY AVG(attack) DESC
HAVING AVG(defense) > 50

Syntax Error (ORDER BY comes after HAVING)
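
To check the answer, the sketch below runs the query from the clue against SQLite, then reruns it with HAVING moved ahead of ORDER BY. The rows in `pokemon` are invented for the example, and other database engines may word the error differently.

```python
import sqlite3

# Hypothetical data: type, attack, defense.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon (type TEXT, attack INTEGER, defense INTEGER)")
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?, ?)",
    [("grass", 49, 49), ("grass", 62, 63), ("fire", 52, 43), ("rock", 80, 100)],
)

# The query from the clue: ORDER BY appears before HAVING.
bad_query = """
SELECT type, AVG(attack), AVG(defense)
FROM pokemon
GROUP BY type
ORDER BY AVG(attack) DESC
HAVING AVG(defense) > 50
"""

# Corrected clause order: HAVING filters the groups, then ORDER BY sorts them.
good_query = """
SELECT type, AVG(attack), AVG(defense)
FROM pokemon
GROUP BY type
HAVING AVG(defense) > 50
ORDER BY AVG(attack) DESC
"""

try:
    conn.execute(bad_query)
    bad_query_error = None
except sqlite3.OperationalError as exc:
    bad_query_error = str(exc)  # SQLite rejects the misplaced HAVING

rows = conn.execute(good_query).fetchall()
print(bad_query_error)
print(rows)
```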

400

DBSCAN

Density-Based Spatial Clustering of Applications with Noise

400

In 1908, this Englishman working at the Guinness brewery in Dublin published the t-test and t distribution under the name "Student"

William Sealy Gosset

400

This book by Ernest Cline was turned into a movie by Steven Spielberg and features a dystopia where people find salvation in a game called "The OASIS". The game may be "dummifying", but the players would prefer this other term.

Ready Player One-Hot Encoding

(Ready Player One + One-Hot Encoding)

400

In the SARIMA model, these are the 7 main hyperparameters to set.

(p,d,q) for the order & (P,D,Q,S) for the seasonal order

400

This is the meaning of ETL, a common paradigm in data storage

Extract, Transform, Load

500

MLE

Maximum Likelihood Estimation

500

Although Simon Newcomb discovered the phenomenon in 1881, this physicist made the concept more popular in 1938 by publishing a paper titled "The Law of Anomalous Numbers"

Frank Benford (Benford's Law)

500

This classic Hitchcock horror/thriller features a young man who runs a deadly motel with his "mother". The audience may think the variables involved are independent of each other, but this couldn't be further from the truth.

Psycho-llinearity

(Psycho + collinearity)

500

In the KNeighborsClassifier from sklearn, this is the name of the hyperparameter for stipulating how distance is calculated.

metric
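
To show why the distance metric matters, here is a plain-Python sketch (not sklearn's implementation) of a one-nearest-neighbor lookup where Euclidean and Manhattan distance disagree. The points, labels, and the helper `nearest_label` are made up for illustration; in sklearn the same choice would be expressed with values such as metric="euclidean" or metric="manhattan".

```python
import math

def euclidean(p, q):
    """Straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def nearest_label(query, points, labels, metric):
    """Return the label of the single closest point under the given metric."""
    distances = [metric(query, p) for p in points]
    return labels[distances.index(min(distances))]

points = [(3, 3), (0, 5)]
labels = ["a", "b"]
query = (0, 0)

# Euclidean: (3, 3) is about 4.24 away and (0, 5) is 5 away, so "a" wins.
by_euclidean = nearest_label(query, points, labels, euclidean)
# Manhattan: (3, 3) is 6 away and (0, 5) is 5 away, so "b" wins.
by_manhattan = nearest_label(query, points, labels, manhattan)
print(by_euclidean, by_manhattan)
```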

500

This is the kind of join that returns records that have matching values in both tables 

Inner Join
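
A minimal sketch of an inner join using sqlite3. Both tables and their contents are hypothetical; rows without a match on either side drop out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pokemon (name TEXT, trainer_id INTEGER)")
conn.execute("CREATE TABLE trainers (id INTEGER, trainer TEXT)")
# Mew has no trainer, and Misty has no pokemon in this made-up data.
conn.executemany(
    "INSERT INTO pokemon VALUES (?, ?)",
    [("Pikachu", 1), ("Onix", 2), ("Mew", None)],
)
conn.executemany(
    "INSERT INTO trainers VALUES (?, ?)",
    [(1, "Ash"), (2, "Brock"), (3, "Misty")],
)

# Only rows with matching values in both tables are returned.
rows = conn.execute(
    "SELECT p.name, t.trainer "
    "FROM pokemon AS p "
    "INNER JOIN trainers AS t ON p.trainer_id = t.id "
    "ORDER BY p.name"
).fetchall()
print(rows)
```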