The classification metric "True Negative Rate" is also known as this
What is Specificity?
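As a minimal sketch, specificity can be computed from confusion-matrix counts as TN / (TN + FP). The helper name `specificity` and the example counts are illustrative assumptions, not part of any library.

```python
def specificity(tn: int, fp: int) -> float:
    """Return the true negative rate: TN / (TN + FP)."""
    if tn + fp == 0:
        raise ValueError("specificity is undefined with no actual negatives")
    return tn / (tn + fp)


# Hypothetical counts: 90 true negatives and 10 false positives.
rate = specificity(tn=90, fp=10)
print(rate)  # 0.9
```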
If a model scores very well on its training set, but much more poorly on the testing set, the model is likely suffering from this
What is overfitting?
This character must be at the end of the first line when defining a function
What is a colon (:)?
sklearn's train_test_split returns four objects, in exactly this order
What are X_train, X_test, y_train, y_test?
At the 5% significance level, a null hypothesis is rejected in favor of the alternative hypothesis when this value is below 0.05
What is a p-value?
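To make the decision rule concrete, here is a small sketch of an exact one-sided p-value for a coin-flip experiment, where the null hypothesis is a fair coin. The function name `one_sided_p_value` and the 9-heads-in-10-flips data are made up for illustration.

```python
from math import comb


def one_sided_p_value(heads: int, flips: int) -> float:
    """Return P(X >= heads) for X ~ Binomial(flips, 0.5)."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips


# Hypothetical observation: 9 heads in 10 flips.
p = one_sided_p_value(heads=9, flips=10)
reject_null = p < 0.05  # compare against the 5% significance level
print(p, reject_null)
```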
In an sklearn LogisticRegression classifier model, this is the default scoring metric of the object
What is accuracy?
This Pandas method will return a boolean DataFrame or Series indicating whether or not a value is np.nan
What is isnull() (or its alias, isna())?
In an sklearn train_test_split, the proportional size of the testing set can be controlled by assigning this argument
What is test_size?
This hyperparameter controls the strength of regularization in Ridge Regressions.
What is alpha?
This is the term for describing two correlated features
What is Multicollinearity?
False positives are known as this type of error
What is a Type I error?
In order to correctly create an alias that includes spaces (e.g. dept name), this/these character(s) are required in SQL syntax
What are double quotes ("")?
Use this to create an anonymous function
What is a lambda function?
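A short sketch of the difference between a named function and an anonymous one in Python. The names `square`, `square_lambda`, and `words` are illustrative only.

```python
# A named function: the `def` line ends with a colon.
def square(x):
    return x * x


# The same logic written as an anonymous function and bound to a name.
square_lambda = lambda x: x * x

# Lambdas are more typically passed inline, for example as a sort key.
words = ["pear", "fig", "banana"]
by_length = sorted(words, key=lambda w: len(w))
print(by_length)  # ['fig', 'pear', 'banana']
```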
This SQL clause is executed first
What is the FROM clause?
This special hypothesis test is used to determine whether a time series is stationary
What is an Augmented Dickey-Fuller Test?
This is the full name of the curve that shows the trade-off between the true positive rate and false positive rate.
What is the Receiver Operating Characteristic (ROC) curve?
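The curve is traced by sweeping a decision threshold and recording the false positive rate and true positive rate at each step. The sketch below does this by hand; the helper `roc_point` and the toy labels and scores are assumptions for illustration, not a library API.

```python
def roc_point(y_true, scores, threshold):
    """Return (fpr, tpr) when predicting positive for scores >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return fp / (fp + tn), tp / (tp + fn)


# Toy data: two negatives and two positives with predicted scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = [roc_point(y_true, scores, t) for t in (0.9, 0.5, 0.3, 0.0)]
print(points)  # [(0.0, 0.0), (0.0, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Lowering the threshold moves the point from (0, 0) toward (1, 1); the area under the resulting curve is the AUC.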
`.map()` is a method available with this data type
What is a pandas.Series?
This non-parametric model incorporates random sampling with replacement of observations and also uses all features.
What are bagged trees?
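The "random sampling with replacement" step is the bootstrap. A minimal sketch, assuming observations are stored in a plain list; `bootstrap_sample` is a hypothetical helper and the seed is arbitrary.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) observations with replacement."""
    return rng.choices(rows, k=len(rows))


rng = random.Random(42)  # seeded for repeatability
rows = list(range(10))   # stand-in for ten observations

# Each bagged tree would be fit on its own bootstrap sample.
samples = [bootstrap_sample(rows, rng) for _ in range(3)]
for sample in samples:
    print(sorted(sample))
```

Because sampling is with replacement, a sample usually contains repeated observations and omits others.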
This is the term describing when a set of random variables has constant variance
What is homoskedasticity?
In a K Nearest Neighbors classifier, this distance metric is also the shortest distance between two points in vector space.
What is Euclidean distance?
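For reference, a small sketch of the computation: the square root of the sum of squared coordinate differences. The function name `euclidean` is illustrative; Python 3.8+ also ships `math.dist` for the same calculation.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


d = euclidean((0, 0), (3, 4))
print(d)  # 5.0
```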
This is the loss function that neural networks typically use to handle multiclass classification problems.
What is Categorical Crossentropy?
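A rough sketch of the loss for a single observation, assuming a one-hot true label and a vector of predicted class probabilities. The function name and the `eps` clipping value are assumptions for illustration; deep learning libraries provide their own implementations.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Return -sum(t * log(p)) over classes for one observation."""
    # Clip probabilities away from zero so log() is always defined.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# The true class is index 1; the model assigns it probability 0.7.
loss = categorical_crossentropy([0, 1, 0], [0.2, 0.7, 0.1])
print(loss)
```

With a one-hot label, the loss reduces to the negative log of the probability assigned to the true class.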
This method will transform a sparse matrix to a dense matrix.
What is `todense()`?
Arguments in a function that have default values are known as this.
What are default arguments (often loosely called keyword arguments)?
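A brief sketch of a parameter with a default value. The function `greet` is hypothetical.

```python
def greet(name, greeting="Hello"):
    """`greeting` has a default, so callers may omit it."""
    return f"{greeting}, {name}!"


print(greet("Ada"))                 # Hello, Ada!
print(greet("Ada", greeting="Hi"))  # Hi, Ada!
```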
A regression regularized with the "L1" penalty is more commonly known by this name.
What is LASSO regression?
This is the link function that bends a Linear Regression into a Logistic Regression
What is the logit function?
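As a sketch, the logit maps a probability to log-odds, and its inverse (the sigmoid) maps a linear predictor back to a probability. The function names below are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def sigmoid(z):
    """Inverse of the logit: maps any real z to a value in (0, 1)."""
    return 1 / (1 + math.exp(-z))


print(logit(0.5))           # 0.0
print(sigmoid(logit(0.8)))  # approximately 0.8
```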