These models, like CARTs, make no assumptions about the distribution of the data. In addition, features are considered independently.
What are non-parametric models?
This is the main philosophy behind ensemble methods.
What is combining multiple models into a better, averaged model? (also accepted: Wisdom of the Crowds)
Math formula that "forces" the data into a higher dimension and helps create/find the best linear boundary between classes.
What is a kernel? (also accepted: the kernel trick)
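A minimal sketch of why the kernel trick works, assuming a degree-2 polynomial kernel on 2-D points (the kernel choice, the helper names `phi`, `dot`, `poly_kernel`, and the sample points are illustrative assumptions, not part of the card). It shows that the kernel value equals a dot product in a higher-dimensional space without ever building that space.

```python
import math

def phi(v):
    """Explicit 3-D feature map matching the degree-2 polynomial kernel in 2-D."""
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain dot product of two equal-length tuples."""
    return sum(p * q for p, q in zip(a, b))

def poly_kernel(a, b):
    """K(a, b) = (a . b)^2, computed directly in the original 2-D space."""
    return dot(a, b) ** 2

# Hypothetical sample points.
a, b = (1.0, 2.0), (3.0, 0.5)

explicit = dot(phi(a), phi(b))   # map to 3-D first, then take the dot product
via_kernel = poly_kernel(a, b)   # same number, no mapping needed

print(explicit, via_kernel)
```

Both routes give the same value, which is why an SVM can work with the kernel alone.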
This term describes the value a model finds when it converges to a result that is not the optimum.
What is a local minimum?
Used to pick columns from a table
What is SELECT?
These are the criteria that classification/regression decision trees use, respectively.
What are the Gini index and MSE?
This is building several parallel models on random bootstrapped samples, then averaging the predictions of the models.
What is Bagging?
This regularization parameter controls the "leniency" for misclassification. Increasing it leads to less lenient boundaries, while decreasing it leads to more lenient ones.
What is "C"?
The reason why an algorithm does not converge
What is a learning rate 𝛼 that is too big?
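A small sketch of the effect, assuming gradient descent on the toy function f(x) = x² (the function, starting point, step count, and the two 𝛼 values are illustrative assumptions). Each update multiplies x by (1 - 2𝛼), so any 𝛼 above 1 makes the iterate grow instead of shrink.

```python
def gradient_descent(alpha, x0=1.0, steps=20):
    """Run fixed-step gradient descent on f(x) = x**2, whose gradient is 2*x."""
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x
    return x

small_step = gradient_descent(alpha=0.1)  # moves toward the minimum at 0
big_step = gradient_descent(alpha=1.1)    # overshoots further on every step

print(abs(small_step), abs(big_step))
```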
Allows you to filter rows by a condition
What is WHERE?
This is the Gini index value when two classes are perfectly balanced in a node, while this is the Gini index when only one class is represented in a node, respectively.
What are 0.5 and 0, respectively?
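To check both numbers, here is a short sketch of the Gini index computed from per-class counts in a node, using the standard formula 1 - Σ pᵢ² (the function name `gini` and the example counts are illustrative).

```python
def gini(class_counts):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

balanced = gini([5, 5])   # two classes, perfectly balanced
pure = gini([10, 0])      # only one class present

print(balanced, pure)
```

Note that 0.5 is the maximum only for two classes; with k balanced classes the same formula gives 1 - 1/k.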
True or false - When fitting a random forest classifier, the sample of observations is randomly drawn without replacement.
What is false? The sample is drawn with replacement.
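A minimal sketch of one bootstrap draw using only the standard library (the ten-row toy dataset and the seed are assumptions for illustration). Drawing with replacement means some rows can appear more than once and others not at all; the rows left out are usually called out-of-bag.

```python
import random

rng = random.Random(0)            # fixed seed so the run is repeatable
observations = list(range(10))    # stand-in for ten training rows

# Bootstrap sample: same size as the data, drawn WITH replacement.
bootstrap = rng.choices(observations, k=len(observations))

# Rows that were never drawn for this sample are "out-of-bag".
out_of_bag = sorted(set(observations) - set(bootstrap))

print(bootstrap, out_of_bag)
```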
The three components of all GLMs
What are the linear predictor (systematic component), the link function, and the random component?
This is the reason my model has been running for the last 6 hours with no end in sight.
What is my 𝛼 being too small?
Order you write common commands in an SQL query
What are:
SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT?
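The written order can be tried out with a sketch that runs one query using every clause, via Python's built-in `sqlite3` module (the `orders` table, its columns, and its rows are made up for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    ("ann", 50.0, "paid"), ("ann", 70.0, "paid"),
    ("bob", 20.0, "paid"), ("bob", 500.0, "cancelled"),
    ("cy", 90.0, "paid"), ("cy", 40.0, "paid"),
    ("dee", 300.0, "paid"),
])

# Clauses appear in the order they must be written.
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY total DESC
    LIMIT 2
""").fetchall()

print(rows)
```

The cancelled order is dropped by WHERE before grouping, and the customer whose paid total is only 20 is dropped by HAVING after grouping.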
Deep trees suffer from error due to this, while shallow trees usually suffer from error due to that
What are variance and bias, respectively?
This is why ensemble methods often perform better than non-ensemble methods
What is greatly reducing overfitting by combining multiple models?
A type of regression we'll use to predict/model a discrete value between 0 and ∞
What is Poisson Regression?
A common tuning parameter that guarantees termination but not necessarily convergence of my model.
What is max_iterations?
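A rough sketch of the difference between terminating and converging, assuming gradient descent on f(x) = x² with a step-size tolerance as the convergence test (the toy function, the `tol` parameter, and the 𝛼 values are assumptions for illustration; real libraries name and define these differently).

```python
def gradient_descent(alpha, max_iterations, tol=1e-6, x0=1.0):
    """Minimize f(x) = x**2; stop when the step is tiny or the budget runs out.

    Returns (final x, iterations used, whether the tolerance was reached).
    """
    x = x0
    for i in range(1, max_iterations + 1):
        step = alpha * 2 * x          # learning rate times gradient
        x -= step
        if abs(step) < tol:
            return x, i, True         # converged before the cap
    return x, max_iterations, False   # terminated by the cap only

x_fast, iters_fast, converged_fast = gradient_descent(0.1, max_iterations=1000)
x_slow, iters_slow, converged_slow = gradient_descent(1e-5, max_iterations=1000)

print(iters_fast, converged_fast, iters_slow, converged_slow)
```

With the tiny 𝛼 the loop still stops after 1000 iterations, but x has barely moved from its starting value.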
One is used before GROUP BY, and the other is used together with, and after, GROUP BY
What are WHERE and HAVING, respectively?
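A short sketch contrasting the two filters on the same data, again using `sqlite3` (the `sales` table and its values are hypothetical). WHERE removes individual rows before aggregation, so it changes the sums; HAVING removes whole groups after the sums are computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 5), ("east", 60), ("west", 40), ("west", 45)])

# WHERE filters individual rows before they are grouped.
where_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales WHERE amount > 30 "
    "GROUP BY region ORDER BY region").fetchall()

# HAVING filters whole groups after aggregation.
having_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) > 70 ORDER BY region").fetchall()

print(where_rows, having_rows)
```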
These hyperparameters can be used to reduce a decision tree's tendency to overfit
What are max_depth, min_samples_split and max_leaf_nodes?
These are the three differences between bagging and boosting
What are:
Parallel vs sequential
Aggregate results vs learning from each model in turn
Random subsets of data make each model unique vs learning from the mistakes of the last model to make the next one better?
A type of regression we'll use to predict/model a continuous value between 0 and ∞
What is Gamma Regression?
The general name for what gradient descent is trying to minimize.
What is the loss function? (or what is the cost function)
The 1st returns records with matching values in both tables; the 2nd returns all records from both tables, whether or not they have a match in the other
What are INNER JOIN and FULL OUTER JOIN, respectively?
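A sketch of the two joins with `sqlite3` (the `students` and `grades` tables are invented for illustration). Because older SQLite versions lack `FULL OUTER JOIN`, the full outer join here is emulated with a `LEFT JOIN` plus the unmatched right-side rows; that emulation is an assumption of this example, not something the card requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER, name TEXT);
    CREATE TABLE grades (student_id INTEGER, grade TEXT);
    INSERT INTO students VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO grades VALUES (2, 'A'), (3, 'B');
""")

# INNER JOIN: only rows with a match in both tables.
inner_rows = conn.execute("""
    SELECT s.name, g.grade
    FROM students s
    INNER JOIN grades g ON g.student_id = s.id
""").fetchall()

# Emulated FULL OUTER JOIN: every left row (matched or not),
# plus the right rows that had no match on the left.
full_outer_rows = conn.execute("""
    SELECT s.name, g.grade
    FROM students s
    LEFT JOIN grades g ON g.student_id = s.id
    UNION ALL
    SELECT s.name, g.grade
    FROM grades g
    LEFT JOIN students s ON s.id = g.student_id
    WHERE s.id IS NULL
""").fetchall()

print(inner_rows, full_outer_rows)
```

Unmatched sides come back as `None` in Python, which corresponds to SQL `NULL`.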