These models, like CARTs, don't make any assumptions about the distribution of the data. In addition, features are considered independently.
What are non-parametric models?
This is the main philosophy behind ensemble methods.
What is using multiple models to make a better averaged model? (also accepted: Wisdom of the Crowds)
This term describes the value a model finds when it converges but it's not the optimum result.
What is a local minimum?
Used to pick columns from a table.
What is SELECT?
These are the criteria that classification/regression decision trees use, respectively.
What are the Gini index and MSE?
This is building several parallel models on random bootstrapped samples, then averaging the predictions of the models.
What is bagging (bootstrap aggregating)?
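
As a rough illustration of the bagging card above, here is a minimal sketch in plain Python. The one-feature "stump" model, the toy data, and the names `fit_stump`, `bagged_stumps`, and `predict` are all assumptions made for this example, not part of any library. Because the toy task is classification, the models are aggregated by majority vote rather than by a numeric average.

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a one-feature threshold classifier: predict 1 when x > threshold."""
    best_threshold, best_errors = None, len(points) + 1
    for threshold, _ in points:
        # Count misclassifications if we split at this candidate threshold.
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagged_stumps(points, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn WITH replacement.
        sample = [rng.choice(points) for _ in points]
        thresholds.append(fit_stump(sample))
    return thresholds

def predict(thresholds, x):
    """Aggregate the individual stump predictions by majority vote."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature value, class label).
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
models = bagged_stumps(data)
print(predict(models, 0.0), predict(models, 5.0))
```

Each stump is independent of the others, so in principle the loop in `bagged_stumps` could run in parallel.
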
The reason why an algorithm does not converge.
What is 𝛼 (step size) being too big?
Allows you to filter rows by a condition.
What is WHERE?
This is the Gini index value when two classes are perfectly balanced in a node, while this is the Gini index when there is only one class represented in a node, respectively.
What are 0.5 and 0, respectively?
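
The two values above can be checked directly from the definition of Gini impurity, 1 minus the sum of squared class proportions. This is a small sketch; the function name `gini` and the toy labels are assumptions for illustration. Note that 0.5 is the balanced value for two classes only; with k balanced classes the value is 1 - 1/k.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

balanced = gini(["a", "a", "b", "b"])   # two classes, perfectly balanced -> 0.5
pure = gini(["a", "a", "a", "a"])       # only one class in the node -> 0.0
three_way = gini(["a", "b", "c"])       # three balanced classes: 1 - 1/3
print(balanced, pure, three_way)
```
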
True or False: When fitting a random forest classifier, the sample of observations is randomly drawn without replacement.
What is "False"? Random observations are taken with replacement.
This is the reason my model has been running for the last 6 hours with no end in sight.
What is 𝛼 (step size) being too small?
This is the order in which the clauses of a SQL query are written (the SQL commands! Not the mnemonic).
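
To make the step-size cards concrete, here is a minimal sketch of gradient descent on f(x) = x², whose gradient is 2x. The function name `gradient_descent`, the tolerance, and the iteration cap are assumptions chosen for this example. It compares a reasonable 𝛼, one that is too big, and one that is too small.

```python
def gradient_descent(alpha, x0=1.0, tol=1e-6, max_iter=1000):
    """Minimize f(x) = x**2 and report (final x, iterations used, converged?)."""
    x = x0
    for iteration in range(1, max_iter + 1):
        step = alpha * 2 * x      # alpha times the gradient of x**2
        x -= step
        if abs(step) < tol:       # updates have become negligible: converged
            return x, iteration, True
    # The iteration cap guarantees termination, not convergence.
    return x, max_iter, False

good = gradient_descent(alpha=0.1)        # settles near the minimum at 0
too_big = gradient_descent(alpha=1.1)     # overshoots further on every step
too_small = gradient_descent(alpha=1e-5)  # barely moves before the cap is hit
for name, result in [("good", good), ("too big", too_big), ("too small", too_small)]:
    print(name, result)
```

With 𝛼 = 1.1 each update multiplies x by -1.2, so the iterates oscillate and grow; with 𝛼 = 1e-5 each update shrinks x by only 0.002%, so the run ends at the cap while still far from 0.
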
What are:
SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT?
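
A minimal sketch of a query that uses every clause in the order listed above, run against an in-memory SQLite database. The `orders` table and its rows are hypothetical data invented for this example. Keep in mind that this is the order in which the clauses are written; the engine logically evaluates FROM and WHERE before SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ann", 40, "paid"), ("ann", 30, "paid"), ("bob", 20, "paid"),
     ("bob", 100, "refunded"), ("cat", 60, "paid"),
     ("dan", 90, "paid"), ("dan", 5, "paid")],
)

query = """
    SELECT customer, SUM(amount) AS total   -- columns to return
    FROM orders                             -- source table
    WHERE status = 'paid'                   -- filter rows before grouping
    GROUP BY customer                       -- one group per customer
    HAVING SUM(amount) > 50                 -- filter groups after aggregation
    ORDER BY total DESC                     -- sort the remaining groups
    LIMIT 2                                 -- keep only the top two
"""
top_customers = conn.execute(query).fetchall()
print(top_customers)  # [('dan', 95), ('ann', 70)]
conn.close()
```
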
Deep trees suffer from this kind of error, while shallow trees usually suffer from that kind of error.
What are variance and bias, respectively?
This is why ensemble methods often perform better than non-ensemble methods.
What is reducing overfitting by combining multiple models?
A common tuning parameter that guarantees termination but not necessarily convergence of my model.
What is max_iter(ations)?
One filters rows before GROUP BY and the other filters groups after GROUP BY, respectively.
What are WHERE and HAVING, respectively?
These 3 hyperparameters can be used to reduce decision trees' tendency to overfit.
What are max_depth, min_samples_split and max_leaf_nodes?
These are the three main differences between bagging and boosting.
What are:
- Models built in parallel vs. sequential
- Aggregate results vs. learning from each model in turn
- Random subsets of data make each model unique vs learning from the mistakes of the last model to make the next one better
?
The technique where 𝛼 (step size) is not held fixed but is adjusted at each iteration.
What is Adaptive Gradient Descent?
The 1st returns only records with matching values in BOTH tables; the 2nd returns all records from BOTH tables, matched where possible and filled with NULLs where there is no match.
What are INNER JOIN and FULL OUTER JOIN, respectively?
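
The difference between the two joins can be seen on a pair of tiny tables. This is a sketch using an in-memory SQLite database; the `employees` and `departments` tables are hypothetical. SQLite only gained native FULL OUTER JOIN support in version 3.39, so as a portability assumption the full outer join is emulated here with two LEFT JOINs combined by UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
conn.execute("CREATE TABLE departments (dept_id INTEGER, dept_name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("ann", 1), ("bob", 2), ("cat", None)])
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, "data"), (3, "ops")])

# INNER JOIN: only rows that have a match in both tables.
inner_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.dept_id
    ORDER BY e.name
""").fetchall()

# Emulated FULL OUTER JOIN: every employee (matched or not),
# plus every department that no employee matched.
outer_rows = conn.execute("""
    SELECT e.name, d.dept_name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.dept_id
    UNION ALL
    SELECT e.name, d.dept_name
    FROM departments d
    LEFT JOIN employees e ON e.dept_id = d.dept_id
    WHERE e.name IS NULL
""").fetchall()

print(inner_rows)  # [('ann', 'data')]
print(sorted(outer_rows, key=str))
conn.close()
```

Unmatched sides come back as `None` in Python, which is how `sqlite3` represents SQL NULL.
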