True or False: Using random features makes the predictions relatively robust to outliers and noise
True
True or False: Random forests can only be used for classification tasks, not for regression
False
True or False: In random forests, each decision tree in the ensemble votes to determine the final prediction outcome
True
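The voting step can be sketched in a few lines of pure Python, assuming each tree's class prediction has already been computed (the `forest_predict` helper name is illustrative, not from any library):

```python
from collections import Counter

def forest_predict(tree_predictions):
    """Majority vote over the class labels predicted by the individual trees."""
    votes = Counter(tree_predictions)
    # most_common(1) returns the (label, count) pair with the highest vote count
    return votes.most_common(1)[0][0]

# Three of five trees vote "spam", so the forest predicts "spam".
print(forest_predict(["spam", "ham", "spam", "spam", "ham"]))  # -> spam
```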
True or False: Random forests prevent overfitting by using the same subset of data for each tree in the ensemble
False
What is the goal in constructing a random forest?
a. Minimize correlation, maximize strength
b. Maximize correlation, minimize strength
a. Minimize correlation, maximize strength
Why is it important to search for the best split among the generated linear combinations in the Forest-RC procedure?
a. To reduce the number of features used in the model
b. To find the combination that provides the most useful separation of the data
c. To ensure that all possible combinations are tested
d. To randomly select a split without considering performance
b. To find the combination that provides the most useful separation of the data
Why is AdaBoost more sensitive to noise than random forests?
a. Random forests are a newer method
b. AdaBoost isn't more sensitive to noise than random forests
c. Random forests add weight to noisy examples, whereas AdaBoost does not
d. AdaBoost adds weight to noisy examples, whereas random forests do not
d. AdaBoost adds weight to noisy examples, whereas random forests do not
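The weighting behavior behind this answer can be sketched with the standard AdaBoost-style update: each round, misclassified examples are up-weighted, so a mislabeled (noisy) point that the weak learners keep getting wrong accumulates more and more weight. This is a minimal pure-Python sketch; holding the weighted error fixed at 0.25 across rounds is an illustrative simplification, not how a real run behaves.

```python
import math

def adaboost_reweight(weights, misclassified, error):
    """One AdaBoost round: up-weight misclassified examples, then renormalize.

    error is the weak learner's weighted error rate (0 < error < 0.5).
    """
    alpha = 0.5 * math.log((1.0 - error) / error)
    new = [w * math.exp(alpha if miss else -alpha)
           for w, miss in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new]

w = [0.25, 0.25, 0.25, 0.25]
# Suppose example 0 is a noisy point the weak learners keep misclassifying.
for _ in range(3):
    w = adaboost_reweight(w, [True, False, False, False], error=0.25)
print(w[0])  # the noisy point's weight grows round after round
```

Random forests have no such mechanism: each tree sees an unweighted bootstrap sample, so a noisy point never gains influence over the rest of the ensemble.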
How are random vectors used in the process of creating random forests?
a. To speed up runtime
b. Each tree is grown from an independent random vector, increasing the diversity of trees in the ensemble
c. To narrow down the number of decision trees needed
d. To select the most optimal algorithm
b. Each tree is grown from an independent random vector, increasing the diversity of trees in the ensemble
What is adaboost?
a. A popular machine learning ensemble method that combines multiple weak learners to create a more accurate model
b. A clustering algorithm used in unsupervised learning
c. A deep learning algorithm used for natural language processing
a. A popular machine learning ensemble method that combines multiple weak learners to create a more accurate model
How does the Forest-RC procedure differ from traditional feature selection methods in decision forests?
a. It selects features based on their correlation with the target variable
b. It creates new features by randomly combining input variables with weighted coefficients
c. It removes weak features before training the model
d. It only works with categorical variables
b. It creates new features by randomly combining input variables with weighted coefficients
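A minimal pure-Python sketch of the Forest-RC idea: combine a few randomly chosen inputs with coefficients drawn uniformly from [-1, 1], then search the candidate features and thresholds for the split with the lowest weighted Gini impurity. Function names and the choice of F = 5 candidates are illustrative assumptions, not part of any library API.

```python
import random

def random_combination(n_inputs, n_vars=3):
    """Pick n_vars input indices with uniform[-1, 1] coefficients (Forest-RC style)."""
    idx = random.sample(range(n_inputs), n_vars)
    return [(i, random.uniform(-1.0, 1.0)) for i in idx]

def project(x, combo):
    """Value of the combined feature for one input vector x."""
    return sum(c * x[i] for i, c in combo)

def gini(labels):
    """Gini impurity of a set of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(X, y, combos):
    """Search every candidate feature and threshold for the lowest weighted Gini."""
    best = (float("inf"), None, None)
    for combo in combos:
        values = [project(x, combo) for x in X]
        for t in values:
            left = [yi for v, yi in zip(values, y) if v <= t]
            right = [yi for v, yi in zip(values, y) if v > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, combo, t)
    return best

random.seed(0)
X = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
y = [1 if x[0] > 0 else 0 for x in X]               # class depends on input 0
combos = [random_combination(5) for _ in range(5)]  # F = 5 candidate features
score, combo, threshold = best_split(X, y, combos)
```

Searching among the generated combinations is what makes the procedure useful: the coefficients are random, so only the search step ensures the chosen split actually separates the data well.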
Random Forests are able to work with weak inputs if …
a. The correlation is low
b. Random Forest is not able to work with weak inputs
c. The correlation is high
d. The number of inputs is small
a. The correlation is low
What does the tree predictor, h(x,Θ), output in random forest regression?
a. binary strings
b. binary values
c. numerical values
d. integer values
c. numerical values
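In the regression case, the forest combines those numerical tree outputs by averaging rather than voting. A minimal sketch (the `forest_regress` name is illustrative):

```python
def forest_regress(tree_outputs):
    """Random forest regression averages the trees' numerical predictions."""
    return sum(tree_outputs) / len(tree_outputs)

# Three trees predict 2.0, 3.0, and 4.0; the forest predicts their mean.
print(forest_regress([2.0, 3.0, 4.0]))  # -> 3.0
```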
Why is the choice of F determined by the out-of-bag estimate in the Forest-RC method?
a. To ensure that the selected F value maximizes model performance without excessive correlation
b. Because the training set is too small to determine F otherwise
c. To make sure F is always equal to M, the total number of inputs
d. To guarantee that each input variable is used equally
a. To ensure that the selected F value maximizes model performance without excessive correlation
How can we estimate the predictive accuracy of RF mechanisms?
a. Weights and noise
b. Out-of-bag estimates and reruns
c. Reruns and Bayes Error
d. Currently no method
b. Out-of-bag estimates and reruns
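The out-of-bag idea rests on a property of bootstrap sampling: drawing n points with replacement leaves out roughly (1 - 1/n)^n ≈ e^-1 ≈ 36.8% of the points, and each tree can be evaluated on the points it never saw. A minimal pure-Python sketch of finding the out-of-bag set (helper names are illustrative):

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices with replacement: one bootstrap sample."""
    return [rng.randrange(n) for _ in range(n)]

def oob_indices(n, in_bag):
    """Indices never drawn into the bootstrap sample: the out-of-bag set."""
    return sorted(set(range(n)) - set(in_bag))

rng = random.Random(0)
n = 1000
in_bag = bootstrap_indices(n, rng)
oob = oob_indices(n, in_bag)
print(len(oob) / n)  # close to e^-1, about 0.368
```

Averaging each tree's error on its own out-of-bag points gives an accuracy estimate without a separate test set.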
The empirical analysis of random forest regression revealed that
a. You can optimize performance using both the number of features and their correlation strength
b. When you add more features, the runtime is slower and performance decreases
c. You can decrease error by adding more decision trees
d. Random forest regression has a large margin of error in general
a. You can optimize performance using both the number of features and their correlation strength