True or False: Using random features makes the predictions relatively robust to outliers and noise
True
True or False: Random forests can only be used for classification tasks, not for regression
False
True or False: In random forests, each decision tree in the ensemble votes to determine the final prediction outcome
True
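The voting step can be sketched in a few lines of pure Python, assuming each tree's class prediction has already been computed (the `forest_predict` helper name is illustrative, not from any library):

```python
from collections import Counter

def forest_predict(tree_predictions):
    """Majority vote over the class labels predicted by the individual trees."""
    votes = Counter(tree_predictions)
    # most_common(1) returns the (label, count) pair with the highest vote count
    return votes.most_common(1)[0][0]

# Three of five trees vote "spam", so the forest predicts "spam".
print(forest_predict(["spam", "ham", "spam", "spam", "ham"]))  # -> spam
```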
True or False: Random forests prevent overfitting by using the same subset of data for each tree in the ensemble
False
What is the goal in constructing a random forest?
a. Minimize correlation, maximize strength
b. Maximize correlation, minimize strength
a. Minimize correlation, maximize strength
Why is it important to search for the best split among the generated linear combinations in the Forest-RC procedure?
a. To reduce the number of features used in the model
b. To find the combination that provides the most useful separation of the data
c. To ensure that all possible combinations are tested
d. To randomly select a split without considering performance
b. To find the combination that provides the most useful separation of the data
Why is AdaBoost more sensitive to noise than random forests?
a. Random forests are a newer method
b. AdaBoost isn't more sensitive to noise than random forests
c. Random forests add weight to noisy examples, whereas AdaBoost does not
d. AdaBoost adds weight to noisy examples, whereas random forests do not
d. AdaBoost adds weight to noisy examples, whereas random forests do not
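The weighting behavior behind this answer can be sketched with the standard AdaBoost-style update: each round, misclassified examples are up-weighted, so a mislabeled (noisy) point that the weak learners keep getting wrong accumulates more and more weight. This is a minimal pure-Python sketch; holding the weighted error fixed at 0.25 across rounds is an illustrative simplification, not how a real run behaves.

```python
import math

def adaboost_reweight(weights, misclassified, error):
    """One AdaBoost round: up-weight misclassified examples, then renormalize.

    error is the weak learner's weighted error rate (0 < error < 0.5).
    """
    alpha = 0.5 * math.log((1.0 - error) / error)
    new = [w * math.exp(alpha if miss else -alpha)
           for w, miss in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new]

w = [0.25, 0.25, 0.25, 0.25]
# Suppose example 0 is a noisy point the weak learners keep misclassifying.
for _ in range(3):
    w = adaboost_reweight(w, [True, False, False, False], error=0.25)
print(w[0])  # the noisy point's weight grows round after round
```

Random forests have no such mechanism: each tree sees an unweighted bootstrap sample, so a noisy point never gains influence over the rest of the ensemble.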
How are random vectors used in the process of creating random forests?
a. To speed up runtime
b. Each tree is grown from an independent random vector, increasing the diversity of trees in the ensemble
c. To narrow down the number of decision trees needed
d. To select the most optimal algorithm
b. Each tree is grown from an independent random vector, increasing the diversity of trees in the ensemble
What is adaboost?
a. A popular machine learning ensemble method that combines multiple weak learners to create a more accurate model
b. A clustering algorithm used in unsupervised learning
c. A deep learning algorithm used for natural language processing
a. A popular machine learning ensemble method that combines multiple weak learners to create a more accurate model
How does the Forest-RC procedure differ from traditional feature selection methods in decision forests?
a. It selects features based on their correlation with the target variable
b. It creates new features by randomly combining input variables with weighted coefficients
c. It removes weak features before training the model
d. It only works with categorical variables
b. It creates new features by randomly combining input variables with weighted coefficients
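A minimal pure-Python sketch of the Forest-RC idea: combine a few randomly chosen inputs with coefficients drawn uniformly from [-1, 1], then search the candidate features and thresholds for the split with the lowest weighted Gini impurity. Function names and the choice of F = 5 candidates are illustrative assumptions, not part of any library API.

```python
import random

def random_combination(n_inputs, n_vars=3):
    """Pick n_vars input indices with uniform[-1, 1] coefficients (Forest-RC style)."""
    idx = random.sample(range(n_inputs), n_vars)
    return [(i, random.uniform(-1.0, 1.0)) for i in idx]

def project(x, combo):
    """Value of the combined feature for one input vector x."""
    return sum(c * x[i] for i, c in combo)

def gini(labels):
    """Gini impurity of a set of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(X, y, combos):
    """Search every candidate feature and threshold for the lowest weighted Gini."""
    best = (float("inf"), None, None)
    for combo in combos:
        values = [project(x, combo) for x in X]
        for t in values:
            left = [yi for v, yi in zip(values, y) if v <= t]
            right = [yi for v, yi in zip(values, y) if v > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, combo, t)
    return best

random.seed(0)
X = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(20)]
y = [1 if x[0] > 0 else 0 for x in X]               # class depends on input 0
combos = [random_combination(5) for _ in range(5)]  # F = 5 candidate features
score, combo, threshold = best_split(X, y, combos)
```

Searching among the generated combinations is what makes the procedure useful: the coefficients are random, so only the search step ensures the chosen split actually separates the data well.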
Random Forests are able to work with weak inputs if …
a. The correlation is low
b. Random Forest is not able to work with weak inputs
c. The correlation is high
d. The number of inputs is small
a. The correlation is low
What does the tree predictor, h(x,Θ), output in random forest regression?
a. binary strings
b. binary values
c. numerical values
d. integer values
c. numerical values
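In the regression case, the forest combines those numerical tree outputs by averaging rather than voting. A minimal sketch (the `forest_regress` name is illustrative):

```python
def forest_regress(tree_outputs):
    """Random forest regression averages the trees' numerical predictions."""
    return sum(tree_outputs) / len(tree_outputs)

# Three trees predict 2.0, 3.0, and 4.0; the forest predicts their mean.
print(forest_regress([2.0, 3.0, 4.0]))  # -> 3.0
```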
Why is the choice of F determined by the out-of-bag estimate in the Forest-RC method?
a. To ensure that the selected F value maximizes model performance without excessive correlation
b. Because the training set is too small to determine F otherwise
c. To make sure F is always equal to M, the total number of inputs
d. To guarantee that each input variable is used equally
a. To ensure that the selected F value maximizes model performance without excessive correlation
How can we estimate the predictive accuracy of RF mechanisms?
a. Weights and noise
b. Out-of-bag estimates and reruns
c. Reruns and Bayes Error
d. Currently no method
b. Out-of-bag estimates and reruns
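The out-of-bag idea rests on a property of bootstrap sampling: drawing n points with replacement leaves out roughly (1 - 1/n)^n ≈ e^-1 ≈ 36.8% of the points, and each tree can be evaluated on the points it never saw. A minimal pure-Python sketch of finding the out-of-bag set (helper names are illustrative):

```python
import random

def bootstrap_indices(n, rng):
    """Draw n indices with replacement: one bootstrap sample."""
    return [rng.randrange(n) for _ in range(n)]

def oob_indices(n, in_bag):
    """Indices never drawn into the bootstrap sample: the out-of-bag set."""
    return sorted(set(range(n)) - set(in_bag))

rng = random.Random(0)
n = 1000
in_bag = bootstrap_indices(n, rng)
oob = oob_indices(n, in_bag)
print(len(oob) / n)  # close to e^-1, about 0.368
```

Averaging each tree's error on its own out-of-bag points gives an accuracy estimate without a separate test set.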
The empirical analysis of random forest regression revealed that
a. You can optimize performance using both the number of features and their correlation strength
b. When you add more features, the runtime is slower and performance decreases
c. You can decrease error by adding more decision trees
d. Random forest regression has a large margin of error in general
a. You can optimize performance using both the number of features and their correlation strength