Regression & Classification Review Jeopardy Template

Simple Normal Regression

Extending Normal Regression

Logistic Regression

Naive Bayes Classification

Miscellaneous

100

In a Bayesian regression model Y = beta_0 + beta_1X + ε, this parameter represents the expected value of Y when the predictor X is equal to zero.

What is beta_0?

100

A researcher is studying Crop Yield. They find that while Rainfall is a good predictor, it doesn't tell the whole story, as Fertilizer Use also significantly impacts the harvest. Identify the primary advantage of using Multiple Linear Regression (MLR) in this scenario instead of running two separate Simple Linear Regressions.

What is: To account for the simultaneous impact of multiple predictors (or to "control for" other variables)?

100

If the odds of rain tomorrow are 2, what is the probability that it will rain tomorrow?

What is 2/3?

100

Naive Bayes classification is based on Bayes' Rule. To predict the class of an observation, the algorithm calculates the posterior probability for each category and assigns the observation to the category with the highest probability.

Identify the three components (probabilities) of Bayes' Rule that are multiplied or divided to calculate this posterior.

What are the prior, likelihood, and normalizing constant?

100

In Bayes' rule, this term represents how likely the observed data are under a given parameter value.

What is the likelihood?

200

In a Bayesian context, we do not view the slope beta_1 as a single fixed number, but rather as this, which allows us to quantify our uncertainty before and after seeing data.

What is a random variable (or probability distribution)?

200

A researcher models House Price based on Square Footage and Location (Urban vs. Rural). They find that the price increase for every additional square foot is exactly the same regardless of whether the house is in an urban or rural location.

If the lines for Urban and Rural areas are parallel on a scatterplot, state whether an interaction term is needed and what this says about the "slope" of square footage.

What is: No, an interaction term is not needed. It means the slope for square footage is constant across both locations.

200

In logistic regression, we have the condition that Y_i can take values either 0 or 1. What probability model does Y_i|pi_i follow?

What is the Bernouilli probability model?

200

The "Naive" in Naive Bayes comes from a simplifying assumption that drastically reduces the number of parameters needed for the model.

Identify this core assumption regarding the relationship between the predictor variables (X_1, X_2, ..., X_n) given the class label (Y).

What is the assumption of conditional independence? (Or: The predictors are independent given the class.)

200

This term describes a prior that is "flat" or has a very large scale, allowing the data to speak for itself.

What is a vague (or weakly informative) prior?

300

A posterior summary shows beta_1 has a median of −2.1 with a 95% credible interval of [−4.3, −0.2]. In one sentence, interpret this in the context of predicting plant growth (Y) from pollution level (X).

There is a 95% probability that each one-unit increase in pollution level is associated with a decrease in plant growth of between 0.2 and 4.3 units.

300

A researcher is modeling the Starting Salary of recent graduates.

Model A uses only GPA and Major. It is consistent but tends to under-represent the complexity of the job market (High Bias).
Model B uses GPA, Major, Zip Code, Favorite Color, High School Mascot, and 20 other variables. It fits the training data perfectly (High Variance).

Identify which model is "overfitting" the data and describe how its predictions will likely change when applied to a new group of graduates.

What is Model B? Its predictions will be much less accurate (or have much higher error) on new data.

300

Let Y represent a binary variable where Y = 1 indicates that a person is a cat preferer and Y = 0 indicates that a person is a dog preferer. We fit a logistic regression model where X = height. We construct a confusion matrix to summarize the results of the corresponding posterior classifications. We find that the model accurately identifies 10 people as being dog preferers and correctly identifies 2 people as being cat preferers. There are 30 data points in total. What is the overall accuracy?

What is (2+10)/30 = 12/30 = 40%.

300

Under the assumption of conditional independence, if the probability of an email containing the word "Winner" given it is Spam is 0.60, and the probability of it containing the word "Cash" given it is Spam is 0.50, identify the joint likelihood that a Spam email contains both words.

What is 0.30? (Calculation: 0.60 * 0.50 = 0.30)

300

Write out Bayes' rule in proportional form and name each component.

What is: posterior ∝ likelihood × prior, or f(π|y) ∝ L(π|y) · f(π)

400

We are tuning a Simple Normal Regression model for the Total Time Underwater (Y, in seconds) of a Sea Otter based on the Water Depth (X, in meters, centered at the mean depth of 15m). Marine biologists provide the following prior knowledge:

-At the mean depth, an otter typically stays under for 90 seconds, though this average could realistically range from 70 to 110 seconds.

-For every 1-meter increase in depth, the dive time typically increases by 4 seconds, though this increase could be as low as 2 or as high as 6 seconds.

-At any given temperature, individual dives will tend to vary with a standard deviation of 15 seconds.

Identify the prior distributions for beta_0c, beta_1, and sigma

What is beta_0c ~ N(90, 10)

beta_1 ~ N(4, 1)

sigma~ exp(1/15)

400

Researchers model the Basal Metabolic Rate of animals based on Body Mass and Environment (0 = Captivity, 1 = Wild). The resulting model coefficients are:

beta_0 = 15

beta_1 (for mass) = 2.5

beta_2 (for environment) = -3

beta_3 (for mass times environment) = 1.2

Identify the estimated slope (rate of change) for Body Mass specifically for animals living in the Wild.

What is 3.7?

400

A researcher models the probability that a student passes a certification exam (Y=1) based on the number of Hours Studied (X). The resulting model is:

log(odds) = -2.0 + 0.5*Hours

Provide the correct interpretation of the coefficient 0.5 in terms of odds.

What is as hours studied increase by 1, the odds of passing are multiplied by about 1.6?

400

A researcher uses Naive Bayes to classify trees into two categories: Healthy or Diseased. To be safe, they decide to only label a tree as "Diseased" if the posterior probability is greater than 0.95.

If the researcher lowers this threshold to 0.50, identify whether the number of "False Positives" (Healthy trees mistakenly labeled as Diseased) will increase or decrease.

What is Increase? (By lowering the "bar" for what qualifies as Diseased, the model becomes more aggressive and will mistakenly flag more healthy trees.)

400

In an ideal MCMC chain, the autocorrelation between two successive samples is high, but does this as the distance between samples increases.

What is quickly decreases toward 0?

500

Using the built-in trees dataset, fit a centered Bayesian regression model predicting Volume based on Height using stan_glm with a Normal(30, 10) prior on the intercept, Normal(1, 0.5) prior on the slope, and Exponential(1) prior on σ. Then make a density plot for each of the parameters and indicate what the mean is for each one.

What is intercept about -75, height about 1.4, and sigma about 12?

500

Using the built-in msleep dataset in library(MASS), fit a Bayesian regression model predicting total_sleep based on brainwt and bodywt using stan_glm with a Normal(12, 5) prior on the intercept, Normal(0, 2.5, autoscale = TRUE) prior on the slopes, and Exponential(1, autoscale = TRUE) prior on σ. Then find an 80% credible interval for the intercept.

What is 9.9 to 11.4?

500

Using the climbers_sub dataset (built-in to bayesrules package), a researcher wants to predict the probability that a climber succeeds in reaching the summit of Mt. Everest based on their age. Please use R to fit a logistic regression model and comment on whether or not the chains are well-mixed. Note: you do not have to specify the priors.

What is they are well-mixed?

500

Using the coffee_ratings dataset from the bayesrules package, write the R code to fit a Naive Bayes model that classifies a coffee bean's species based on its flavor and aroma scores. Suppose that we observe a coffee bean that has a flavor score of 7 and an aroma score of 7. What species of coffee will our model predict this bean to be?

What is Arabica?

500

A researcher generates 10,000 posterior predictive draws for a new observation. 320 fall below 20 and 450 fall above 80. What is the estimated probability that a new observation falls between 20 and 80? Show your work.

What is: P(20 < y_new < 80) = 1 − (320/10000) − (450/10000) = 1 − 0.032 − 0.045 = 0.923 or 92.3%.