Key Concepts
Assumptions
Modeling
Predictions/Statistical Tests
Coding
100

The response variable in logistic regression is this type. 

What is a binary variable (0/1; Yes/No; etc.)?

100

Answer to the question: Does logistic regression assume a linear relationship between X and probability? 

What is no?

100

This is the logit form of the logistic regression model.

What is log(pi/(1-pi)) = beta_0 + beta_1X?

100

If a logistic regression model predicts pi = 0.6, where the response variable is equal to 1 if a student passes an exam and 0 if not, this is the interpretation of pi.

What is: there is a 60% chance that the student will pass the exam?

100

Run the following code in RStudio: 

data(mtcars)

mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0) 

head(mtcars)

We would like to fit a logistic regression model to predict the probability that a car will have high mpg (over 20) based on its weight (coded as wt). This is the value of the intercept (b_0) in the logit form of the model.

What is 19.950?
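
(One way to check this card, sketched below; the clue does not give the model-fitting code, so the object name fit_high is just an illustration of fitting the model with glm() and a binomial family.)

# Hypothetical check of this card's answer; 'fit_high' is an illustrative name.
data(mtcars)
mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0)
fit_high <- glm(high_mpg ~ wt, data = mtcars, family = binomial)
coef(fit_high)      # first value is the intercept (b_0), second is the slope (b_1) for wt
summary(fit_high)   # full coefficient table with standard errors and p-values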

200

Answer to the question: Can the explanatory variable, X, in single logistic regression be categorical, and if so, how is it used in the model?

What is: yes, a categorical X can be used by coding its categories as indicator variables (0/1), so the model compares the groups’ log-odds of the outcome?
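
(A minimal sketch of this idea, reusing the high_mpg column from the Coding category; am, the 0/1 transmission column already in mtcars, stands in for a categorical X, and fit_cat is just an illustrative name.)

# Hypothetical example: a categorical predictor entered as a factor.
data(mtcars)
mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0)
fit_cat <- glm(high_mpg ~ factor(am), data = mtcars, family = binomial)
coef(fit_cat)   # the factor(am)1 term compares the log-odds of high mpg
                # for manual (am = 1) versus automatic (am = 0) cars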

200

Answer to the question: Is normality of errors required for logistic regression? 

What is no?

200

This is the probability form of the logistic regression model.

What is pi = e^(beta_0 + beta_1X) / (1 + e^(beta_0 + beta_1X))?

200

You fit a single logistic regression model and get a p-value of 0.03 for the slope coefficient. This is the conclusion we would make (at a significance level of 0.05) based on this evidence.

What is: reject the null hypothesis that beta_1 = 0/there is statistically significant evidence at the 0.05 level that X is associated with the log-odds of the outcome?

200

Run the following code in RStudio: 

data(mtcars)

mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0) 

head(mtcars)

We would like to fit a logistic regression model to predict the probability that a car will have high mpg (over 20) based on its weight (coded as wt). Is the slope for wt statistically significant? Why?

What is: yes, because the p-value for the wt slope is less than 0.05?
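
(Sketch of where a p-value like this comes from, assuming the illustrative fit_high object used for the 100-level card in this column.)

# Hypothetical check: the Wald test p-value for the wt slope.
data(mtcars)
mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0)
fit_high <- glm(high_mpg ~ wt, data = mtcars, family = binomial)
summary(fit_high)$coefficients["wt", "Pr(>|z|)"]   # compare this value to 0.05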

300

This is the transformation of the probability that Y = 1 that is modeled as a linear function of X.

What is log(pi/(1-pi))?

300

In simple linear regression the relationship between X and Y is assumed to be linear.
In logistic regression, this quantity is assumed to be linear in X instead.

What is the log-odds of the response probability?

300

If log(odds) = 0.99, this is the value of pi.

What is 0.73?
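
(A quick arithmetic check, worked in R since that is the language used in the Coding category:)

# Converting a log-odds of 0.99 back to a probability.
exp(0.99) / (1 + exp(0.99))   # about 0.729
plogis(0.99)                  # same conversion with R's built-in inverse logit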

300

A student predicts the probability of passing an exam is 0.85. These are the corresponding odds and log-odds.

What is: the odds are 5.67 and the log(odds) are 1.73?
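
(Quick arithmetic check in R:)

# Probability 0.85 converted to odds and log-odds.
p <- 0.85
p / (1 - p)        # odds, about 5.67
log(p / (1 - p))   # log-odds, about 1.73
qlogis(p)          # same log-odds via R's built-in logit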

300

Run the following code in RStudio: 

data(mtcars)

mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0) 

head(mtcars)

We would like to fit a logistic regression model to predict the probability that a car will have high mpg (over 20) based on its weight (coded as wt).

This is the value of the slope coefficient for wt when we convert it to an odds ratio. 

What is 0.001676579?
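
(Sketch of the conversion, again assuming the illustrative fit_high object:)

# Hypothetical check: exponentiate the wt slope to get an odds ratio.
data(mtcars)
mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0)
fit_high <- glm(high_mpg ~ wt, data = mtcars, family = binomial)
exp(coef(fit_high)["wt"])   # multiplicative change in odds per one-unit (1,000 lb) increase in wt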

400

This is what the slope, beta_1, represents in logistic regression.

What is the change in log odds of Y=1 for a one unit increase in X?

400

Answer to the question: Why is it important that observations be independent in logistic regression? 

What is: because the model assumes that each outcome provides separate information. Dependence between observations can bias coefficient estimates and standard errors, making inference invalid?

400

If pi = 0.56, this is the value of the log(odds).

What is 0.24?
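
(Quick arithmetic check in R:)

# Probability 0.56 converted to log-odds.
log(0.56 / 0.44)   # about 0.24
qlogis(0.56)       # same value via R's built-in logit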

400

This is the way that we can assess whether our logistic regression model fits the data better than a null model with no predictors.

What is by comparing the deviance (or -2 log-likelihood) of the fitted model to that of the null model, often using a chi-square test (the G test)?
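
(A minimal sketch of this comparison, assuming the illustrative fit_high object; fit_null is a hypothetical name for the intercept-only model.)

# Hypothetical likelihood-ratio (G) test against the null model.
data(mtcars)
mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0)
fit_high <- glm(high_mpg ~ wt, data = mtcars, family = binomial)
fit_null <- glm(high_mpg ~ 1, data = mtcars, family = binomial)
anova(fit_null, fit_high, test = "Chisq")    # chi-square test on the drop in deviance
fit_high$null.deviance - fit_high$deviance   # the G statistic itself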

400

Run the following code in RStudio: 

data(mtcars)

mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0)

head(mtcars)

We would like to fit a logistic regression model to predict the probability that a car will have high mpg (over 20) based on its weight (coded as wt).

This is the predicted probability of a car having high mpg if its weight is wt = 3 (that is, 3,000 lbs).

What is 0.685?
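
(Sketch of the prediction step, assuming the illustrative fit_high object; predict() with type = "response" returns a probability.)

# Hypothetical check: predicted probability of high mpg at wt = 3.
data(mtcars)
mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0)
fit_high <- glm(high_mpg ~ wt, data = mtcars, family = binomial)
predict(fit_high, newdata = data.frame(wt = 3), type = "response")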

500

This is the reason why we usually interpret the slope coefficient, beta_1, by exponentiating it rather than reading off its raw estimated value.

What is: because exponentiating converts the coefficient from a change in log-odds into a multiplicative change in odds (an odds ratio), which is easier to interpret?

500

Name at least one reason why we wouldn't want to use linear regression for predicting a binary outcome.

What is: predicted probabilities could fall below 0 or above 1 (which makes no sense for a probability), the errors are not normally distributed, and the variance is not constant (so the linear regression assumptions are violated)?
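
(One way to see the first problem with this board's own data, sketched under the same high_mpg setup; lin_fit is just an illustrative name, and lm() fitted values are not constrained to [0, 1].)

# Hypothetical illustration: a straight-line fit to a 0/1 outcome.
data(mtcars)
mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0)
lin_fit <- lm(high_mpg ~ wt, data = mtcars)
range(fitted(lin_fit))   # some fitted "probabilities" fall below 0 and above 1 here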

500

Answer to the question: If a single logistic regression model gives a slope coefficient, beta_1 = 0.8, how would you explain this to someone who doesn’t understand log-odds?

What is: for each one-unit increase in X, the odds of the event happening are multiplied by about 2.2?
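
(Quick arithmetic check in R:)

# e^0.8 gives the multiplicative change in odds per one-unit increase in X.
exp(0.8)   # about 2.23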

500

In single logistic regression, these are the null and alternative hypotheses when testing whether the slope beta_1 is associated with the outcome.

What is: H_0: beta_1 = 0 (the predictor X has no effect on the log odds of the response)

H_A: beta_1 does not equal 0 (the predictor X does have an effect on the log odds of the response)?

500

Run the following code in RStudio: 

data(mtcars)

mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0)

head(mtcars)

We would like to fit a logistic regression model to predict the probability that a car will have high mpg (over 20) based on its weight (coded as wt).

This is the log odds of a car having high mpg if its weight is wt = 5 (that is, 5,000 lbs).

What is -12.005?
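
(Sketch of the prediction on the log-odds scale, assuming the illustrative fit_high object; type = "link" returns the linear predictor, i.e., the log-odds.)

# Hypothetical check: log-odds of high mpg at wt = 5.
data(mtcars)
mtcars$high_mpg <- ifelse(mtcars$mpg > 20, 1, 0)
fit_high <- glm(high_mpg ~ wt, data = mtcars, family = binomial)
predict(fit_high, newdata = data.frame(wt = 5), type = "link")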