This metric tells us how much of the variation in the dependent variable is explained by the regression model.
What is R-squared?
In Poisson regression, outputs must be these types of values.
What are non-negative integers (counts)?
The output of logistic regression is always between these two values.
What are 0 and 1?
Before regression, missing values must be handled by either dropping rows or doing this.
What is imputing (filling with mean/median/mode)?
A store wants to estimate the number of customers arriving each hour.
What is Poisson regression?
The assumption of linear regression is that the data can be represented by this.
What is a straight line (line of best fit)?
Poisson regression assumes that each event happens in this way, meaning the occurrence of one event does not affect another.
What is independence (independent events)?
Logistic regression is often evaluated using this type of matrix that compares predicted vs actual outcomes.
What is a confusion matrix?
Converting “Clean”, “clean”, and “CLEAN” into the same value is an example of this step.
What is standardizing categorical data?
A company wants to predict house prices using size and number of bedrooms. Which regression should they use?
What is linear regression?
If residuals are randomly scattered around zero, this indicates that the model is doing this.
What is fitting the data well?
A key assumption of Poisson regression is that the mean and this measure of spread are equal.
What is the variance?
The x-axis variable in logistic regression can be continuous, but the y-axis variable must be this type.
What is binary (categorical)?
Changing “Yes/No” to 1/0 is an example of this kind of conversion.
What is encoding categorical variables?
A health researcher predicts the probability of a patient having diabetes (yes/no) based on lifestyle data.
What is logistic regression?
This regression is an extension of linear regression that adds higher-order terms.
What is polynomial regression?
If the variance of the count data is much higher than the mean, the dataset suffers from this problem.
What is overdispersion?
Logistic regression can be extended to handle multiple categories (e.g., choosing red, blue, or green). This is called what?
What is multinomial logistic regression?
Dropping unnecessary variables helps reduce this problem, where features are highly related to each other.
What is multicollinearity?
A scientist models bacteria growth, which speeds up rapidly after some point.
What is polynomial regression?
Linear regression is not appropriate when the dependent variable is this kind of outcome.
What is categorical/binary data?
f variance is much larger than the mean in count data, you may need this alternative model.
What is negative binomial regression?
A confusion matrix shows predictions vs. actuals. The bottom-right cell represents this type of prediction.
What is a true positive?
If 15% of rows in a dataset are missing the target value (like house price), the best option is usually this.
What is dropping those rows?
A sleep study shows fatigue decreases with more hours of sleep, but levels off after 8 hours. Which regression should they use?
What is logistic regression?