Pandas & Plotting
Linear Regression
Model Evaluation
Stats
100

This is the kind of visualization I would use to look at the distribution of house list prices.

What is a histogram?

100

This is what m and b are in the formula for a line:

y = mx + b

What is...

m = slope

b = intercept

100

This is the range of possible R^2 values.

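What is [0, 1]? (This holds for a least-squares fit with an intercept, scored on its own training data; on held-out data R^2 can be negative.)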

100

Finish the sentence: "When p is low..."

What is: "H0 has got to go"?

200

I have a dataframe called "messages" with 4 columns: "sender", "recipient", "time sent", "contents" (all strings). This is the code that gives us only the contents and recipient of all messages from the sender "Daniel".

What is...

messages.loc[messages["sender"] == "Daniel", ["contents", "recipient"]]
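A minimal sketch of that selection on a tiny made-up frame. Only the column names come from the clue; the rows and the name `from_daniel` are invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the clue.
messages = pd.DataFrame({
    "sender": ["Daniel", "Ana", "Daniel"],
    "recipient": ["Ana", "Daniel", "Bo"],
    "time sent": ["09:00", "09:05", "09:10"],
    "contents": ["hi", "hello", "lunch?"],
})

# The boolean mask picks the rows; the list picks the columns, in that order.
from_daniel = messages.loc[messages["sender"] == "Daniel", ["contents", "recipient"]]
print(from_daniel)
```

The first argument to `.loc` filters rows and the second selects columns, so both steps happen in one call.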

200

These are the 4 assumptions of Simple Linear Regression.

What are:

  1. Linearity: Y and X must have an approximately linear relationship.
  2. Independence: Errors (residuals) εi and εj must be independent of one another for any i != j.
  3. Normality: The errors (residuals) follow a Normal distribution with mean 0 -- N(0,σ)
  4. Equality of Variances (homoscedasticity of errors): The errors (residuals) should have roughly constant variance, regardless of the value of X. (There should be no discernible relationship between X and the residuals.)
200

These are the 4-5 evaluation metrics we have learned for linear regression. (If it's an acronym, write out the words!)

What are RSS/SSE (residual sum of squares/sum of squared error), MSE (mean squared error), RMSE (root mean squared error), R^2, and adjusted R^2?
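As a rough illustration, these metrics can be computed by hand with the standard library. The values in `y_true` and `y_pred` are made up, and `k = 1` assumes a single predictor.

```python
import math

# Hypothetical observed and predicted values, chosen for easy arithmetic.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
n = len(y_true)
k = 1  # number of predictors (assumed: simple linear regression)

# Residual sum of squares (also called sum of squared errors).
rss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
mse = rss / n          # mean squared error
rmse = math.sqrt(mse)  # root mean squared error, in the units of y

# Total sum of squares: squared error of always predicting the mean of y.
y_mean = sum(y_true) / n
tss = sum((yt - y_mean) ** 2 for yt in y_true)

r2 = 1 - rss / tss
# Adjusted R^2 penalizes R^2 for the number of predictors.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(rss, mse, rmse, r2, adj_r2)
```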

200

I am a researcher and I want to know how people's math test scores differ given their amount of sleep the preceding two nights. This is my null and alternative hypothesis.

What are:

H0: There is no difference in math scores between the two groups.

HA: There is a difference in math scores between the two groups.

?

300

This is a scenario in which I might use a pie chart.

Never

300

This is the code to instantiate and fit a linear regression model in sklearn, assuming I have defined X and y.

What is:

from sklearn.linear_model import LinearRegression

lr = LinearRegression()

lr.fit(X, y)
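For a version that runs on its own, here is a small sketch with made-up data lying exactly on y = 2x + 1. The arrays are assumptions for illustration; the clue only says that X and y are already defined.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on the line y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # 2-D: (n_samples, n_features)
y = np.array([1.0, 3.0, 5.0, 7.0])

lr = LinearRegression()
lr.fit(X, y)

slope = lr.coef_[0]        # m in y = mx + b
intercept = lr.intercept_  # b in y = mx + b
r2 = lr.score(X, y)        # R^2 on the training data
print(slope, intercept, r2)
```

Note that sklearn expects X to be two-dimensional, even with a single feature.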

300

A high-bias model has [high/low] training accuracy and is [over/under]fit. A high-variance model has [high/low] training accuracy and is [over/under]fit.

What are: low, under, high, over?

300

This is the Central Limit Theorem.

What is the theorem stating that the sampling distribution of the sample mean approaches a normal distribution, N(μ, σ / √n), as the sample size gets larger, no matter what the shape of the population distribution?
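A quick simulation can make this concrete. The sketch below assumes an exponential population (mean 1, standard deviation 1), which is strongly skewed; the sample size and number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50              # size of each sample
num_samples = 5000  # how many sample means to collect

# Exponential(1) population: right-skewed, with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
std_of_means = statistics.stdev(sample_means)
expected_std = 1.0 / n ** 0.5  # sigma / sqrt(n)

print(mean_of_means, std_of_means, expected_std)
```

The mean of the sample means should land near μ and their spread near σ / √n, even though the population itself is far from normal.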

400

I have a DataFrame called "top_hits" with 3 columns: "genre" (string), "song title" (string), and "plays" (int). This is the code to give me the average number of plays per genre.

What is...

top_hits.groupby(['genre'])['plays'].mean()
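A small sketch of that groupby on invented rows. Only the column names come from the clue; the data and the name `avg_plays` are assumptions.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the clue.
top_hits = pd.DataFrame({
    "genre": ["pop", "rock", "pop", "rock"],
    "song title": ["A", "B", "C", "D"],
    "plays": [100, 40, 300, 60],
})

# Split rows by genre, keep the "plays" column, and average within each group.
avg_plays = top_hits.groupby(["genre"])["plays"].mean()
print(avg_plays)
```

The result is a Series indexed by genre, so `avg_plays["pop"]` looks up a single genre's average.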

400

This is the additional assumption we make for multiple linear regression on top of the 4 SLR assumptions.

What is Independence of Predictors: The independent variables Xi and Xj must be independent of one another for any i != j?

400

My multiple linear regression model has an R^2 value of .63. This is my interpretation of that value.

What is: our model explains 63% of the variance in y (compared with a baseline that always predicts the mean of y)?

400

This is an interpretation of the confidence interval for the true average weight of an adult male grizzly bear with the 2.5th percentile of our sample = 400 lbs and the 97.5th percentile of our sample = 790 lbs.

"I am 95% confident that the true average weight of adult male grizzly bears is between 400 and 790 lbs."

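One way such percentiles can arise is a bootstrap percentile interval. The sketch below assumes that approach, which the clue does not state, and uses made-up weights rather than real grizzly data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical weights in lbs; not real grizzly data.
weights = [410, 455, 500, 520, 560, 600, 640, 700, 750, 780]

# Resample with replacement many times and record each resample's mean.
boot_means = sorted(
    statistics.fmean(random.choices(weights, k=len(weights)))
    for _ in range(10_000)
)

# Simple index-based 2.5th and 97.5th percentiles of the bootstrap means.
lower = boot_means[int(0.025 * len(boot_means))]
upper = boot_means[int(0.975 * len(boot_means)) - 1]

print(lower, upper)
```

The interval `(lower, upper)` is then read the same way as the answer above: a 95% confidence interval for the true average weight.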