This is the kind of visualization I would use to look at the distribution of house list prices.
What is a histogram?
This is what m and b are in the formula for a line:
y = mx + b
What is...
m = slope
b = intercept
This is the range of possible R^2 values.
[0,1]
Finish the sentence: "When p is low..."
What is: "H0 has got to go"?
I have a dataframe called "messages" with 4 columns: "sender", "recipient", "time sent", "contents" (all strings). This is the code that gives us only the contents and recipient of all messages from the sender "Daniel".
What is...
messages.loc[messages["sender"] == "Daniel", ["contents", "recipient"]]
These are the 4 assumptions of Simple Linear Regression.
What are:
These are the 4-5 evaluation metrics we have learned for linear regression. (If it's an acronym, write out the words!)
What are RSS/SSE (residual sum of squares/sum of squared error), MSE (mean squared error), RMSE (root mean squared error), R^2, and adjusted R^2?
I am a researcher and I want to know how people's math test scores differ given their amount of sleep the preceding two nights. This is my null and alternative hypothesis.
What are:
H0: There is no difference in math scores between the two groups.
HA: There is a difference in math scores between the two groups.
?
This is a scenario in which I might use a pie chart.
Never
This is the code to instantiate and fit a linear regression model in sklearn, assuming I have defined X and y.
What is:
lr = LinearRegression()
lr.fit(X, y);
A high bias model has [high/low] accuracy and is [over/under]fit. A high variance model has [high/low] accuracy and is [over/underfit].
What are: low, under, high, over?
This is the Central Limit Theorem.
What is the theorem that states that the sampling distribution of the sample means approaches a normal distribution ( N(μ, σ / √(n)) ) as the sample size gets larger — no matter what the shape of the population distribution?
I have a DataFrame called "top_hits" with 3 columns: "genre" (string), "song title" (string), and "plays" (int). This is the code to give me the average number of plays per genre.
What is...
top_hits.groupby(['genre'])['plays'].mean()
This is the additional assumption we make for multiple linear regression on top of the 4 SLR assumptions.
What is Independence of Predictors: The independent variables Xi and Xj must be independent of one another for any i != j.
My multiple linear regression model has an R^2 value of .63. This is my interpretation of that value.
What is our model accounts for 63% of the variance in y from the mean?
This is in interpretation of the confidence interval for the true average weight of an adult male grizzly bear with the 2.5th percentile of our sample = 400 lbs and the 97.5th percentile of our sample = 790 lbs.
"I am 95% confident that the true average weight of adult male grizzly bears is between 400 and 790 lbs."