T-Testing my patience
Regress to Impress
Your (statistically) significant other
What did the DJ name his son? Error Error
Strictly Business *well everything kinda is*
100

If you reject the null hypothesis when it’s actually true, you’ve made this type of error.

What is a type 1 error?

100

You present your sales forecast model to your manager. You explain that the total variation in sales is 1,000 units, and your model explains 700 units of it.

This is how much unexplained error remains in the model.


What is 300 units (SSE)?

100

A p-value of 0.048 is found in a two-tailed test at α = 0.05. This is the meaning.

What is the result is statistically significant, and you would reject the null hypothesis.

100

A company collects sample data on weekly hours worked from 36 employees. The sample mean is 41 hours, and the sample standard deviation is 4.8 hours.
This is the standard error of the mean.

SE= s/ square root of n

what is 0.8

100

This is the crucial step that comes before collecting data when building a model.

What is formulate the data?

200

This value represents the probability of observing your sample results — or something more extreme — assuming the null hypothesis is true. Misinterpreting it as the probability that the null is true is a common mistake.

What is the p-value?

200

This problem occurs when residuals are correlated across time periods, often seen in time series models.

What is autocorrelation?
200

A business analyst wants to estimate the average number of hours employees at a tech company work per week. A random sample of 36 employees reports a mean of 44 hours, with a standard deviation of 6 hours. This is the 95% confidence interval for the true average weekly work hours of all employees.

What is 

The 95% confidence interval is (42.04, 45.96).
We are 95% confident that the true average number of hours worked per week is between 42.04 and 45.96 hours.

200

You fail to reject the null hypothesis in a test comparing two email marketing strategies. Later, it turns out Strategy B really was better, but the test didn’t detect the difference.

This type of error was made, and this could reduce the chance of it happening again.

What is a Type II Error, and you could reduce it by increasing the sample size or increasing the power of the test.

200

Even if the population of cat treat preferences is not normally distributed, this theorem tells us that the sampling distribution of the sample mean will be approximately normal if the sample size is large enough.

What is the Central Limit Theorem?

300

A coffee shop claims that its new training program improves employee sales. Historically, employees sold an average of $200/day. After training, a random sample of 25 employees averaged $210/day with a standard deviation of $20. You want to test, at the 5% significance level, whether the training increased sales.

This value is the test statistic.

What is 2.5?

300

Your company increases ad spending month over month to boost online sales. At first, sales go up, but after a certain point, additional spending seems to result in smaller gains, and eventually no gains at all. A residual plot of your linear model shows a clear pattern. This model best captures your data.

What is a curvilinear (nonlinear) regression model?

300

T/F: A p-value of 0.04 means there is a 96% chance the alternative hypothesis is true.

What is:

False.
A p-value tells you the probability of getting your results (or more extreme) assuming the null hypothesis is true — not the probability that the alternative is true.

300

This value is the average of the squared residuals in a regression model and is used to measure how well the model fits the data.

What is MSE?

300

True or false:

Histograms should be bounded and have no empty bins.

What is

True?

400

This term refers to the number of values in a calculation that are free to vary, and it affects the shape of the t-distribution used in hypothesis testing.

What is degrees of freedom?

400

Your model has a very high R² value, but when you use it to predict new data, the results are wildly inaccurate. This is the most likely explanation.

What is overfitting?

400

If the relationship between population and revenue is modeled by the simple single factor regression model Y=0.48x+75.2, the correlation coefficient is likely between this.

What is:

Between zero and positive one.

400

True or False:
In a well-fitting regression model, residuals should show a clear upward or downward pattern.

what is:

False.
Residuals should appear randomly scattered — patterns indicate a model misfit or missing variable.

400

These are fields of data analytics.

What are:

Descriptive 

Inferential

Predictive?

500

A marketing analyst performs a two-tailed t-test on whether a new ad campaign changes customer conversion rates. They report a statistically significant result (p = 0.03) but then say, “This proves the campaign caused the increase.” Identify two major flaws in this interpretation.

What are

  1. Statistical significance does not prove causation — there may be confounding variables or bias.

  2. A p-value of 0.03 means the result is unlikely under the null, not that the alternative is definitely true or that the effect is practically important.

500

If a residual plot fans outward as the fitted values increase, this assumption is violated and this problem is caused.

What is heteroscedasticity, which leads to biased standard errors and unreliable p-values.

500

The F-test in regression is used to do this.

What is test whether the overall regression model is statistically significant — i.e., whether at least one predictor variable explains variation in the response variable?

500

You build a multiple regression model with three predictors. The overall R² is high, but two predictors have very high p-values and change signs when you add or remove variables.

This issue might be present, and this is why is it a problem.

What is multicollinearity — it makes coefficient estimates unstable and unreliable, even when the overall model looks strong?

500

You're building a regression model and want to include the categorical variable "Region," which has three values: East, West, and South.

This is how should you represent this variable in your model. 

What is create two dummy variables (e.g., West and South), treating one category (East) as the reference group, to avoid the dummy variable trap?

M
e
n
u