If you reject the null hypothesis when it’s actually true, you’ve made this type of error.
What is a type 1 error?
You present your sales forecast model to your manager. You explain that the total variation in sales is 1,000 units, and your model explains 700 units of it.
This is how much unexplained error remains in the model.
What is 300 units (SSE)?
A p-value of 0.048 is found in a two-tailed test at α = 0.05. This is the meaning.
What is the result is statistically significant, and you would reject the null hypothesis.
A company collects sample data on weekly hours worked from 36 employees. The sample mean is 41 hours, and the sample standard deviation is 4.8 hours.
This is the standard error of the mean.
SE= s/ square root of n
what is 0.8
This is the crucial step that comes before collecting data when building a model.
What is formulate the data?
This value represents the probability of observing your sample results — or something more extreme — assuming the null hypothesis is true. Misinterpreting it as the probability that the null is true is a common mistake.
What is the p-value?
This problem occurs when residuals are correlated across time periods, often seen in time series models.
A business analyst wants to estimate the average number of hours employees at a tech company work per week. A random sample of 36 employees reports a mean of 44 hours, with a standard deviation of 6 hours. This is the 95% confidence interval for the true average weekly work hours of all employees.
What is
The 95% confidence interval is (42.04, 45.96).
We are 95% confident that the true average number of hours worked per week is between 42.04 and 45.96 hours.
You fail to reject the null hypothesis in a test comparing two email marketing strategies. Later, it turns out Strategy B really was better, but the test didn’t detect the difference.
This type of error was made, and this could reduce the chance of it happening again.
What is a Type II Error, and you could reduce it by increasing the sample size or increasing the power of the test.
Even if the population of cat treat preferences is not normally distributed, this theorem tells us that the sampling distribution of the sample mean will be approximately normal if the sample size is large enough.
What is the Central Limit Theorem?
A coffee shop claims that its new training program improves employee sales. Historically, employees sold an average of $200/day. After training, a random sample of 25 employees averaged $210/day with a standard deviation of $20. You want to test, at the 5% significance level, whether the training increased sales.
This value is the test statistic.
What is 2.5?
Your company increases ad spending month over month to boost online sales. At first, sales go up, but after a certain point, additional spending seems to result in smaller gains, and eventually no gains at all. A residual plot of your linear model shows a clear pattern. This model best captures your data.
What is a curvilinear (nonlinear) regression model?
T/F: A p-value of 0.04 means there is a 96% chance the alternative hypothesis is true.
What is:
False.
A p-value tells you the probability of getting your results (or more extreme) assuming the null hypothesis is true — not the probability that the alternative is true.
This value is the average of the squared residuals in a regression model and is used to measure how well the model fits the data.
What is MSE?
True or false:
Histograms should be bounded and have no empty bins.
What is
True?
This term refers to the number of values in a calculation that are free to vary, and it affects the shape of the t-distribution used in hypothesis testing.
What is degrees of freedom?
Your model has a very high R² value, but when you use it to predict new data, the results are wildly inaccurate. This is the most likely explanation.
What is overfitting?
If the relationship between population and revenue is modeled by the simple single factor regression model Y=0.48x+75.2, the correlation coefficient is likely between this.
What is:
Between zero and positive one.
True or False:
In a well-fitting regression model, residuals should show a clear upward or downward pattern.
what is:
False.
Residuals should appear randomly scattered — patterns indicate a model misfit or missing variable.
These are fields of data analytics.
What are:
Descriptive
Inferential
Predictive?
A marketing analyst performs a two-tailed t-test on whether a new ad campaign changes customer conversion rates. They report a statistically significant result (p = 0.03) but then say, “This proves the campaign caused the increase.” Identify two major flaws in this interpretation.
What are
Statistical significance does not prove causation — there may be confounding variables or bias.
A p-value of 0.03 means the result is unlikely under the null, not that the alternative is definitely true or that the effect is practically important.
If a residual plot fans outward as the fitted values increase, this assumption is violated and this problem is caused.
What is heteroscedasticity, which leads to biased standard errors and unreliable p-values.
The F-test in regression is used to do this.
What is test whether the overall regression model is statistically significant — i.e., whether at least one predictor variable explains variation in the response variable?
You build a multiple regression model with three predictors. The overall R² is high, but two predictors have very high p-values and change signs when you add or remove variables.
This issue might be present, and this is why is it a problem.
What is multicollinearity — it makes coefficient estimates unstable and unreliable, even when the overall model looks strong?
You're building a regression model and want to include the categorical variable "Region," which has three values: East, West, and South.
This is how should you represent this variable in your model.
What is create two dummy variables (e.g., West and South), treating one category (East) as the reference group, to avoid the dummy variable trap?