This type of variable places individuals into categories rather than assigning them numerical values.
What is a categorical variable?
For any probability distribution, this value must always equal 1.
What is the sum of all probabilities?
When sigma is unknown and n is small, this distribution must be used.
What is the t-distribution?
Rejecting a true null hypothesis is known as this type of error.
What is a Type I error?
This value measures the proportion of the variation in y that is explained by x in regression.
What is R-squared?
This statistic describes how spread out data are and is the square root of variance.
What is the standard deviation?
This theorem guarantees that sample means become approximately normal when n is large.
What is the Central Limit Theorem?
Increasing this will make a confidence interval narrower.
What is the sample size?
This value tells us the probability of getting our results or more extreme, assuming H₀ is true.
What is the p-value?
The slope in simple linear regression describes the change in y for each one-unit increase in this.
What is x (the independent variable)?
This graphical tool displays the five-number summary visually.
What is a boxplot?
This term refers to the standard deviation of a sampling distribution.
What is the standard error?
When estimating a population mean with known sigma, this distribution is used.
What is the standard normal (z) distribution?
The formula for a two-sample z test for difference in means requires this assumption about sigma.
What is the assumption that both population standard deviations are known?
In ANOVA for a factorial design, this term measures whether the effect of one factor depends on the level of another factor.
What is interaction?
A professor records the number of hours 8 students spent preparing for a statistics exam:
4, 5, 5, 6, 7, 9, 10, 10.
She claims the distribution is right-skewed.
Based on the data, is her claim reasonable?
The professor’s claim is reasonable but mild.
Evidence: mean = 56/8 = 7.00 and median = (6 + 7)/2 = 6.5, so mean > median (suggests right skew). Also, the values above the median (7, 9, 10, 10) reach farther from it than the values below it (4, 5, 5, 6), which supports a mild right skew. The skew is not dramatic, but the direction is correct.
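The mean-versus-median comparison above can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative.

```python
from statistics import mean, median

# Hours of exam preparation for the 8 students in the problem
hours = [4, 5, 5, 6, 7, 9, 10, 10]

avg = mean(hours)      # 56 / 8 = 7
mid = median(hours)    # average of the two middle values, 6 and 7 -> 6.5

# A mean above the median is one informal sign of right skew
print(f"mean = {avg}, median = {mid}, mean > median: {avg > mid}")
```

With only 8 values this is a rough check, not a formal test of skewness.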
The average time students spend on campus per day is μ = 6.2 hours with σ = 1.8 hours.
A sample of n = 50 students is selected.
What is the probability that the sample mean exceeds 6.5 hours?
SE = 1.8 / √50 = 1.8 / 7.071 = 0.2546.
z = (6.5 − 6.2) / 0.2546 ≈ 1.178.
P(mean > 6.5) = 0.1194 (~12%)
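As a check on the arithmetic, the same calculation can be sketched in Python with the standard library's NormalDist. The variable names are illustrative, and the result should land close to the hand-computed probability of about 0.12.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 6.2, 1.8, 50   # population mean, population sd, sample size
threshold = 6.5               # sample-mean cutoff from the question

se = sigma / sqrt(n)                      # standard error of the sample mean
z = (threshold - mu) / se                 # z-score of the cutoff
p_exceeds = 1 - NormalDist().cdf(z)       # upper-tail probability

print(f"SE = {se:.4f}, z = {z:.3f}, P(mean > {threshold}) = {p_exceeds:.4f}")
```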
A 95% confidence interval for the average time students spend waiting in the campus dining hall line is (6.8, 9.4) minutes.
A student claims, “This means that 95% of students wait between 6.8 and 9.4 minutes.”
Explain why this interpretation is incorrect and give the correct interpretation.
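Why it is incorrect: A confidence interval estimates the population mean. It does not describe the spread of individual waiting times, so it says nothing about what share of students wait between 6.8 and 9.4 minutes.
Correct interpretation: We are 95% confident that the population mean waiting time lies between 6.8 and 9.4 minutes. (The interval refers to the parameter, not to individual students.)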
A campus gym claims that the average treadmill running time for students is 28 minutes.
A sample of n = 36 students produces a mean of 26.4 minutes with σ = 6 minutes.
Using a 5% significance level, perform the hypothesis test and state whether the gym’s claim should be rejected.
Work:
SE = σ/√n = 6/6 = 1.
z = (26.4 − 28) / 1 = −1.6.
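Critical values (two-sided test, α = 0.05): z = ±1.96.
|z| = 1.6 < 1.96, so z does not fall in the rejection region → Do not reject H₀.
p-value = 2 · P(Z < −1.6) ≈ 2(0.0548) = 0.1096 > 0.05 → Do not reject H₀.
Conclusion: At the 5% significance level there is not enough evidence to reject the gym’s claim that the mean running time is 28 minutes.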
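The test can be reproduced with a minimal Python sketch, assuming a two-sided alternative (H₀: μ = 28 vs. Hₐ: μ ≠ 28). The variable names are illustrative. It should give z = −1.6, a critical value near 1.96, and a two-sided p-value of roughly 0.11.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 28          # claimed mean under H0
xbar = 26.4       # sample mean
sigma = 6         # known population standard deviation
n = 36            # sample size
alpha = 0.05      # significance level

std_normal = NormalDist()

se = sigma / sqrt(n)                          # standard error: 6 / 6 = 1
z = (xbar - mu0) / se                         # test statistic
z_crit = std_normal.inv_cdf(1 - alpha / 2)    # two-sided critical value
p_value = 2 * std_normal.cdf(-abs(z))         # two-sided p-value

reject = abs(z) > z_crit
print(f"z = {z:.2f}, critical = ±{z_crit:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

The critical-value rule and the p-value rule always agree, so either one is enough to reach the decision.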
A regression model was used to predict exam score from hours studied, producing the equation:
ŷ = 62 + 4.5x
A student who studied 6 hours scored 88.
What is the residual, and what does it tell you about the model’s prediction for this student?
Work: Predicted ŷ = 62 + 4.5(6) = 62 + 27 = 89. Residual = observed − predicted = 88 − 89 = −1.
Interpretation: The student scored 1 point below what the model predicted.
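The residual calculation is short enough to sketch directly in Python. The function name predict_score is illustrative and not part of the original problem.

```python
def predict_score(hours_studied):
    """Predicted exam score from the fitted line y-hat = 62 + 4.5x."""
    return 62 + 4.5 * hours_studied

observed = 88
predicted = predict_score(6)        # 62 + 27 = 89.0
residual = observed - predicted     # observed minus predicted = -1.0

# A negative residual means the model over-predicted this student's score
print(f"predicted = {predicted}, residual = {residual}")
```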
Two dorms collected data on shower water usage (in gallons) for a random sample of residents:
Dorm A: 18, 19, 20, 22, 25, 27
Dorm B: 11, 12, 14, 19, 20, 40
Describe how the two dorms differ in center, spread, and shape, and identify an outlier if one exists.
Center: Dorm A mean = 131/6 ≈ 21.83; Dorm B mean = 116/6 ≈ 19.33. Dorm A has a slightly higher center.
Spread / Shape: Dorm A range = 9 (18–27), fairly symmetric; Dorm B range = 29 (11–40) and is heavily right-skewed due to the 40.
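Outlier: 40 in Dorm B is an outlier. Taking the quartiles as the medians of the lower and upper halves gives Q1 = 12, Q3 = 20, and IQR = 8, so the upper fence is 20 + 1.5(8) = 32, and 40 lies above it.
Interpretation: Dorm B has a slightly lower center but much larger variability and an outlier; Dorm A is more tightly clustered.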
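The comparison can be sketched in Python. This version assumes the quartiles are the medians of the lower and upper halves of the sorted data and applies the 1.5 × IQR rule. Other quartile conventions exist and can move the fences, so treat the outlier flag as convention-dependent. The summarize helper is illustrative.

```python
from statistics import mean, median

def summarize(values):
    """Center, spread, and 1.5*IQR outliers, with quartiles as medians of the halves."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])       # median of the lower half
    q3 = median(data[-half:])      # median of the upper half
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean(data),
        "median": median(data),
        "range": data[-1] - data[0],
        "iqr": iqr,
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

dorm_a = [18, 19, 20, 22, 25, 27]
dorm_b = [11, 12, 14, 19, 20, 40]

summary_a = summarize(dorm_a)
summary_b = summarize(dorm_b)
print("Dorm A:", summary_a)
print("Dorm B:", summary_b)
```

Under this convention, Dorm B's upper fence is 32, so 40 is flagged, and Dorm A has no flagged values.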
Daily steps among students follow a highly skewed distribution with μ = 5,500 and σ = 1,900.
You take a sample of n = 42 students.
Is the sampling distribution of the sample mean approximately normal? Explain clearly why or why not, using CLT conditions.
Yes — the sampling distribution of the sample mean is approximately normal.
Reason: n = 42 is reasonably large (commonly n ≥ 30), so by the Central Limit Theorem the distribution of the sample mean will be close to normal even though the original daily-steps distribution is skewed. (If extreme skew or heavy outliers existed, you’d be slightly more cautious, but n = 42 is adequate.)
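A small simulation can illustrate the CLT argument. The problem does not specify the population's shape, so this sketch assumes a hypothetical right-skewed population (a shifted exponential) chosen only because it has mean 5,500 and standard deviation 1,900. The seed and repetition count are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU, SIGMA, N, REPS = 5500, 1900, 42, 5000
rng = random.Random(0)   # fixed seed so the run is repeatable

def draw_steps():
    # Hypothetical skewed population: shifted exponential with mean MU and sd SIGMA
    return (MU - SIGMA) + rng.expovariate(1 / SIGMA)

# Simulate many samples of size N and record each sample mean
sample_means = [sum(draw_steps() for _ in range(N)) / N for _ in range(REPS)]

se = SIGMA / sqrt(N)   # theoretical standard error, about 293
# If the sample means are roughly normal, about 95% fall within 1.96 SE of MU
within = sum(abs(m - MU) <= 1.96 * se for m in sample_means) / REPS

print(f"mean of sample means = {mean(sample_means):.1f}")
print(f"sd of sample means   = {stdev(sample_means):.1f} (theory: {se:.1f})")
print(f"share within 1.96 SE = {within:.3f}")
```

If the sample means are approximately normal, the share within 1.96 standard errors should come out near 0.95, even though the individual values are strongly skewed.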
A city planner wants the average commuting time (minutes) for Cityville residents estimated to within ±2 minutes. A pilot study suggests σ = 12 minutes. Use a 95% confidence level.
What sample size (n) is required?
Work:
z* for 95% = 1.96
E = 2
σ = 12
n = (z*² · σ²) / E² = (1.96² · 12²) / 2² ≈ 138.30 → 139 (round up).
Answer: 139 respondents.
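The sample-size formula can be wrapped in a small Python helper. This is a sketch: required_n is an illustrative name, and it uses the exact normal quantile (about 1.95996) rather than the rounded 1.96, which does not change the answer here.

```python
from math import ceil
from statistics import NormalDist

def required_n(sigma, margin, confidence=0.95):
    """Smallest n so that the margin of error z* * sigma / sqrt(n) is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star * sigma / margin) ** 2)   # always round up

n = required_n(sigma=12, margin=2)
print(n)  # 139
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed the target.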
A coach compares free-throw shooting between two players over the season. Player 1 (sample of 50 attempts) has an average success rate of 0.780.
Player 2 (sample of 55 attempts) has an average of 0.740. Historical population standard deviations are 0.060 for Player 1 and 0.050 for Player 2.
At the 5% significance level, is there evidence that Player 1’s true free-throw percentage is greater than Player 2’s?
Work:
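H₀: μ₁ − μ₂ = 0 vs. Hₐ: μ₁ − μ₂ > 0 (right-tailed)
SE = √(0.060²/50 + 0.050²/55) ≈ 0.01084
Zobs = (0.780 − 0.740) / 0.01084 ≈ 3.69
CV: -norm.s.inv(0.05) -> 1.645
Zobs > CV -> Reject H₀
PV: 1-norm.s.dist(3.69,true) -> 0.0001
PV < 0.05 -> Reject H₀
Conclusion: At the 5% significance level there is evidence that Player 1’s true free-throw percentage is greater than Player 2’s.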
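The spreadsheet steps above have a direct Python equivalent. This sketch assumes a right-tailed two-sample z test with known population standard deviations; the variable names are illustrative. It should give a test statistic near 3.69 and a critical value near 1.645.

```python
from math import sqrt
from statistics import NormalDist

# Sample mean, known population sd, and sample size for each player
x1, sd1, n1 = 0.780, 0.060, 50
x2, sd2, n2 = 0.740, 0.050, 55
alpha = 0.05

std_normal = NormalDist()

se = sqrt(sd1**2 / n1 + sd2**2 / n2)      # standard error of the difference in means
z_obs = (x1 - x2) / se                    # test statistic under H0: mu1 - mu2 = 0
z_crit = std_normal.inv_cdf(1 - alpha)    # right-tailed critical value
p_value = 1 - std_normal.cdf(z_obs)       # right-tail p-value

print(f"SE = {se:.5f}, z = {z_obs:.3f}, critical = {z_crit:.3f}, p = {p_value:.5f}")
print("Reject H0" if z_obs > z_crit else "Do not reject H0")
```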
Campus Dining tests whether music type (pop, classical, no music) and meal price (low, medium) affect the number of lunch orders.
They run a factorial experiment and find a significant interaction between music type and price.
Explain what a significant interaction means in this context, and give one example of what the interaction might look like.
Answer (short, plain language): A significant interaction means the effect of music type depends on the price level — i.e., the difference in orders between music types is not constant across prices.
Example: At low price, pop music might increase lunch orders by 30 compared to no music, but at medium price pop might increase orders by only 5 (or maybe even decrease them). Thus the benefit (or harm) of a music type changes with price; you cannot interpret the main effects alone without considering the interaction.
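One way to picture the interaction is as a difference of differences between cell means. The sketch below uses made-up cell means, which are hypothetical and not data from the experiment. They are chosen only so that the pop-versus-no-music gap is 30 at the low price and 5 at the medium price, the two price levels in the design.

```python
# Hypothetical cell means (average lunch orders); NOT data from the experiment
cell_means = {
    ("none", "low"): 100,
    ("pop", "low"): 130,
    ("none", "medium"): 80,
    ("pop", "medium"): 85,
}

def music_effect(price):
    """Pop-minus-no-music difference in mean orders at one price level."""
    return cell_means[("pop", price)] - cell_means[("none", price)]

effect_low = music_effect("low")          # 130 - 100 = 30
effect_medium = music_effect("medium")    # 85 - 80 = 5

# With no interaction the two effects would be equal and this contrast would be 0
interaction_contrast = effect_low - effect_medium
print(effect_low, effect_medium, interaction_contrast)  # 30 5 25
```

On an interaction plot these cell means would give non-parallel lines, which is the visual signature of an interaction. Whether a nonzero contrast is statistically significant is what the ANOVA interaction test decides.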