Describing the Data
Probability Playground
Confidence Intervals & Sample Sizes
Hypothesis Testing Showdown
Models & Experiments
100

A variable that places individuals into categories instead of numerical values.

What is a categorical variable?

100

For any probability distribution, this value must always equal 1.

What is the sum of all probabilities?

100

When sigma is unknown and n is small, this distribution must be used.

What is the t-distribution?

100

Rejecting a true null hypothesis is known as this type of error.

What is a Type I error?

100

This value measures how much variation in y is explained by x in regression.

What is R-squared?

200

This statistic describes how spread out data are and is the square root of variance.

What is the standard deviation?

200

This theorem states that the sampling distribution of the sample mean becomes approximately normal when n is large.

What is the Central Limit Theorem?

200

Increasing this will make a confidence interval narrower.

What is the sample size?

200

This value tells us the probability of getting our results or more extreme, assuming H₀ is true.

What is the p-value?

200

The slope in simple linear regression describes the change in y for each one-unit increase in this.

What is x (the independent variable)?

300

This graphical tool displays the five-number summary visually.

What is a boxplot?

300

This term refers to the standard deviation of a sampling distribution.

What is the standard error?

300

When estimating a population mean with known sigma, this distribution is used.

What is the standard normal (z) distribution?

300

A two-sample z-test for a difference in means requires this assumption about the population standard deviations.

What is: both population standard deviations are known?

300

In ANOVA for a factorial design, this term measures whether the effect of one factor depends on the level of another factor.

What is interaction?

400

A professor records the number of hours 8 students spent preparing for a statistics exam:
4, 5, 5, 6, 7, 9, 10, 10.
She claims the distribution is right-skewed.
Based on the data, is her claim reasonable?

The professor’s claim is reasonable, but the skew is mild.
Evidence: mean = 7.0, median = 6.5 → mean > median (suggests right skew). Also, the upper values extend farther from the median (up to 3.5 hours above it) than the lower values do (at most 2.5 hours below it), which supports a mild right skew. The skew is not dramatic, but the direction is correct.
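The mean-versus-median comparison can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative, and comparing mean to median is a rule of thumb, not a formal test of skewness.

```python
from statistics import mean, median

# Study hours for the 8 students, taken from the question.
hours = [4, 5, 5, 6, 7, 9, 10, 10]

sample_mean = mean(hours)      # 56 / 8
sample_median = median(hours)  # average of the two middle values, 6 and 7

# A mean above the median is a common (not conclusive) sign of right skew.
suggests_right_skew = sample_mean > sample_median
print(sample_mean, sample_median, suggests_right_skew)
```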

400

The average time students spend on campus per day is μ = 6.2 hours with σ = 1.8 hours.
A sample of n = 50 students is selected.
What is the probability that the sample mean exceeds 6.5 hours?

SE = 1.8 / √50 = 1.8 / 7.071 = 0.2546.
z = (6.5 − 6.2) / 0.2546 ≈ 1.178.
P(mean > 6.5) = 0.1194 (~12%)
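The same calculation can be sketched in Python with the standard library's `NormalDist`. The variable names are illustrative; the numbers come from the question.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 6.2, 1.8, 50   # population mean, population SD, sample size
threshold = 6.5               # sample-mean cutoff from the question

se = sigma / sqrt(n)          # standard error of the sample mean
z = (threshold - mu) / se     # z-score of the cutoff

# Upper-tail probability under the standard normal curve.
p_upper = 1 - NormalDist().cdf(z)
print(round(se, 4), round(z, 3), round(p_upper, 4))
```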

400

A 95% confidence interval for the average time students spend waiting in the campus dining hall line is (6.8, 9.4) minutes.
A student claims, “This means that 95% of students wait between 6.8 and 9.4 minutes.”
Explain why this interpretation is incorrect and give the correct interpretation.

Why it is incorrect: a confidence interval estimates the population mean; it does not describe the range of individual students’ waiting times.
Correct interpretation: We are 95% confident that the population mean waiting time lies between 6.8 and 9.4 minutes.

400

A campus gym claims that the average treadmill running time for students is 28 minutes.
A sample of n = 36 students produces a mean of 26.4 minutes with σ = 6 minutes.
Using a 5% significance level, perform the hypothesis test and state whether the gym’s claim should be rejected.

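Work:

H₀: μ = 28 vs. H₁: μ ≠ 28 (two-tailed test).

SE = σ/√n = 6/√36 = 6/6 = 1.

z = (26.4 − 28) / 1 = −1.6.

Critical values: z = ±1.96.

|−1.6| = 1.6 < 1.96 → z is not in the rejection region → Do not reject H₀.

p-value = 2 · P(Z < −1.6) ≈ 0.110 > 0.05 → Do not reject H₀.

Conclusion: there is not enough evidence at the 5% level to reject the gym’s claim.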
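As a cross-check, a two-tailed one-sample z-test can be sketched in Python. This assumes the alternative hypothesis is two-sided (μ ≠ 28), which the question implies but does not state outright; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, xbar, sigma, n = 28, 26.4, 6, 36  # claimed mean, sample mean, known SD, sample size
alpha = 0.05

se = sigma / sqrt(n)
z = (xbar - mu0) / se

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)   # upper critical value for a two-tailed test
p_value = 2 * std_normal.cdf(-abs(z))        # area in both tails beyond |z|

reject_h0 = abs(z) > z_crit
print(round(z, 2), round(z_crit, 2), round(p_value, 4), reject_h0)
```

Both decision rules (critical value and p-value) should agree, since they are two views of the same test.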

400

A regression model was used to predict exam score from hours studied, producing the equation:
ŷ = 62 + 4.5x

A student who studied 6 hours scored 88.
What is the residual, and what does it tell you about the model’s prediction for this student?

Work: Predicted ŷ = 62 + 4.5(6) = 62 + 27 = 89. Residual = observed − predicted = 88 − 89 = −1.
Interpretation: The student scored 1 point below what the model predicted.
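A minimal Python sketch of the residual calculation follows. The function name `predict` is made up for illustration; the coefficients are the ones in the question.

```python
def predict(hours_studied):
    """Predicted exam score from the fitted line y-hat = 62 + 4.5x."""
    return 62 + 4.5 * hours_studied

observed = 88
predicted = predict(6)

# Residual = observed - predicted; a negative value means the model over-predicted.
residual = observed - predicted
print(predicted, residual)
```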

500

Two dorms collected data on shower water usage (in gallons) for a random sample of residents:

  • Dorm A: 18, 19, 20, 22, 25, 27

  • Dorm B: 11, 12, 14, 19, 20, 40

Describe how the two dorms differ in center, spread, and shape, and identify an outlier if one exists.

  • Center: Dorm A mean = 131/6 ≈ 21.83; Dorm B mean = 116/6 ≈ 19.33. Dorm A has a slightly higher center.

  • Spread / Shape: Dorm A range = 9 (18–27), fairly symmetric; Dorm B range = 29 (11–40) and is heavily right-skewed due to the 40.

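  • Outlier: 40 in Dorm B is an outlier (much larger than the other values). Taking Q1 = 12 and Q3 = 20 as the medians of the lower and upper halves, the upper fence is 20 + 1.5(8) = 32, and 40 exceeds it.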
    Interpretation: Dorm B has similar center but much larger variability and an outlier; Dorm A is more tightly clustered.
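The comparison can be sketched in Python as below. The helper `summarize` is hypothetical, and it assumes quartiles are computed as the medians of the lower and upper halves of the sorted data; other quartile conventions can give different fences for a sample this small.

```python
from statistics import mean, median

def summarize(values):
    """Center, spread, and 1.5*IQR outliers, using median-of-halves quartiles."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])    # median of the lower half
    q3 = median(data[-half:])   # median of the upper half
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean(data),
        "median": median(data),
        "range": data[-1] - data[0],
        "iqr": iqr,
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

dorm_a = summarize([18, 19, 20, 22, 25, 27])
dorm_b = summarize([11, 12, 14, 19, 20, 40])
print(dorm_a)
print(dorm_b)
```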

500

Daily steps among students follow a highly skewed distribution with μ = 5,500 and σ = 1,900.
You take a sample of n = 42 students.
Is the sampling distribution of the sample mean approximately normal? Explain clearly why or why not, using CLT conditions.

Yes — the sampling distribution of the sample mean is approximately normal. 

Reason: n = 42 is reasonably large (the common guideline is n ≥ 30), so by the Central Limit Theorem the distribution of the sample mean will be close to normal even though the original daily-steps distribution is skewed. (With very extreme skew or heavy outliers, a larger n would be safer, but n = 42 is generally considered adequate.)
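A small simulation can illustrate the idea. The question does not give the shape of the population, so this sketch assumes a shifted exponential (3,600 plus an exponential with mean 1,900), chosen only because it is strongly right-skewed and matches μ = 5,500 and σ = 1,900. The seed, repetition counts, and helper names are arbitrary.

```python
import random
from math import sqrt

def skewness(xs):
    """Population-style skewness: average cubed z-score."""
    n = len(xs)
    m = sum(xs) / n
    s = sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sum(((x - m) / s) ** 3 for x in xs) / n

rng = random.Random(0)  # fixed seed so the run is repeatable

def draw_steps():
    # ASSUMED population shape: shifted exponential with mean 5500 and SD 1900.
    return 3600 + rng.expovariate(1 / 1900)

individuals = [draw_steps() for _ in range(20000)]
sample_means = [sum(draw_steps() for _ in range(42)) / 42 for _ in range(5000)]

skew_individuals = skewness(individuals)
skew_means = skewness(sample_means)
print(round(skew_individuals, 2), round(skew_means, 2))
```

Under this assumed population, the skewness of the sample means should come out much smaller than the skewness of individual values, which is the CLT effect the answer describes.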

500

A city planner wants to estimate the average commuting time (minutes) for Cityville residents to within ±2 minutes. A pilot study suggests sigma = 12 minutes. Use a 95% confidence level.

What sample size (n) is required?

Work: 

z for 95% = 1.96 

E = 2 

sigma = 12
n = (z² · sigma²) / E² = (1.96² · 12²) / 2² = 138.2976 → 139 (round up).
Answer: 139 respondents.
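The sample-size formula can be sketched in Python as follows. The variable names are illustrative, and z = 1.96 is the usual rounded critical value for 95% confidence.

```python
from math import ceil

z = 1.96       # critical value for 95% confidence (rounded)
sigma = 12     # pilot-study estimate of the population SD, in minutes
margin = 2     # desired margin of error E, in minutes

n_raw = (z * sigma / margin) ** 2   # same as z^2 * sigma^2 / E^2

# Always round up so the margin of error is no larger than requested.
n_required = ceil(n_raw)
print(round(n_raw, 4), n_required)
```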

500

A coach compares free-throw shooting between two players over the season.  Player 1 (sample of 50 attempts) has an average success rate of 0.780. 

Player 2 (sample of 55 attempts) has an average of 0.740.  Historical population standard deviations are 0.060 for Player 1 and 0.050 for Player 2. 

At the 5% significance level, is there evidence that Player 1’s true free-throw percentage is greater than Player 2’s?

Work: 

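H₀: μ₁ ≤ μ₂ vs. H₁: μ₁ > μ₂ (right-tailed test)

SE = √(0.060²/50 + 0.050²/55) ≈ 0.01084

Zobs = (0.780 − 0.740) / 0.01084 ≈ 3.69

CV: -norm.s.inv(0.05) -> 1.645

Z > CV -> Reject H₀

PV: 1-norm.s.dist(3.69,true) -> 0.0001

PV < 0.05 -> Reject H₀

Conclusion: there is evidence at the 5% level that Player 1’s true free-throw percentage is greater than Player 2’s.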
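For readers working outside a spreadsheet, the same right-tailed two-sample z-test can be sketched in Python with the standard library. The variable names are illustrative; the inputs are the values given in the question.

```python
from math import sqrt
from statistics import NormalDist

mean1, sd1, n1 = 0.780, 0.060, 50   # Player 1
mean2, sd2, n2 = 0.740, 0.050, 55   # Player 2
alpha = 0.05

# Standard error of the difference in means with known population SDs.
se_diff = sqrt(sd1**2 / n1 + sd2**2 / n2)
z_obs = (mean1 - mean2) / se_diff

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - std_normal.cdf(z_obs)      # upper-tail area beyond z_obs

reject_h0 = z_obs > z_crit
print(round(z_obs, 2), round(z_crit, 3), round(p_value, 4), reject_h0)
```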

500

Campus Dining tests whether music type (pop, classical, no music) and meal price (low, medium) affect the number of lunch orders.
They run a factorial experiment and find a significant interaction between music type and price.

Explain what a significant interaction means in this context, and give one example of what the interaction might look like.

Answer (short, plain language): A significant interaction means the effect of music type depends on the price level; that is, the difference in orders between music types is not constant across prices.
Example: At the low price, pop music might increase lunch orders by 30 compared to no music, but at the medium price pop might increase orders by only 5 (or even decrease them). The benefit (or harm) of a music type therefore changes with price, so the main effects cannot be interpreted on their own without considering the interaction.
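The "difference of differences" idea can be shown with a small Python sketch. The cell means below are made-up numbers for illustration only, not results from the experiment, and `pop_effect` is a hypothetical helper.

```python
# HYPOTHETICAL mean lunch orders per (music, price) cell; illustration only.
cell_means = {
    ("no music", "low"): 100,
    ("pop", "low"): 130,
    ("no music", "medium"): 80,
    ("pop", "medium"): 85,
}

def pop_effect(price):
    """Effect of pop music versus no music at a given price level."""
    return cell_means[("pop", price)] - cell_means[("no music", price)]

effect_low = pop_effect("low")
effect_medium = pop_effect("medium")

# With no interaction the two effects would be equal and this contrast would be 0.
interaction_contrast = effect_low - effect_medium
print(effect_low, effect_medium, interaction_contrast)
```

A nonzero contrast in the cell means is only a description; whether it is statistically significant is what the ANOVA interaction test decides.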