What is the difference between a histogram and a boxplot? When should each be used?
Histograms show distribution shape and good for large data; boxplots show spread and outliers and good for comparison
Define the terms "shape", "center", and "spread" in context of a distribution.
Shape: symmetry/skew, Center: mean/median, Spread: range/IQR/std. dev.
Define voluntary response bias and give an example.
Bias from people choosing to respond (e.g., online polls).
What is the probability of getting heads on a fair coin flip?
0.5
Find the median and IQR: 5, 8, 9, 13, 15, 17, 21
Median = 13; Q1 = 8.5, Q3 = 16 → IQR = 16 - 8.5 = 7.5
What summary stats are resistant to outliers? Why?
Median and IQR are resistant; they ignore extreme values.
What is undercoverage?
When part of the population is left out of the sampling process.
What’s the probability of drawing an ace from a standard deck of 52 cards?
4/52 = 1/13 ≈ 0.077
A dataset has a mean of 10 and standard deviation of 2. If 4 is added to each value, what happens to the mean and standard deviation?
Mean becomes 14, standard deviation stays 2.
A symmetric distribution has a mean of 20 and standard deviation of 3. What is the expected range for most values?
Between 14 and 26 (mean ± 2 std devs).
Describe one method to reduce sampling bias.
Use random sampling techniques like SRS or stratified sampling.
If P(A) = 0.3, P(B) = 0.4, and A and B are independent, find P(A and B).
0.3 × 0.4 = 0.12
Describe how to identify outliers using the 1.5 × IQR rule.
Outliers are below Q1 − 1.5×IQR or above Q3 + 1.5×IQR.
Compare two distributions: one is skewed left, one symmetric. How do center and spread compare?
Skewed left → median > mean; symmetric → mean ≈ median.
Why might convenience sampling lead to unreliable results?
It often overrepresents accessible individuals and underrepresents others.
Two dice are rolled. What’s the probability the sum is 7?
6 outcomes: (1,6), (2,5), ... → 6/36 = 1/6
A stem plot shows a skewed-right distribution. What does this tell you about the relationship between mean and median?
Mean > Median in skewed-right distributions.
A dataset is skewed right with outliers. Should you use mean or median to summarize center? Justify.
Median, because it is more resistant to the right-skewed outliers.
A survey samples every 10th person on a list. What type of sampling is this? Pros and cons?
Systematic sampling. Efficient but may miss patterns in list ordering.
You flip a coin 3 times. What is the probability of getting exactly 2 heads?
C(3,2) × (0.5)^3 = 3 × 0.125 = 0.375