Two samples have the same mean but very different boxplots. What does this show?
Mean alone may be misleading; variability differs
Which is more sensitive to extreme values: variance or IQR?
Variance.
What does a steady upward trend in a time plot suggest?
Improving performance or growth
If r = 0, what does that mean?"
No linear relationship (but nonlinear may exist).
Why are plots essential before statistical modeling?
They reveal patterns and anomalies quickly
A dataset has extreme outliers. Which measure of center is more robust?
Median
Why might range be unreliable?
It only considers two values: min and max.
Why are seasonal cycles important in time data?
They reveal periodic patterns affecting interpretation.
Two variables have r = 0.9 but scatterplot shows a curve. What’s wrong?
Correlation only measures linear association.
Why does variability matter in engineering?
High variability means less predictable, less reliable systems
If two datasets share the same mean but one is skewed, what does this imply?
Distribution shape matters; mean may not represent the data well
Which is more stable across samples: standard deviation or range?
Standard deviation.
A sudden level shift in a time series suggests …
A structural change in the process.
Why is ‘correlation ≠ causation’ important?
High correlation may be spurious; not evidence of cause-effect.
Why compare multiple displays (histogram, boxplot, etc.)?
Each emphasizes different features of the data.
Why might the median be better than the mean for income data?
Income data are often skewed; the median resists outliers
What makes IQR useful in skewed data?
It captures central spread without being affected by extremes
Combining a stem-and-leaf with time sequence gives what display?
A digidot plot.
What does a nearly perfect correlation (r ≈ 1) between color and density in wine data imply?
The two variables are almost redundant.
How can descriptive statistics guide material selection?
They summarize strength, variability, and reliability
What is the danger of summarizing data with only the mean?
It hides spread, skewness, and outliers.
If s² consistently underestimates variability, what adjustment is made?
Divide by n-1 (degrees of freedom).
Why plot data in time order before summary stats?
To detect shifts, cycles, or trends missed by summaries
Correlation of –0.7 suggests what?
Strong negative linear relationship
Why is descriptive analysis always the first step in data science?
It helps understand data structure before advanced modeling.