Spot the Overclaim
What Can We Conclude?
Fix the Claim (Rewrite It)
Why This Happens
Challenge Round
100

A study finds that a training program improves performance in Group A (p < .05), but not in Group B (p > .05).
The authors conclude: “The training is only effective for Group A.”

This is an overclaim. The authors are comparing significance labels rather than testing whether Group A and Group B actually differ. According to Gelman & Stern, a significant result in one group and not in another does not imply a significant difference between groups.
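A minimal sketch of the direct comparison, assuming the two group estimates are independent, have known standard errors, and are approximately normal. The numbers (25 ± 10 and 10 ± 10) are illustrative, and the helper names `two_sided_p` and `diff_z_test` are hypothetical.

```python
import math
from statistics import NormalDist

def two_sided_p(estimate, se):
    """Two-sided p-value for H0: true value = 0 (normal approximation)."""
    z = estimate / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def diff_z_test(est_a, se_a, est_b, se_b):
    """Difference between two independent estimates, its SE, and its p-value."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # SEs add in quadrature
    return diff, se_diff, two_sided_p(diff, se_diff)

# Illustrative estimates: Group A = 25 (SE 10), Group B = 10 (SE 10)
p_a = two_sided_p(25, 10)   # Group A alone: below .05
p_b = two_sided_p(10, 10)   # Group B alone: above .05
diff, se_diff, p_diff = diff_z_test(25, 10, 10, 10)

print(f"Group A p = {p_a:.3f}, Group B p = {p_b:.3f}")
print(f"Difference = {diff} (SE {se_diff:.1f}), p = {p_diff:.3f}")
```

Under these assumptions Group A is significant and Group B is not, yet the difference between them is well short of significance, so "only effective for Group A" is not supported.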

100

A treatment shows a statistically significant effect in one sample.

We can conclude there is evidence for an effect in that sample, but not that the effect is large, important, or generalizable. Statistical significance does not imply practical importance.

100

The program works for adults but not for children.

The program showed evidence of an effect in adults but not in children; however, we cannot conclude the effects differ without a direct comparison.

100

Why do people interpret “significant vs not significant” as meaning “different”?

Because significance is treated as a binary decision, which simplifies interpretation. The paper argues this leads to incorrect comparisons.

100

Original study: significant effect
Replication study: non-significant effect

Conclusion:
 “The replication failed.”

This is misleading. A non-significant result in the replication does not necessarily contradict the original finding. The two studies could have similar effect estimates. According to the paper, replication should be evaluated by comparing effects, not significance labels.

200

Two predictors are tested in a regression. Predictor X is significant, Predictor Y is not.

Conclusion: “Predictor X is more important than Predictor Y.”

This is an overclaim because significance does not measure importance. The correct comparison is whether X and Y differ in effect, not whether one crosses a significance threshold and the other does not. The paper warns against comparing significance instead of estimates.

200

One condition shows a clear effect; another shows a weaker, non-significant effect.

We can conclude that the strength of evidence differs across conditions, but we cannot conclude that the effects themselves differ without directly comparing them.

200

This factor is the only one that matters.

This factor showed stronger evidence of an effect than others, but we cannot conclude it is uniquely important without comparing effects directly.

200

Why is statistical significance misleading as a comparison tool?

Because it depends on uncertainty and thresholds, not just effect size. Two similar effects can have different significance levels.

200

A researcher reports:

  • Group A shows a strong, statistically significant effect
  • Group B shows a moderate but non-significant effect

Conclusion:
 “The effect is stronger in Group A than Group B.”


This is an overclaim. The researcher is comparing significance levels instead of directly comparing the effects. The difference between “significant” and “not significant” does not imply that the effects differ. The correct approach would be to test whether the effect in Group A is significantly different from the effect in Group B.

300


Study 1 finds a significant positive effect. Study 2 finds a similar-sized but non-significant effect.

Conclusion: “The findings are inconsistent.”


This is misleading. The studies may actually be consistent if their estimates are similar. The difference between significant and non-significant results is not itself significant. The apparent inconsistency may just reflect uncertainty, not a true difference.
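To see how often uncertainty alone produces this pattern, here is a rough sketch. It assumes two studies that estimate the same true effect with the same standard error, a normal approximation, and a two-sided .05 threshold. The function names and the example values (true effect 2, SE 1) are hypothetical.

```python
from statistics import NormalDist

def power(true_effect, se, alpha=0.05):
    """Probability that a single study is significant at the two-sided alpha level."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = true_effect / se
    # Significant if the z statistic lands in either rejection tail
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

def discordant_prob(true_effect, se, alpha=0.05):
    """Probability that exactly one of two identical, independent studies is significant."""
    p = power(true_effect, se, alpha)
    return 2 * p * (1 - p)

# Same true effect, same precision, in both studies
prob = discordant_prob(true_effect=2.0, se=1.0)
print(f"P(one significant, one not) = {prob:.2f}")
```

With these assumed values, about half of all study pairs would show one significant and one non-significant result even though nothing differs between the studies.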

300

Two studies produce different p-values for similar effects.

Different p-values do not necessarily indicate different effects. The results may still be consistent. The paper emphasizes that comparing significance levels is not a valid way to compare effects.

300

The effect disappears under Condition B.

The effect was not statistically significant under Condition B, but this does not necessarily mean the effect differs from Condition A.

300

Why does sample size matter for significance?

Larger samples reduce uncertainty, making effects more likely to be significant. This means significance differences can reflect sample size rather than real differences.
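A small sketch of this point, assuming a two-group comparison of means with a known common standard deviation and a normal approximation. The observed difference (0.3), the standard deviation (1.0), and the function name `p_for_mean_diff` are illustrative assumptions.

```python
import math
from statistics import NormalDist

def p_for_mean_diff(effect, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized groups."""
    se = sd * math.sqrt(2 / n_per_group)  # SE of the difference shrinks as n grows
    z = effect / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same observed effect each time; only the sample size changes
for n in (20, 50, 200):
    p = p_for_mean_diff(effect=0.3, sd=1.0, n_per_group=n)
    print(f"n per group = {n:3d}  p = {p:.3f}")
```

The observed effect is identical in every row, so any change in the significance label comes from sample size alone.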

300

A study finds:

  • Social media use predicts anxiety in teens (significant)
  • No significant relationship in adults

Conclusion:
 “Social media harms teens but not adults.”

This is an overclaim because it assumes a difference between teens and adults without testing it. The difference in significance does not imply a difference in effect. A proper conclusion would require directly comparing the effect sizes across groups.

400

A researcher reports that a treatment had a significant effect at Time 1 but not at Time 2, concluding: “The effect disappears over time.”

This is an overinterpretation. The researcher has not tested whether the effect at Time 1 is significantly different from the effect at Time 2. The change in significance does not imply a significant change in the effect.

400

A variable is significant in a large sample but not in a small sample.

This likely reflects differences in sample size and uncertainty, not necessarily differences in the effect itself. Statistical significance depends on sample size, so we should not overinterpret this difference.

400

The treatment is ineffective in this population.

The study did not find statistically significant evidence of a treatment effect in this population, but this does not imply the treatment is ineffective.

400

Why is it dangerous to summarize results as “significant” vs “not significant”?

Because it hides the actual magnitude and uncertainty of effects and encourages overconfident conclusions.

400

A study finds:

  • Significant effect before intervention
  • Non-significant effect after intervention

Conclusion:
 “The effect disappeared after the intervention.”

This is an overinterpretation. The change in significance does not imply a real change in the effect. The study has not shown that the before and after effects are significantly different. The apparent “disappearance” could be due to variability or uncertainty.

500

A paper reports that an intervention worked in one country but not another, concluding that cultural differences explain the effect.

This is a strong overclaim. The conclusion assumes a difference across countries without directly testing it, and then attributes that untested difference to culture. According to Gelman & Stern, differences in statistical significance across groups do not imply meaningful differences between them.

500

A result is not statistically significant but shows a relatively large effect estimate.

We cannot conclude there is no effect. The result may be imprecise. As the paper notes, lack of significance does not imply lack of importance or absence of an effect.
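One way to make the imprecision visible is to report an interval rather than a label. The sketch below assumes a normal approximation. The estimate (8.0), the standard error (5.0), and the function name `conf_interval` are hypothetical.

```python
from statistics import NormalDist

def conf_interval(estimate, se, level=0.95):
    """Normal-approximation confidence interval for an estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se

# A fairly large estimate measured imprecisely
low, high = conf_interval(estimate=8.0, se=5.0)
print(f"95% CI: [{low:.1f}, {high:.1f}]")
```

Under these assumptions the interval includes zero, which is why the result is not significant, but it also includes effects about twice the size of the estimate, so "no effect" is only one of many values the data allow.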

500

This study failed to replicate previous findings.

The study did not find statistically significant evidence of the effect, but replication should be evaluated by comparing effect estimates, not significance alone.

500

What is the core statistical mistake highlighted by the paper?

Treating the difference between significant and non-significant results as meaningful, instead of directly comparing the underlying effects.

500

A regression includes 4 predictors:

  • Predictor A → significant
  • Predictors B, C, D → not significant

Conclusion:
 “Only Predictor A matters.”

This is a classic overclaim. The researcher is concluding importance based on significance alone. The correct question is whether Predictor A’s effect is significantly different from the effects of B, C, and D. The paper warns that comparing significant vs. non-significant predictors is not valid evidence of differences.
