This metric tells you the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.
What is a p-value?
This concept is the probability that your test will correctly detect a true effect when there actually is one.
What is statistical power?
If you allocate traffic 50/50 but end up with 45,000 users in Control and 55,000 in Treatment, your test is likely suffering from this data-invalidating phenomenon.
What is Sample Ratio Mismatch?
This statistical test is used to determine if there is a statistically significant difference somewhere among the means of 3 or more groups.
What is ANOVA (Analysis of Variance)?
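As a rough illustration of what ANOVA computes, the sketch below derives the one-way F statistic by hand using only the standard library. The function name `one_way_anova_f` and the three toy groups are made up for this example; turning F into a p-value needs the F distribution, which is left out here.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical metric values for three variants.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```

A large F says the group means differ more than within-group noise would suggest, but not which groups differ; that takes follow-up pairwise comparisons.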
This is the statistical strategy of using historical data to shrink the noise in your current experiment.
What is Controlled-experiment Using Pre-Experiment Data (CUPED)?
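A minimal sketch of the CUPED adjustment, assuming one pre-experiment covariate per user and simulated data. The helper name `cuped_adjust` and the simulated numbers are illustrative only; a real pipeline would typically estimate theta on data pooled across both arms.

```python
import random
from statistics import mean, variance

def cuped_adjust(y, x):
    """Return y adjusted by the pre-experiment covariate x."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)  # regression slope of y on x
    # Subtract the part of y explained by x; the mean of y is unchanged.
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]

rng = random.Random(0)
# Simulated pre-period metric and a correlated in-experiment metric.
pre = [rng.gauss(10, 3) for _ in range(2000)]
post = [0.8 * xi + rng.gauss(0, 1) for xi in pre]

adjusted = cuped_adjust(post, pre)
print(variance(post), variance(adjusted))
```

The adjusted metric keeps the same mean but has much smaller variance, because the pre-period covariate explains most of the spread in the simulated outcome.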
This fundamental experimental process ensures that any confounding variables (like user demographics or time of day) are distributed equally between your control and treatment groups.
What is randomization?
In power analysis, this is the smallest lift or change in a metric that you care about practically and want to ensure your test can detect.
What is Minimum Detectable Effect?
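To show how the MDE drives sample size, here is a sketch of the textbook normal-approximation formula for a two-sided, two-sample test on means: n per group = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / MDE^2. The function name `sample_size_per_group` and the example inputs are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(mde, sigma, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect an absolute lift of `mde`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / mde ** 2)

n_large_effect = sample_size_per_group(mde=0.5, sigma=1.0)
n_small_effect = sample_size_per_group(mde=0.25, sigma=1.0)
print(n_large_effect, n_small_effect)
```

Because the MDE is squared in the denominator, halving it roughly quadruples the required sample size.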
This occurs when you run multiple variants simultaneously (e.g., Change A and Change B) and the effect of Change A depends heavily on whether the user also experienced Change B.
What is an interaction effect?
This is the simplest, most conservative method used to control the Family-Wise Error Rate by dividing your target alpha by the number of comparisons being made (alpha/k).
What is the Bonferroni correction?
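The alpha/k rule takes only a few lines of code. The function name `bonferroni_significant` and the p-values below are invented for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each comparison as significant at the Bonferroni-adjusted threshold."""
    threshold = alpha / len(p_values)  # alpha / k
    return [p <= threshold for p in p_values]

# Four hypothetical comparisons: the per-test threshold is 0.05 / 4 = 0.0125.
flags = bonferroni_significant([0.001, 0.02, 0.04, 0.30])
print(flags)
```

Only the first comparison survives, even though three of the four raw p-values are below 0.05.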
The primary statistical goal of implementing CUPED in an A/B testing framework is to reduce this mathematical property of your metric, allowing you to reach significance faster.
What is variance?
If your A/B test has a confidence interval for the treatment effect that ranges from -2% to +5%, this is the conclusion you must draw about the statistical significance of your result.
What is "not statistically significant"?
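The "interval contains zero" check can be sketched with a Wald confidence interval for the absolute difference in conversion rates. The function name `diff_in_proportions_ci` and the counts are hypothetical, and the interval here is on the absolute difference rather than the relative lift.

```python
from math import sqrt
from statistics import NormalDist

def diff_in_proportions_ci(conv_c, n_c, conv_t, n_t, confidence=0.95):
    """Wald confidence interval for (treatment rate - control rate)."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    se = sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_t - p_c
    return diff - z * se, diff + z * se

# Hypothetical counts: 100/1000 conversions in Control, 115/1000 in Treatment.
low, high = diff_in_proportions_ci(100, 1000, 115, 1000)
significant = not (low <= 0 <= high)
print(low, high, significant)
```

The interval straddles zero, so the data are compatible with no effect and the result is not significant at the matching alpha.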
If your A/B test has a statistical power of 80%, this is the exact mathematical probability (expressed as a percentage) that you will commit a Type II error (false negative) if a true effect exists.
What is 20%?
You launch a feature that only loads for users on fast 5G networks, while the control group includes everyone. By comparing the groups, you mistakenly conclude your feature drastically increased engagement. This is because your experiment suffers from this specific flaw.
What is selection bias?
To detect an SRM, data scientists usually run this specific statistical test on the observed vs. expected sample counts.
What is a Chi-square goodness-of-fit test?
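A sketch of the SRM check, applied to the 45,000 vs. 55,000 split from the earlier clue. With two groups the test has one degree of freedom, so the p-value can be computed with `math.erfc` instead of a statistics library. The function name `srm_p_value` is an assumption for this example.

```python
from math import erfc, sqrt

def srm_p_value(n_control, n_treatment, expected_control_share=0.5):
    """Chi-square goodness-of-fit test (1 degree of freedom) for a two-arm split."""
    total = n_control + n_treatment
    exp_control = total * expected_control_share
    exp_treatment = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = srm_p_value(45_000, 55_000)
print(chi2, p_value)
```

Teams often use a strict threshold such as p < 0.001 for SRM alerts; a split this lopsided falls far below it.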
For CUPED to successfully reduce variance, the pre-experiment metric and the in-experiment metric must have a strong, non-zero amount of this statistical relationship.
What is correlation (or covariance)?
This is the specific statistical test you would use if you want to compare the conversion rates (a categorical, binary metric) between a Control group and a Treatment group.
What is a Chi-square test (of independence), or equivalently a two-proportion z-test?
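A sketch of the 2x2 test on conversion counts, without continuity correction and using only the standard library. The function name `chi_square_2x2` and the counts are made up for illustration.

```python
from math import erfc, sqrt

def chi_square_2x2(conv_c, n_c, conv_t, n_t):
    """Pearson chi-square test of independence on a 2x2 conversion table."""
    observed = [[conv_c, n_c - conv_c], [conv_t, n_t - conv_t]]
    row_totals = [n_c, n_t]
    total = n_c + n_t
    col_totals = [conv_c + conv_t, total - conv_c - conv_t]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # A 2x2 table has 1 degree of freedom, so erfc gives the p-value.
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical counts: 100/1000 conversions in Control, 150/1000 in Treatment.
chi2_conv, p_conv = chi_square_2x2(100, 1000, 150, 1000)
print(chi2_conv, p_conv)
```

For a 2x2 table this statistic equals the square of the pooled two-proportion z statistic, so the two tests give the same two-sided p-value.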
If you decide you want to detect a smaller effect than originally planned while keeping your power and alpha the same, your required sample size must change in this direction.
What is an increase?
If you repeatedly peek at your A/B test data and stop it early the moment it hits p < 0.05, this type of error rate will be much higher than your nominal alpha.
What is the Type I error (false positive) rate?
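The inflation from peeking can be seen in a small A/A simulation, where both arms draw from the same distribution, so every rejection is a false positive. The function name `false_positive_rate`, the number of looks, and the batch size are arbitrary choices for this sketch, and it assumes a known unit variance so a z-test applies.

```python
import random
from math import sqrt
from statistics import NormalDist

def false_positive_rate(n_sims, looks, batch, peek, rng, alpha=0.05):
    """Share of A/A tests that reject the null, with or without early stopping."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _sim in range(n_sims):
        sum_diff, n = 0.0, 0
        for look in range(looks):
            for _user in range(batch):
                # One new user per arm; both arms share the same distribution.
                sum_diff += rng.gauss(0, 1) - rng.gauss(0, 1)
            n += batch
            z = (sum_diff / n) / sqrt(2 / n)
            is_last_look = look == looks - 1
            # Peeking tests at every look; the fixed design tests only at the end.
            if (peek or is_last_look) and abs(z) > z_crit:
                rejections += 1
                break
    return rejections / n_sims

peeking_rate = false_positive_rate(2000, looks=10, batch=20, peek=True, rng=random.Random(1))
fixed_rate = false_positive_rate(2000, looks=10, batch=20, peek=False, rng=random.Random(2))
print(peeking_rate, fixed_rate)
```

The fixed-horizon rate stays near the nominal 5%, while stopping at the first significant look pushes it well above that.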
The Benjamini-Hochberg procedure controls this expected proportion of false positives among all significant results, making it less conservative and more powerful than Bonferroni for large numbers of variants.
What is the False Discovery Rate (FDR)?
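A sketch of the Benjamini-Hochberg step-up procedure: sort the p-values, find the largest rank i with p_(i) <= (i/m) * q, and reject every hypothesis up to that rank. The function name `benjamini_hochberg` and the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its step-up threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

bh_flags = benjamini_hochberg([0.001, 0.012, 0.02, 0.045, 0.30])
print(bh_flags)
```

With these five p-values, Benjamini-Hochberg rejects three hypotheses, whereas a Bonferroni threshold of 0.05 / 5 = 0.01 would reject only the first.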
If an engineer accidentally includes data from after the user was exposed to the treatment variant when calculating the CUPED baseline, it will introduce this issue and can completely wipe out or distort the actual treatment effect.
What is post-treatment bias?