Exploring Data
Sampling & Experimentation
Probability & Random Variables
Inference
Bivariate Data
100

This acronym stands for Shape, Outliers, Center, and Spread—the four essentials for describing a distribution.

What is SOCS?

100

This type of study observes individuals and measures variables of interest but does not attempt to influence the responses.

What is an Observational Study?

100

If two events have no outcomes in common and cannot occur simultaneously, they are said to be this.

What is Mutually Exclusive (or Disjoint)?

100

This value, denoted by α, is the threshold for rejecting the null hypothesis.

What is the Significance Level?

100

This value, r, measures the strength and direction of a linear relationship between two variables.

What is the Correlation Coefficient?

200

In a distribution that is heavily skewed to the right, this measure of center will typically be larger than the median.

What is the Mean?

200

This occurs when some groups in the population are left out of the process of choosing the sample.

What is Undercoverage?

200

This is the probability that an event occurs given that another event has already occurred.

What is Conditional Probability?

200

This type of error occurs when we fail to reject the null hypothesis, but the null hypothesis is actually false.

What is a Type II Error?

200

This is the vertical distance between an observed value of y and the value predicted by the regression line.

What is a Residual?
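
A quick worked sketch of this one: the residual is the observed y minus the predicted y. The regression line below (y-hat = 2 + 3x) and the data point are made up purely for illustration.

```python
def residual(observed_y, predicted_y):
    """Residual = observed y minus predicted y (y - y-hat)."""
    return observed_y - predicted_y


# Hypothetical regression line y-hat = 2 + 3x, used only for illustration.
def predict(x):
    return 2 + 3 * x


# Observed point (x = 2, y = 10): the line predicts 8, so the residual is 2.
print(residual(10, predict(2)))
```

A positive residual means the point sits above the line; a negative one means it sits below.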

300

This rule states that in a Normal distribution, approximately 68%, 95%, and 99.7% of data fall within 1, 2, and 3 standard deviations of the mean.

What is the Empirical Rule (or the 68-95-99.7 Rule)?
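
The three percentages can be checked against the standard Normal curve. This is a minimal sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8+); the helper name `prob_within` is just an illustrative choice.

```python
from statistics import NormalDist


def prob_within(k):
    """Probability that a Normal value falls within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


# Prints the probability for 1, 2, and 3 standard deviations, rounded to 3 places.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 3))
```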

300

In a study, this is a variable other than the explanatory variable that is associated with it and also influences the response, so that the two effects cannot be separated.

What is a Confounding Variable?

300

For a Binomial model to apply, each trial must have only two outcomes, the trials must be independent, the number of trials must be fixed, and the probability of "success" must be this.

What is Constant (or the same for each trial)?

300

This is the probability, computed assuming H₀ is true, that the statistic would take a value as extreme as or more extreme than the one observed.

What is a P-value?

300

This coefficient, r², tells us the fraction of the variation in y explained by the least-squares regression on x.

What is the Coefficient of Determination?

400

This value is calculated by subtracting the first quartile (Q1) from the third quartile (Q3).

What is the IQR (Interquartile Range)?

400

This is the practice of using "fake" treatments to separate the effects of the actual treatment from the subjects' expectations.

What is a Placebo?

400

This theorem states that as the sample size n increases, the sampling distribution of the sample mean becomes approximately Normal, regardless of the shape of the population distribution.

What is the Central Limit Theorem?

400

When the population standard deviation (σ) is unknown and is estimated by the sample standard deviation, we use this distribution instead of the z-distribution for inference about a mean.

What is the t-distribution?

400

If a scatterplot shows a curved pattern, you might perform this mathematical operation on the data to "straighten" it.

What is a Logarithm (or Power Transformation)?

500

A data point is considered a high outlier if it is more than 1.5 times the IQR above this specific percentile.

What is the Third Quartile (or Q3 / 75th percentile)?
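
The 1.5 × IQR rule can be sketched as a pair of "fences." The function below takes Q1 and Q3 as inputs rather than computing them, since textbooks and software use slightly different quartile conventions; the quartile values in the example are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) under the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True if x lies below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return x < lower or x > upper


# Hypothetical quartiles Q1 = 3 and Q3 = 7, so IQR = 4 and the fences are -3 and 13.
print(outlier_fences(3, 7))
print(is_outlier(30, 3, 7))  # 30 is above the upper fence of 13
```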

500

To reduce the variation within treatment groups, researchers use this technique to group subjects with similar characteristics before randomization.

What is Blocking?

500

To calculate the standard deviation of the difference between two independent random variables, you must add these values before taking the square root.

What are Variances?
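
As a short sketch of the arithmetic: for independent X and Y, the variances add even when the variables are subtracted, and the standard deviation is the square root of that sum. The standard deviations 3 and 4 below are made-up values.

```python
import math


def sd_of_difference(sd_x, sd_y):
    """Standard deviation of X - Y, assuming X and Y are independent."""
    variance = sd_x ** 2 + sd_y ** 2  # variances add, even for a difference
    return math.sqrt(variance)


# Hypothetical standard deviations 3 and 4: sqrt(9 + 16) = 5, not 3 - 4 or 3 + 4.
print(sd_of_difference(3, 4))
```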

500

The "Power" of a test is defined as 1 minus the probability of this specific type of error.

What is a Type II Error?

500

A point is considered to have high "this" if its x-value is far from the mean of the other x-values.

What is Leverage?
