One-Variable Data
Distributions
Two-Variable Data
Sampling and Experimental Design
Probability
100

This is what you get when you sum all values in a data set and divide by the number of values in that data set.

What is the mean?

100

This type of distribution looks the same on either side of the center, and the mean is approximately equal to the median.

What is symmetric?

100

This type of data visualization creates a point for each (x,y) pair of values.

What is a scatterplot?

100

This is the most basic sampling technique, where all possible individuals have the same chance of being selected.

What is simple random sampling?

100

This is the sum of the probabilities of all possible outcomes in a scenario.

What is 1? (or what is 100%?)

200

This value in a data set has half of the values above it and half of the values below it.

What is the median?

200

This type of distribution has high outliers, leading to a tail on the right side of the distribution curve.

What is skewed right?
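
A quick way to see right skew in numbers: a few high values typically pull the mean above the median. The sketch below uses a small made-up data set (the values are an assumption for illustration only).

```python
from statistics import mean, median

# Hypothetical data set with one high outlier (100).
values = [1, 2, 3, 4, 100]

data_mean = mean(values)      # pulled upward by the outlier
data_median = median(values)  # stays near the bulk of the data

print("mean:", data_mean)
print("median:", data_median)
print("mean > median:", data_mean > data_median)
```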

200

We use this word to describe how two variables are related. It may be strong or weak; it may be negative or positive.

What is correlation?

200

This is a type of study which does not involve controlling any variables.

What is an observational study?

200

Every probability should be between these two numbers.

What are 0 and 1?

300

This diagram uses bar heights along a y-axis to show the counts or relative frequencies of the categorical values in a data set.

What is a bar chart?

300

About 95% of the data in a normal distribution fall within this many standard deviations of the mean.

What is two?
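
The 95% figure can be checked with Python's standard library, assuming the data follow a normal distribution. The helper name `share_within` is made up for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The value for k = 2 comes out slightly above 0.95, which is why the rule is stated as "about 95%".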

300

This is a linear equation computed to represent the best fit between the two variables. It can be used to predict future data points.

What is the least squares regression line?
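
As a minimal sketch, the slope and intercept of a least squares line can be computed directly from the data. The data values and the function name `least_squares_line` below are assumptions for illustration.

```python
# Hypothetical data: hours studied (x) and quiz score (y).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = least_squares_line(xs, ys)
print(f"predicted y = {intercept:.2f} + {slope:.2f} * x")
```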

300

We implement controls in a study to minimize the effect of these.

What are confounding variables?

300

This is when two events have no overlap.

What is mutually exclusive?

400

This is a measure of spread based on the average of the squared distances from the mean. It is a common way to describe the variation in a set of values.

What is the standard deviation?
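
The calculation can be sketched step by step. This version computes the population standard deviation (dividing by n); a sample standard deviation would divide by n - 1 instead. The data values are made up for illustration.

```python
import math

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def population_sd(values):
    """Square root of the average squared distance from the mean."""
    n = len(values)
    mean = sum(values) / n
    squared_distances = [(v - mean) ** 2 for v in values]
    variance = sum(squared_distances) / n
    return math.sqrt(variance)

print(population_sd(data))  # 2.0
```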

400

This descriptor tells you what percentage of the data falls below a specific value.

What is a percentile?
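
One simple way to compute this, using the "percentage of values below" definition from the clue, is sketched here. Other percentile conventions exist. The scores and the function name `percent_below` are assumptions for illustration.

```python
# Hypothetical test scores.
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

def percent_below(values, x):
    """Percentage of values strictly below x."""
    count_below = sum(1 for v in values if v < x)
    return 100 * count_below / len(values)

# A score of 85 has 6 of the 10 scores below it.
print(percent_below(scores, 85))  # 60.0
```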

400

This is the error (vertical distance) between a linear model's prediction and the actual data point.

What is the residual?
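
A residual is usually computed as actual minus predicted. The sketch below assumes a hypothetical fitted line, y = 2.2 + 0.6x, and one made-up data point.

```python
# Hypothetical fitted line (these coefficients are assumed for illustration).
INTERCEPT = 2.2
SLOPE = 0.6

def predict(x):
    """Value the linear model predicts for x."""
    return INTERCEPT + SLOPE * x

def residual(x, actual_y):
    """Actual value minus predicted value; positive means the point lies above the line."""
    return actual_y - predict(x)

# The point (2, 4) sits above the line, so its residual is positive.
print(round(residual(2, 4), 2))  # 0.6
```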

400

If it's not minimized through randomization, the sample may not share the same characteristics as the population.

What is bias?

400

This is the long-run average outcome over many trials.

What is expected value?
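
For a discrete scenario, the expected value is each outcome multiplied by its probability, summed over all outcomes. The sketch below assumes one roll of a fair six-sided die as the example.

```python
from fractions import Fraction

# Probability model for one roll of a fair six-sided die (an assumed example).
die = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities in a valid model sum to 1.
total_probability = sum(die.values())

# Weight each outcome by its probability and add them up.
expected_value = sum(face * p for face, p in die.items())

print(total_probability)  # 1
print(expected_value)     # 7/2, i.e. 3.5
```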

500

This diagram clearly displays the quartiles of a distribution, dividing the data into four quarters.

What is a box plot?

500

This descriptor tells you how many standard deviations above or below the mean a particular value is.

What is a z-score?
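
The calculation is a single formula: subtract the mean, then divide by the standard deviation. The class mean, standard deviation, and score below are made-up values for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 10.
print(z_score(85, mean=70, sd=10))  # 1.5
print(z_score(60, mean=70, sd=10))  # -1.0
```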

500

This is a mantra that should be one of your biggest takeaways from this class.

What is "Correlation ≠ Causation"?

500

This experimental design ensures that neither the participants nor the researchers working with them know who got which treatment.

What is a double-blind study?

500

This is a diagram that shows the possible outcomes for successive events.

What is a tree diagram?