Unit 1
Unit 2
Unit 3
Unit 4
Unit 5
100

What is the difference between categorical data and quantitative data?

Categorical data sorts individuals into groups or labels; quantitative data records numerical amounts.

100

What are some tools to determine a relationship between categorical variables?

Bar graphs, two-way tables, conditional relative frequencies

100

What is a sample?

A group of individuals selected from the whole population being studied.

100

What is a sample space?

The collection of all possible outcomes.

100

What is a sampling distribution?

The distribution of a statistic (such as the sample mean) across all possible samples of the same size from a population.

200

How can we tell which way data is skewing?

Data skews toward the side with the long tail: the gentle slope that looks like something a skier would go down. EX: _|\_ skews right

200

Explain how the r-value visually connects to a scatterplot

When r is close to 1, the points follow a strong positive linear pattern. As r gets closer to 0, the correlation gets weaker and the points look more scattered. When r is close to -1, the points follow a strong negative linear pattern.
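
A quick sketch of how r could be computed by hand in Python. The function name `pearson_r` and the data lists are made up for illustration; they are not from the unit.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data for illustration only
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # points on a perfect upward line
falling = [10, 8, 6, 4, 2]   # points on a perfect downward line

print(pearson_r(hours, rising))   # r at the positive extreme
print(pearson_r(hours, falling))  # r at the negative extreme
```

Points that fall exactly on a rising line give r = 1, and points exactly on a falling line give r = -1; real scatterplots land somewhere in between.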

200

What are all the sampling types?

Census - measuring the entire population. Random sample - choosing a group of people at random. SRS (simple random sample) - every possible group of the chosen size has an equal chance of being picked, for example by using a random number generator. Cluster RS - dividing the population into clusters, randomly selecting some clusters, and using all the data within those clusters. Stratified RS - dividing the population into groups (strata) of similar individuals and taking an SRS from every group. Systematic RS - choosing a random starting place and moving through the data with a fixed interval.
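
A minimal sketch of four of these sampling types using Python's `random` module. The population of ID numbers 1 to 100, the strata of 25, and the clusters of 10 are all hypothetical choices for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical ID numbers 1..100

# SRS: every group of 10 individuals is equally likely to be chosen
srs = random.sample(population, 10)

# Systematic: random starting place, then every 10th individual
start = random.randrange(10)
systematic = population[start::10]

# Stratified: split into strata (hypothetical groups of 25),
# then take an SRS of 3 from every stratum
strata = [population[i:i + 25] for i in range(0, 100, 25)]
stratified = [p for stratum in strata for p in random.sample(stratum, 3)]

# Cluster: randomly pick 2 whole clusters (hypothetical groups of 10)
# and use everyone inside them
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [p for c in random.sample(clusters, 2) for p in c]

print(srs, systematic, stratified, cluster_sample, sep="\n")
```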

200

What makes an event mutually exclusive or not?

Mutually exclusive events can never occur at the same time. If two events can happen at the same time, they are not mutually exclusive.

200

How do we know if an estimator is biased or not?

An estimator is unbiased if the mean of its sampling distribution is equal to the population parameter. It is biased if it tends to overestimate or underestimate the parameter.

300

What is standard deviation?

Standard deviation is the typical distance between the data values and the mean.
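
A small sketch with Python's built-in `statistics` module, using made-up data values. It shows both the population version (divide by n) and the sample version (divide by n - 1).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

mean = statistics.mean(data)        # 5
pop_sd = statistics.pstdev(data)    # population SD, divides by n
sample_sd = statistics.stdev(data)  # sample SD, divides by n - 1

print(mean, pop_sd, sample_sd)
```

The sample version comes out slightly larger because it divides by a smaller number.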

300

What is a residual?

The actual response value minus the predicted response value (residual = actual - predicted).

300

Why are there so many sample types?

Populations and studies are all different, so having a variety lets researchers choose the sampling type that best fits the situation.

300

What makes an event independent?

Events are independent if one happening does not change the probability of the other: previous trials have no effect on future trials, so each trial has the same probability.

300

How do you interpret the mean?

For all random samples of size ___, from the (POPULATION), the mean of the distribution of all possible samples of (WHAT IS BEING MEASURED) will have a value of (MEAN)

400

What is IQR?

IQR is the interquartile range, the distance between Q1 and Q3 (IQR = Q3 - Q1). Q1 is the median of the first half of the ordered data and Q3 is the median of the second half of the ordered data.
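
A sketch of the "median of each half" method in Python. The function name `iqr` and the data are made up; it assumes at least two values and leaves the middle value out of both halves when the count is odd (other quartile conventions exist).

```python
from statistics import median

def iqr(values):
    """Return (Q1, Q3, IQR) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # first half of the ordered data
    upper = data[-half:]   # second half of the ordered data
    q1, q3 = median(lower), median(upper)
    return q1, q3, q3 - q1

q1, q3, spread = iqr([13, 1, 9, 5, 3, 11, 7])  # made-up data
print(q1, q3, spread)  # 3 11 8
```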

400

How is the least squares regression model used?

To find the line of best fit on a scatterplot: the line that makes the sum of the squared residuals as small as possible.
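
A minimal sketch of fitting a least squares line by hand in Python, then checking the residuals (actual minus predicted). The function name `least_squares` and the data points are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return slope, intercept

xs = [1, 2, 3, 4]  # made-up explanatory values
ys = [2, 3, 5, 6]  # made-up response values

slope, intercept = least_squares(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(slope, intercept)
print(residuals)
```

For this data the line is roughly predicted y = 0.5 + 1.4x, and the residuals add up to about 0, which is always true for a least squares line.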

400

What is a confounding variable?

A third variable that isn't accounted for but affects the outcome, which could create a false idea of a relationship between the variables being studied.

400

What is a binomial distribution?

The distribution of the number of successes in a set number of trials, with each trial being independent and having the same probability of success.
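
A short sketch of the binomial probability formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k), in Python. The function name `binomial_pmf` and the coin-flip example are made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success chance p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Chance of exactly 2 heads in 4 fair coin flips
prob = binomial_pmf(2, 4, 0.5)
print(prob)  # 0.375
```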

400

How do you interpret the standard deviation?

For all random samples of size ___, from the (POPULATION), the sample mean of the distribution of all possible samples of (WHAT IS BEING MEASURED) will typically vary by about (STANDARD DEVIATION) from the population mean of (MEAN).

500

Explain Z-score

A z-score is how many standard deviations a data point is from the mean. It's also called the standard score or standardized score. It can be found through the formula: Z = (X - mean) / std dev
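
The formula as a tiny Python sketch. The function name `z_score` and the test-score numbers are made up for illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations x sits above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# A score of 85 when the mean is 70 and the standard deviation is 10
z = z_score(85, 70, 10)
print(z)  # 1.5
```

A z-score of 1.5 means the score is one and a half standard deviations above the mean.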

500

What are high leverage points and outliers?

High leverage points have a very high or very low X value compared to the rest of the data and can change the slope of the regression line. Outliers have a large residual and can change the strength of the correlation.

500

Why is a randomized experiment good?

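Randomly assigning treatments helps cut out confounding variables by spreading them evenly across the groups, so differences in the results can be credited to the treatment.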

500

What is a geometric distribution?

The distribution of the number of trials it takes to get the first success, with each trial being independent and having the same probability of success.
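
A short sketch of the geometric probability formula, P(X = k) = (1 - p)^(k - 1) p, in Python. The function name `geometric_pmf` and the dice example are made up for illustration.

```python
def geometric_pmf(k, p):
    """P(first success happens on trial k), success chance p each trial."""
    return (1 - p) ** (k - 1) * p

# Chance the first six shows up on the 3rd roll of a fair die:
# two failures (5/6 each), then one success (1/6)
prob = geometric_pmf(3, 1 / 6)
print(prob)
```

Here the answer works out to 25/216, or about 0.116.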

500

What is the central limit theorem?

When the sample size is large enough, the sampling distribution of the sample mean is approximately normal. This works no matter what shape the population distribution has.
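
A rough simulation sketch in Python: it draws many samples from a strongly right-skewed population and looks at the sample means. The exponential population (mean 1), the sample size of 50, and the 2000 repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of one random sample of size n from a right-skewed population."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

# Sample means from 2000 random samples of size 50
means = [sample_mean(50) for _ in range(2000)]

center = statistics.mean(means)   # should land near the population mean, 1
spread = statistics.stdev(means)  # should land near 1 / sqrt(50)

print(center, spread)
```

Even though the population is skewed, a histogram of `means` would look roughly bell-shaped, centered near the population mean.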