Exploring Data
Normal Distribution
Regression
Sampling and Experiments
100

In a distribution with this shape, the mean is greater than the median.

skewed right

100

This is the formula you use to find a z-score.

Z = (value – mean) / sd
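A minimal Python sketch of the formula above. The function name z_score and the example numbers are illustrative assumptions, not part of the original card.

```python
def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd

# Hypothetical example: a score of 650 when the mean is 500 and the sd is 100.
z = z_score(650, 500, 100)
print(z)  # 1.5
```

A positive result means the value is above the mean; a negative result means it is below.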

100

This is how you compute a residual.

Residual = actual - predicted (the observed y minus the y predicted by the line)
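As a quick illustration, the sketch below computes one residual. The regression line (y-hat = 2 + 3x) and the observed point (4, 15) are made-up values chosen only to show the arithmetic.

```python
def residual(actual, predicted):
    """Residual = actual (observed) value minus the predicted value."""
    return actual - predicted

# Hypothetical line: y-hat = 2 + 3x, evaluated at x = 4.
predicted_y = 2 + 3 * 4          # 14
res = residual(15, predicted_y)  # observed y is 15
print(res)  # 1
```

A positive residual means the point sits above the line; a negative residual means it sits below.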

100

This type of sampling divides the population into groups of individuals and then chooses separate SRSs from each group.

Stratified Random Sampling
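One way to picture this is the rough sketch below, which draws a separate SRS from every stratum. The strata (grade levels), the ID numbers, and the helper name stratified_sample are all hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a separate SRS of size n_per_stratum from each stratum."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population of student ID numbers, split by grade level.
strata = {
    "freshmen": list(range(0, 100)),
    "sophomores": list(range(100, 200)),
    "juniors": list(range(200, 300)),
}
sample = stratified_sample(strata, 5)
print(sample)
```

Every stratum contributes individuals to the sample, which is the key feature of this design.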

200

When looking at two plots with the same scale, how can you tell which one has a smaller standard deviation?

The plot whose data are less spread out around the center: less variation means a smaller standard deviation.

200

This is the correct interpretation of z = -1.5.

1.5 standard deviations below the mean

200

This is how you can tell if a data point is influential.

Removing it noticeably changes r and the slope or y-intercept of the linear model.
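A small sketch of the idea, fitting a least-squares line with and without one extreme point. The data and the helper name least_squares are invented for illustration.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
slope_without, intercept_without = least_squares(xs, ys)

# Add one point far to the right and far below the pattern.
slope_with, intercept_with = least_squares(xs + [20], ys + [5])

print(slope_without, slope_with)
```

In this made-up data set, the single added point pulls the slope from 2 down to nearly 0, which is what makes it influential.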

200

This type of sampling divides the population into groups and then randomly chooses entire groups to sample.

cluster sampling
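A rough sketch of the idea: whole groups are chosen at random, and everyone in a chosen group is sampled. The homerooms, student names, and helper name cluster_sample are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly choose entire clusters and include every member of each."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical clusters: four homerooms of three students each.
clusters = {
    "homeroom A": ["Ana", "Ben", "Cal"],
    "homeroom B": ["Dee", "Eli", "Fay"],
    "homeroom C": ["Gus", "Hal", "Ivy"],
    "homeroom D": ["Jon", "Kim", "Lou"],
}
chosen, sample = cluster_sample(clusters, 2)
print(chosen, sample)
```

Unlike stratified sampling, some groups contribute nobody and the chosen groups contribute everybody.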

300

This rule is used to identify an outlier in a boxplot.

A value is an outlier if it is above Q3 + 1.5 * IQR or below Q1 - 1.5 * IQR.
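The rule can be sketched in Python as follows. The data set and the helper name outlier_fences are made up, and quartile conventions vary between textbooks, calculators, and libraries, so the exact fences may differ slightly from a hand calculation.

```python
import statistics

def outlier_fences(data):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # statistics.quantiles uses its default ("exclusive") quartile method here,
    # which is an assumption; other conventions give slightly different cut points.
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower, upper = outlier_fences(data)
outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # [100]
```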

300

The scores on the SAT math section are normally distributed with a mean of 500 and a standard deviation of 100. What proportion of all test-takers have a score of less than 650?

normalcdf(-10000, 650, 500, 100) = 0.9332
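For anyone checking this without a graphing calculator, the same area can be found with Python's standard-library NormalDist. This is just one way to reproduce the calculator result; the variable names are illustrative.

```python
from statistics import NormalDist

# Normal model for the scores: mean 500, standard deviation 100.
sat_math = NormalDist(mu=500, sigma=100)

# Area to the left of 650 (equivalent to z = 1.5).
p_below_650 = sat_math.cdf(650)
print(round(p_below_650, 4))  # 0.9332
```

No artificial lower bound such as -10000 is needed here, because cdf already gives the whole area to the left of the value.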

300

What does correlation measure?

strength and direction of a linear relationship
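The word "linear" matters, as the sketch below tries to show. The helper pearson_r and both data sets are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Perfect positive linear relationship: r is 1 (up to rounding).
r_line = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Perfect but curved relationship (y = x squared): r is 0.
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_line, r_curve)
```

The second data set has an exact relationship between x and y, yet r is 0 because the relationship is not linear.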

300

These are the 4 major components of experimental design.

Comparison, Control, Randomization, Replication

400

When using a linear transformation (multiplying by a factor/adding a constant), these two rules tell how the center and spread change.

Multiplying by a constant changes both center and spread; adding a constant changes only the center.
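A quick numerical check of the two rules, using a made-up list of scores. The factor of 2 and the added constant of 10 are arbitrary choices.

```python
import statistics

# Hypothetical data set.
scores = [60, 70, 80, 90, 100]

scaled = [2 * x for x in scores]    # multiply every value by 2
shifted = [x + 10 for x in scores]  # add 10 to every value

for label, data in [("original", scores), ("times 2", scaled), ("plus 10", shifted)]:
    # Print the center (mean) and the spread (standard deviation) of each list.
    print(label, statistics.mean(data), round(statistics.stdev(data), 2))
```

Multiplying by 2 doubles both the mean and the standard deviation, while adding 10 raises the mean by 10 and leaves the standard deviation alone.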

400

The scores on the SAT math section are normally distributed with a mean of 500 and a standard deviation of 100. What score would you need to be in the top 10% of all test-takers?

invNorm(.9, 500, 100) = 628.155, so a score of about 630
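The same cutoff can be reproduced with Python's standard-library NormalDist, shown here as one possible check rather than the required method. The variable names are illustrative.

```python
from statistics import NormalDist

# Normal model for the scores: mean 500, standard deviation 100.
sat_math = NormalDist(mu=500, sigma=100)

# Top 10% means 90% of scores fall below the cutoff.
cutoff = sat_math.inv_cdf(0.90)
print(round(cutoff, 3))  # 628.155
```

The area passed to inv_cdf is the area to the left, which is why 0.90 is used for the top 10%.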

400

This is when you use a regression model for prediction outside the range of values of the explanatory variable x used to obtain the line.

extrapolation

400

A sample is said to be this if it systematically favors or excludes part of the population. 

biased

500

When comparing two distributions, you should always mention these four things.

SOCV – Shape, Outliers, Center, and Variation

500

The scores on the SAT math section are normally distributed with a mean of 500 and a standard deviation of 100. In what range will the middle 95% of all test-takers fall?

By the 68-95-99.7 rule, within 2 standard deviations of the mean: 500 +/- 200, so from 300 to 700
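The answer above rests on the 68-95-99.7 rule, which is an approximation. The sketch below compares it with the exact middle 95% from the normal model, using Python's standard-library NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

# Normal model for the scores: mean 500, standard deviation 100.
sat_math = NormalDist(mu=500, sigma=100)

# Exact middle 95%: cut off 2.5% in each tail.
low = sat_math.inv_cdf(0.025)
high = sat_math.inv_cdf(0.975)
print(round(low), round(high))  # 304 696

# Actual proportion of scores between 300 and 700.
coverage = sat_math.cdf(700) - sat_math.cdf(300)
print(round(coverage, 4))  # 0.9545
```

The rule-of-thumb range of 300 to 700 is slightly wider than the exact range, which is why it covers a little more than 95% of scores.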

500

When assessing whether a linear model is appropriate, what should you see on a residual plot?

No pattern: the residuals are randomly scattered around zero.

500

This is the main difference between an experiment and an observational study. 

An experiment imposes a treatment
