Exploring Data (C1)
Normal Curves (C2)
Regression (C3)
Designing Studies (C4)
Probability (C5&6)
100

What type of data is it when the values are group labels which often don't have an inherent order? 

Categorical Data

100

What is the nth percentile of a distribution?

The nth percentile is the value with n percent of the observations less than or equal to it.
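A minimal Python sketch of this definition, assuming percentile means "percent of observations at or below the value." The function name `percentile_rank` is hypothetical. The sample reuses the lineman weights from the 500-point question. Calculators and software use several slightly different percentile conventions, so results can differ a little between tools.

```python
def percentile_rank(data, value):
    """Percent of observations less than or equal to value."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)

# Lineman weights (lb) from the 500-point Exploring Data question
weights = [310, 307, 345, 324, 305, 301, 290, 307]
print(percentile_rank(weights, 310))  # 75.0 (6 of 8 weights are at or below 310)
```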

100

What is the acronym for describing a scatterplot and what do the letters stand for? 

CDOFS: Context, Direction, Outliers, Form, Strength

100

How do you take a stratified random sample? What does this help you avoid? 

Start by classifying the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the entire sample. 

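Because it uses random selection within each stratum, it helps avoid bias and guarantees that every stratum is represented in the sample. When individuals within each stratum are similar, it also gives more precise estimates than an SRS of the same size.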

100

When are events mutually exclusive? Can mutually exclusive events be independent? 

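When they have no outcomes in common, so they cannot both happen at the same time. No: if both events have nonzero probability, knowing that one occurred tells you the other did not, so they cannot be independent.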

200

Name 1 of the graphs that can be used for categorical data and 2 of the graphs that can be used for quantitative data.

Categorical: Bar graphs or pie charts

Quantitative: Histogram, Dot plot, Box plot, Stem plot

200

How do you calculate a z-score? 

z=(x-mean)/standard deviation
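A minimal Python sketch of the z-score formula. The function name `z_score` is hypothetical; the example values come from the pregnancy-length questions (mean 266 days, standard deviation 16 days).

```python
def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(240, 266, 16))  # -1.625
print(z_score(270, 266, 16))  # 0.25
```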

200

How would you interpret r=-0.9?

There is a strong, negative linear association between the two variables: as one variable increases, the other tends to decrease.

200

What is the difference between an observational study and an experiment? 

An observational study is just observing and collecting data, not influencing the responses. An experiment deliberately imposes a treatment on individuals to measure their responses. 

200

The Senate in a recent year was made up of 47 male democrats, 13 female democrats, 36 male republicans, and 4 female republicans. Find the probability of choosing a democrat given they are female. 

P(D|F) = 13/(13 + 4) = 13/17 ≈ 0.7647
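A minimal Python sketch of the conditional probability: restrict to the "given" group (female senators) and find the fraction of that group who are Democrats. The dictionary layout is just one illustrative way to store the two-way table from the question.

```python
from fractions import Fraction

# Counts from the question: (party, sex) -> number of senators
senate = {("D", "M"): 47, ("D", "F"): 13, ("R", "M"): 36, ("R", "F"): 4}

females = sum(n for (party, sex), n in senate.items() if sex == "F")  # 17
female_democrats = senate[("D", "F")]  # 13

p_d_given_f = Fraction(female_democrats, females)
print(p_d_given_f, round(float(p_d_given_f), 4))  # 13/17 0.7647
```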

300

What is the acronym for describing a quantitative graph and what do the letters stand for? 

CSOCS: Context, Shape, Outliers, Center, Spread

300

Keith measures the diameter of each tennis ball in a bag with a standard ruler. Unfortunately, he used the ruler incorrectly and each of his measurements was 0.2in too large. Keith's data had a mean of 3.2in and a standard deviation of 0.1in. What is the mean and standard deviation of the correct measurements in cm (1 in = 2.54 cm)?

Mean = (3.2-0.2)*2.54 = 7.62cm

Std. Deviation = 0.1*2.54 = 0.254cm
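A minimal Python sketch of the linear-transformation rules used above: adding a constant shifts the mean but not the standard deviation, while multiplying by a constant scales both. The helper name `transform_summary` and its parameters are hypothetical.

```python
def transform_summary(mean, sd, add=0.0, multiply=1.0):
    """Mean and SD after the transformation new = (old + add) * multiply.
    Adding shifts only the mean; multiplying scales the mean by `multiply`
    and the SD by |multiply|."""
    return (mean + add) * multiply, sd * abs(multiply)

# Subtract the 0.2 in error, then convert inches to centimeters
mean_cm, sd_cm = transform_summary(3.2, 0.1, add=-0.2, multiply=2.54)
print(round(mean_cm, 3), round(sd_cm, 3))  # 7.62 0.254
```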

300

For many, the women's figure skating competition is the highlight of the Winter Olympics. Scores in the short program, x, and scores in the long program, y, were recorded for each of the 24 skaters. The equation of the regression line is ŷ = -16.2 + 2.07x. Interpret the slope AND y-intercept.

Slope: For each one-point increase in short program score, the predicted long program score increases by 2.07 points.

Y-intercept: For a short program score of 0, the predicted long program score is -16.2 points. This doesn't make sense in context: a negative score is impossible, and a short program score of 0 is far outside the observed data, so using the line there is extrapolation.

300

What does it mean for an experiment to be single-blind vs. double-blind? 

In a single-blind experiment, only one group is kept unaware of which treatment each subject received: either the subjects or the people who interact with them and measure the response variable.

In a double-blind experiment, neither the subjects nor those who interact with them and measure the response variable would know which treatment a subject received. 

300

Shannon hits the snooze button on her alarm clock on 60% of school days. If she doesn't hit snooze, there is a 0.90 probability that she will make it to school on time. However, if she hits snooze, there is a 0.70 probability that she will make it to school on time. What is the probability she will be late to school on a randomly chosen school day? 

P(Late) = 0.60*0.30 + 0.40*0.10 = 0.22
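A minimal Python sketch of the same calculation (the law of total probability): multiply along each branch of the tree diagram, then add the branches that end in "late." Variable names are illustrative.

```python
p_snooze = 0.60
p_late_given_snooze = 1 - 0.70     # on time with probability 0.70 after snoozing
p_late_given_no_snooze = 1 - 0.90  # on time with probability 0.90 otherwise

# Add the two "late" branches of the tree diagram
p_late = p_snooze * p_late_given_snooze + (1 - p_snooze) * p_late_given_no_snooze
print(round(p_late, 4))  # 0.22
```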

400

What measures of center and spread are best for skewed data vs. symmetric data? 

Median and IQR for skewed. 

Mean and standard deviation for symmetric.

400

The length of human pregnancies from conception to birth varies according to a distribution that is approximately Normal with a mean of 266 days and a standard deviation of 16 days. What percent of pregnancies last between 240 days and 270 days (roughly 8-9 months)?

P(240 < X < 270) = normalcdf(lower: 240, upper: 270, mean: 266, std. dev: 16) ≈ 0.5466, so about 54.7% of pregnancies last between 240 and 270 days.
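Outside a graphing calculator, the same area can be found with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a sketch of an equivalent computation, not the calculator's own method: the area between two values is the area to the left of the upper value minus the area to the left of the lower value.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=266, sigma=16)
# P(240 < X < 270) = area left of 270 minus area left of 240
p = pregnancy.cdf(270) - pregnancy.cdf(240)
print(round(p, 4))  # 0.5466
```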

400

For many, the women's figure skating competition is the highlight of the Winter Olympics. Scores in the short program, x, and scores in the long program, y, were recorded for each of the 24 skaters. The equation of the regression line is ŷ = -16.2 + 2.07x. What is the residual for the gold medal winner, Yu-Na Kim, who scored 78.50 in the short program and 150.06 in the long program?

Residual = actual - predicted = 150.06 - (-16.2 + 2.07(78.50)) = 150.06 - 146.295 = 3.765

Yu-Na Kim's long program score was 3.765 points higher than expected based on her short program score. 
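A minimal Python sketch of the residual calculation, using the regression line given in the question. The function names are hypothetical.

```python
def predicted_long(short_score):
    """Regression line from the question: predicted long = -16.2 + 2.07 * short."""
    return -16.2 + 2.07 * short_score

def residual(actual, predicted):
    # Residual = actual y - predicted y; positive means the point lies above the line
    return actual - predicted

r = residual(150.06, predicted_long(78.50))
print(round(r, 3))  # 3.765
```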

400

What is blocking in an experiment? 

A block is a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments. In a randomized block design, treatments are randomly assigned separately within each block.

400

According to New Jersey Transit, the 8:00am weekday train from Princeton to New York City has a 90% chance of arriving on time on a randomly selected day. Suppose this claim is true. Let W = the number of days on which the train arrives late. Find the probability that the train arrives late exactly 2 out of 6 days.

P(W=2) = binompdf(trials: 6, p: 0.1, x: 2) = 0.0984
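A minimal Python sketch of the binomial probability formula that binompdf evaluates, P(X = k) = C(n, k) p^k (1 - p)^(n - k). The function name `binom_pmf` is hypothetical; `math.comb` requires Python 3.8+.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# W = number of late days out of 6, with P(late) = 0.1 on each day
print(round(binom_pmf(2, 6, 0.1), 4))  # 0.0984
```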

500

The 2011 Dallas Cowboys roster had 11 linemen. Their weights (in pounds) are: 

310, 307, 345, 324, 305, 301, 290, 307

Are there any outliers? 

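Anything above Q3 + (1.5*IQR) is a high outlier. Anything below Q1 - (1.5*IQR) is a low outlier. 

Sorted: 290, 301, 305, 307, 307, 310, 324, 345. Q1 = 303 and Q3 = 317, so IQR = 14 and 1.5*IQR = 21. High fence = 317 + 21 = 338; low fence = 303 - 21 = 282.

345 lbs is above 338, so it is an outlier. No weights fall below 282.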
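A minimal Python sketch of the 1.5*IQR rule. It assumes quartiles are the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd. This is one common convention, and other software may compute quartiles slightly differently. The function name `iqr_fences` is hypothetical.

```python
from statistics import median

def iqr_fences(data):
    """Return (low fence, high fence) using Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # lower half
    upper = xs[(n + 1) // 2 :]    # upper half (skips the middle value when n is odd)
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

weights = [310, 307, 345, 324, 305, 301, 290, 307]
low, high = iqr_fences(weights)
outliers = [w for w in weights if w < low or w > high]
print(low, high, outliers)  # 282.0 338.0 [345]
```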

500

The length of human pregnancies from conception to birth varies according to a distribution that is approximately Normal with a mean of 266 days and a standard deviation of 16 days. How long do the longest 20% of pregnancies last?

invNorm(area: 0.20, mean: 266, std. dev: 16, tail: right) ≈ 279.47 days

or invNorm(area: 0.80, mean: 266, std. dev: 16, tail: left) ≈ 279.47 days

The longest 20% of pregnancies last about 279.5 days or more.
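The same cutoff can be found with Python's standard-library `statistics.NormalDist.inv_cdf` (Python 3.8+). This sketch uses the left-tail form: the longest 20% begin at the 80th percentile.

```python
from statistics import NormalDist

pregnancy = NormalDist(mu=266, sigma=16)
# Longest 20% = right tail of area 0.20, i.e. everything above the 80th percentile
cutoff = pregnancy.inv_cdf(0.80)
print(round(cutoff, 2))  # 279.47
```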

500

For many, the women's figure skating competition is the highlight of the Winter Olympics. Scores in the short program, x, and scores in the long program, y, were recorded for each of the 24 skaters. The equation of the regression line is ŷ = -16.2 + 2.07x. If r² = 0.736, interpret that value.

About 73.6% of the variation in long program scores is accounted for by the linear model relating long program scores to short program scores.

500

When can you make an inference about the population AND an inference about cause and effect? 

When individuals are randomly selected from the population (which allows inference about the population) and randomly assigned to treatment groups (which allows inference about cause and effect).

500

According to New Jersey Transit, the 8:00am weekday train from Princeton to New York City has a 90% chance of arriving on time on a randomly selected day. Suppose this claim is true. Let W = the number of days on which the train arrives late. Find the probability that the train arrives late on 2 or more of 6 days.

P(W>=2) = 1 - binomcdf(trials: 6, p: 0.1, x: 1) = 0.1143
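A minimal Python sketch of the complement rule used above. The helper `binom_cdf` is hypothetical and plays the role of binomcdf by summing P(X = i) for i = 0 through k. `math.comb` requires Python 3.8+.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# P(W >= 2) = 1 - P(W <= 1)
p_two_or_more = 1 - binom_cdf(1, 6, 0.1)
print(round(p_two_or_more, 4))  # 0.1143
```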