Data Analysis
Confidence Intervals
Linear Regressions
Pdfs & Cdfs
Random!
100
Emely is analyzing data collected from an observational study. She finds that the correlation between student height and student foot size is 0.89. Does this mean that a student's height causes their feet to be bigger?

Nope. It does seem that taller students have bigger feet but correlation NEVER implies causation. You must conduct an experiment to determine causation. 

100

I have the option of picking a 95% confidence interval or a 99% confidence interval. Which confidence interval has a higher margin of error?

99%. Margin of error equals the critical value (z-score) times the standard error, and the critical value increases as you increase the confidence level.
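
As a quick check, the critical values for the two confidence levels can be computed with Python's standard library. This is only a sketch; the helper name `z_critical` is made up for illustration.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical value z* for a confidence level such as 0.95."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

z95 = z_critical(0.95)
z99 = z_critical(0.99)
print(round(z95, 3), round(z99, 3))
```

With the same standard error, the larger critical value at 99% gives the larger margin of error.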
100
The equation for a linear regression is y = a + bx.

What is X? What is Y? HINT: Think ___ variable.

X is the explanatory (independent) variable and Y is the response (dependent) variable.


100

Ms. Rosenbaum advertises that 85% of her flower seeds will germinate (grow). Suppose that the company's claim is true. Armani buys a packet with 20 flower seeds from Ms. Rosenbaum and plants them in his garden. Let X=the number of seeds that germinate. This is known as a ________ distribution.

Binomial distribution.
100

Each person in a simple random sample of 2,000 received a survey, and 317 people returned their survey. What is the name of this bias?

Non-response.
200

Suppose the correlation between two variables is r = .28. 


What will the new correlation be if .17 is added to all values of the x-variable?

r = 0.28. Correlation isn't changed when you add a constant to every value of a variable.
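
A small sketch that illustrates this with made-up numbers (the data below are not from the question, and `pearson_r` is a hypothetical helper written for this example):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 1.9, 3.5, 3.0, 4.8]

r_original = pearson_r(xs, ys)
r_shifted = pearson_r([x + 0.17 for x in xs], ys)  # add 0.17 to every x
print(r_original, r_shifted)
```

The two printed values agree up to floating-point rounding, because shifting every x also shifts the mean of x by the same amount.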

200

Our 95% Confidence Interval for Delice's population mean was (230.80,250.80). How do we interpret this interval?

We are 95% confident that the population mean falls within the interval from 230.80 to 250.80.
200
What is a residual?
The residual is the vertical difference between the actual value and the predicted value: residual = y - y-hat.
200

Exactly 10% of the students at Bronx Latin are left handed. Select students at random from the school, one at a time, until you find one who is left-handed. Let V=the number of students chosen. This is known as a _______ distribution.

Geometric distribution.
200
What is a critical difference between experimental and observational studies?
Experimental studies have a treatment that is performed on the subjects. 
300

If ten executives have salaries of $80,000, six have salaries of $75,000, and three have salaries of $70,000, what is the median salary?

Median salary is $80,000. 
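
One way to check this is to list all 19 salaries and take the middle (10th) value; a minimal sketch using Python's standard library:

```python
from statistics import median

# Ten executives at $80,000, six at $75,000, three at $70,000.
salaries = [80_000] * 10 + [75_000] * 6 + [70_000] * 3

median_salary = median(salaries)
print(median_salary)
```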
300
When constructing the confidence interval for means instead of proportions, what additional information do we need to take into account?
Degrees of freedom (sample size - 1), because intervals for means use a t-score instead of a z-score.
300
Chris J has plotted a bunch of points on a scatterplot. His correlation coefficient r = -0.3. The residual plot shows a curved shape. How would you describe this regression? 
This regression is weak, negative, and non-linear.
300

Jessica is volunteering for an opinion poll and calls residential telephone numbers at random. Only 20% of the calls reach a live person. You watch the random digit dialing machine make 15 calls. Let X=the number of calls that reach a live person. Find and interpret the mean of X.

The mean of a binomial random variable is E(X) = np.

When making 15 random phone calls, we expect 15*0.2 = 3 people to pick up on average.
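
A short sketch that checks the shortcut E(X) = np against the definition of expected value for this binomial setting (n = 15, p = 0.2):

```python
from math import comb

n, p = 15, 0.2

# Shortcut for a binomial random variable: E(X) = n * p.
mean_shortcut = n * p

# Same value from the definition: sum of k * P(X = k) over all k.
mean_by_definition = sum(
    k * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)
)
print(round(mean_shortcut, 6), round(mean_by_definition, 6))
```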

300
I take a group of 500 subjects for an experiment and group them by age before randomly assigning subjects to treatment and control. What is this technique called?
Blocking.


Note: Stratified sampling plays the analogous role when selecting a sample for an observational study.

400

Consider a data set of positive values, at least two of which are not equal. Which of the following sample statistics will be changed when each value in this data set is multiplied by a constant whose absolute value is greater than 1?

I. The mean

II. The median

III. The standard deviation

All three. 
400

What is the formula for calculating a confidence interval for a population mean?

x-bar +/- t * s / sq. root of n, where s is the sample standard deviation and t has n - 1 degrees of freedom.
400

Exercise physiologists are investigating the relationship between lean body mass (in kilograms) and the resting metabolic rate (in calories per day) in sedentary males. They find:

Predictor      Coef     StDev       T        P
constant      264.0     276.9    0.95    0.363
Mass         22.563     6.360    3.55    0.005

S = 144.9      R-sq = 55.7%     R-sq (adj) = 51.3%

What is the appropriate interpretation for the value of the slope of the regression line?

For each additional kilogram of lean body mass, the predicted resting metabolic rate increases by about 22.563 calories per day.
400

To start her old snow blower, Ms. Nelson has to pull a cord and hope for some luck. On any particular pull, the snow blower has a 20% chance of starting. What is the probability that it first starts on exactly the third pull?

geometpdf (probability: 0.2, trials : 3) = 0.128.


Alternatively:

P(X = 3) = (0.8)^2 (0.2) = 0.128
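
The same geometric calculation as a small Python sketch (`geometric_pmf` is a hypothetical helper, not a calculator or library function):

```python
def geometric_pmf(k, p):
    """P(first success happens on trial k) when each trial succeeds with probability p."""
    # k - 1 failures in a row, then one success.
    return (1 - p) ** (k - 1) * p

prob_third_pull = geometric_pmf(3, 0.2)
print(round(prob_third_pull, 3))
```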

400

In a certain game, a fair die is rolled and a player gains 20 points if the die shows a “6.” If the die does not show a “6,” the player loses 3 points. If the die were to be rolled 100 times, what would be the expected total gain or loss for the player?

For one roll, E(X) = (1/6) * 20 + (5/6) * (-3) = 5/6, or about 0.83.

For 100 rolls, the expected total is 100 * (5/6), or about 83.3, so the player expects to gain about 83 points.
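
Working with exact fractions avoids rounding the per-roll value too early. A minimal sketch:

```python
from fractions import Fraction

p_six = Fraction(1, 6)

# Expected points on one roll: gain 20 on a six, lose 3 otherwise.
expected_per_roll = p_six * 20 + (1 - p_six) * (-3)

# Expected values add, so 100 rolls give 100 times the per-roll value.
expected_total = 100 * expected_per_roll
print(expected_per_roll, expected_total, float(expected_total))
```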

500

Using the most commonly accepted definition of outliers, a set has five outliers. If every value of the set is increased by 20 percent, how many outliers will there now be?

Increasing every value by 20% multiplies Q1, Q3, and the IQR by 1.2. An outlier is any value below Q1 - 1.5 IQR or above Q3 + 1.5 IQR, and both cutoffs are also multiplied by 1.2, so the same values remain outliers. There will still be 5 outliers.
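
A sketch with made-up data that has five outliers under the 1.5 IQR rule. The data set and the helper `count_outliers` are assumptions for illustration, and quartiles here follow the default method of Python's `statistics.quantiles`, which may differ slightly from a calculator's.

```python
from statistics import quantiles

def count_outliers(data):
    """Count values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return sum(1 for x in data if x < low or x > high)

# Made-up data: two low and three high extreme values.
data = [1, 2, 40, 42, 44, 45, 46, 48, 50, 52, 54, 55, 56, 58, 60, 120, 130, 140]
scaled = [1.2 * x for x in data]  # every value increased by 20%

outliers_before = count_outliers(data)
outliers_after = count_outliers(scaled)
print(outliers_before, outliers_after)
```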
500

A large company is considering opening a franchise in St. Louis and wants to estimate the mean household income for the area using a simple random sample of households. Based on information from a pilot study, the company assumes that the standard deviation of household incomes is σ = $7,200. What is the least number of households that should be surveyed to obtain an estimate that is within $200 of the true mean household income with 95 percent confidence?

HINT: When estimating sample sizes for means, use a z-score instead of a t-score because we don't have the degrees of freedom.

At least 4,979 households.

Solve the inequality for n:

z * sigma / sq. root (n) <= ME

For 95 percent confidence, z = 1.96. We are given sigma = $7,200 and ME = $200.

(1.96) * $7,200 / sq. root (n) <= $200

$14,112 / sq. root (n) <= $200

70.56 <= sq. root (n)

n >= 4978.7, so round up to n = 4,979.
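
Carrying the arithmetic at full precision, without rounding intermediate values, can be sketched in Python as follows (the variable names are chosen only for this example):

```python
import math

z = 1.96        # critical value for 95 percent confidence
sigma = 7200    # assumed population standard deviation, in dollars
margin = 200    # desired margin of error, in dollars

# From z * sigma / sqrt(n) <= margin, we get n >= (z * sigma / margin) ** 2.
n_exact = (z * sigma / margin) ** 2

# A sample size must be a whole number, so always round up.
n_required = math.ceil(n_exact)
print(round(n_exact, 1), n_required)
```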

500
I have a bunch of numbers. The mean of the x-values is 5 and the standard deviation is 10. The mean of the y-values is 10 and the standard deviation is 4. 

What could be the least squares regression line?

A. y = -5.0 + 3.0x

B. y = 3.0x

C. y = 8.5 + 0.3x

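C. The least squares line always passes through the point (mean of x, mean of y), so plugging in 5 for x must give 10 for y. This rules out B. Then, calculate the slope using the formula b = r (sy/sx) = r (4/10) = 0.4r. Since r is between -1 and 1, b must be between -0.4 and 0.4. This rules out A.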
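
Both conditions can be checked for each answer choice with a short sketch (the dictionary of options simply restates choices A, B, and C as intercept and slope pairs):

```python
import math

mean_x, sd_x = 5, 10
mean_y, sd_y = 10, 4

# Since b = r * (sd_y / sd_x) and -1 <= r <= 1, the slope is bounded.
max_abs_slope = sd_y / sd_x  # 4 / 10 = 0.4

# Each option as (intercept, slope) in y = intercept + slope * x.
options = {"A": (-5.0, 3.0), "B": (0.0, 3.0), "C": (8.5, 0.3)}

possible = []
for label, (intercept, slope) in options.items():
    passes_through_means = math.isclose(intercept + slope * mean_x, mean_y)
    slope_in_range = abs(slope) <= max_abs_slope
    if passes_through_means and slope_in_range:
        possible.append(label)

print(possible)
```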
500

A summer resort rents rowboats to customers but does not allow more than four people to a boat. Each boat is designed to hold no more than 800 pounds.

Suppose the distribution of weights of adult males who rent boats, including their clothes and gear, is normal with a mean of 190 pounds and standard deviation of 10 pounds. If the weights of individual passengers are independent, what is the probability that a group of four adult male passengers will exceed the acceptable weight limit of 800 pounds?

First, I need to find my combined mean and standard deviation for four adults. 

New mean: 190 * 4 = 760

New st. dev: sq. root of (10^2 *4) = 20

P(X > 800) = normalcdf(lower: 800, upper: 99999, mean: 760, st. dev: 20) = 0.0228
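
For readers without a graphing calculator, the same steps can be sketched with Python's standard library, where `NormalDist` plays the role of normalcdf:

```python
from math import sqrt
from statistics import NormalDist

n_passengers = 4
mean_one, sd_one = 190, 10  # one passenger's weight, in pounds

# Means add; for independent weights, variances add, so the standard
# deviation of the total is sqrt(n) times one passenger's.
total_weight = NormalDist(
    mu=n_passengers * mean_one,
    sigma=sqrt(n_passengers) * sd_one,
)

p_over_limit = 1 - total_weight.cdf(800)
print(round(p_over_limit, 4))
```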

500

At JFK Terminal 5, all bags entering the terminal must be screened. Ninety-seven percent of the bags that contain forbidden material trigger an alarm. Fifteen percent of the bags that do not contain forbidden material also trigger the alarm. If 1 out of every 1,000 bags entering the building contains forbidden material, what is the probability that a bag that triggers the alarm will actually contain forbidden material?

P(Illegal | Trigger) = P (Illegal & Trigger) / P (Trigger)

P(Illegal | Trigger) = [0.001* 0.97] / [(0.97*0.001) + (0.15*0.999)] = 0.0064

Let's make a chart:

P(Legal) = 0.999

          P(Trigger | Legal) = 0.15

          P (No Trigger | Legal) = 0.85

P(Illegal) = 0.001

          P (Trigger | Illegal) = 0.97

          P (No Trigger | Illegal) = 0.03
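
The chart translates directly into a short calculation. A minimal sketch, with variable names chosen only for this example:

```python
p_illegal = 0.001
p_legal = 1 - p_illegal

p_trigger_given_illegal = 0.97
p_trigger_given_legal = 0.15

# Total probability that a bag triggers the alarm (both branches of the chart).
p_trigger = p_illegal * p_trigger_given_illegal + p_legal * p_trigger_given_legal

# Conditional probability: P(Illegal | Trigger) = P(Illegal & Trigger) / P(Trigger).
p_illegal_given_trigger = (p_illegal * p_trigger_given_illegal) / p_trigger
print(round(p_illegal_given_trigger, 4))
```

Even though the alarm catches 97% of forbidden bags, forbidden bags are so rare that almost all alarms come from legal bags.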
