Stats: Final Exam

Intro & Probability

Discrete Distributions

Continuous Distributions

CIs & HTs

Linear Regression

100

A statistic describes a sample, whereas a ______ describes a population.

Parameter

100

In a quality control test, a factory has a 90% success rate in producing defect-free products. A batch contains 12 products. Let X represent the number of defect-free products in a batch. What is the expected number of defect-free products in the batch?

10.8

Binomial E(x) = n*p = 12 * 0.9 = 10.8

100

The weight of apples from a farm is normally distributed with a mean of 200 grams and a standard deviation of 15 grams. Using the empirical rule, estimate the percentage of apples that weigh between 215 and 245 grams.

15.85%

16% to the right of 215

0.15% to the right of 245

16% - 0.15% = 15.85%
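As a rough check, the empirical-rule estimate can be compared with the exact normal probability. This is a minimal sketch using Python's standard library; the variable names are illustrative and not part of the original card.

```python
from statistics import NormalDist

# Apple weights from the question: mean 200 g, standard deviation 15 g
weights = NormalDist(mu=200, sigma=15)

# Exact P(215 < X < 245): between 1 and 3 standard deviations above the mean
exact = weights.cdf(245) - weights.cdf(215)

# Empirical-rule estimate from the card: 16% - 0.15%
empirical = 0.16 - 0.0015

print(f"exact: {exact:.4f}, empirical rule: {empirical:.4f}")
```

The two values differ slightly because the empirical rule rounds the tail areas (68-95-99.7).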

100

A sample of 50 people has a mean of 35 with a standard deviation of 10. If you were to construct a 95% confidence interval, the margin of error would be _____.

2.77

= 1.96 * 10 / sqrt(50) = 2.77
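The margin-of-error arithmetic can be reproduced in a few lines. This sketch assumes the z critical value 1.96 used on the card; the variable names are illustrative.

```python
import math

n = 50          # sample size from the question
s = 10          # sample standard deviation from the question
z_star = 1.96   # critical value for 95% confidence, as used on the card

margin_of_error = z_star * s / math.sqrt(n)
print(round(margin_of_error, 2))
```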

100

You are interested in predicting the annual sales of a grocery store based on the store's square footage. The LSQ equation that results from your regression is:

y_hat = 235,00 + 124.40x

If the actual sales for a grocery store of 10,000 square feet is $1,350,000, what is the residual and is your model over or underestimating?

$102,650 and Underestimating

residual = actual - predicted = 1,350,000 - 1,247,350 = 102,650

Since the residual is positive, the actual value is higher than the predicted.

200

A large high school has 1,200 students evenly distributed across four grade levels. A researcher decides to randomly select 50 students from each grade level and survey them.

1. What type of sampling method is being used?

2. If the researcher instead randomly selects 8 entire homeroom classes and surveys every student in those homerooms, what type of sampling method is being used?

1. Stratified (strata = grades, randomly select from each grade)

2. Cluster (clusters = homerooms, randomly select entire homerooms)

200

In a game, the probability of getting a perfect score on a level is p = 0.2. Let X be the number of attempts needed to get the first perfect score.

Calculate the expected number of attempts until the first perfect score.

Calculate Var(X).

E(x) = 1 / 0.2 = 5 attempts

Var(X) = (1 - 0.2) / 0.2^2 = 20
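The geometric-distribution formulas E(X) = 1/p and Var(X) = (1 - p)/p^2 can be checked numerically. A minimal sketch, with illustrative variable names:

```python
p = 0.2  # probability of a perfect score on any single attempt

# Geometric distribution (number of attempts until the first success)
expected_attempts = 1 / p        # E(X) = 1 / p
variance = (1 - p) / p**2        # Var(X) = (1 - p) / p^2

print(f"E(X) = {expected_attempts:.1f}, Var(X) = {variance:.1f}")
```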

200

The population of annual incomes of a certain city is normally distributed with a mean of $45,000 and a standard deviation of $12,000.

What is the probability that a randomly selected resident of the city makes more than $58,000?

1 – norm.dist(58000, 45000, 12000, TRUE)
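For readers without Excel, the same upper-tail probability can be sketched with Python's standard library. `NormalDist.cdf` plays the role of norm.dist with the cumulative flag set to TRUE; the variable names are illustrative.

```python
from statistics import NormalDist

# Annual incomes from the question: mean $45,000, standard deviation $12,000
incomes = NormalDist(mu=45_000, sigma=12_000)

# P(X > 58,000) = 1 - P(X <= 58,000)
p_above = 1 - incomes.cdf(58_000)
print(round(p_above, 4))
```

The result is roughly 0.139, or about a 14% chance.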

200

1) If you increase the sample size when constructing a confidence interval, the width of the confidence interval ______.

2) If you increase the confidence level when creating a confidence interval, the width of the confidence interval ______.

1) Decreases

higher n --> lower MoE --> narrower CI

2) Increases

higher confidence --> higher CV --> higher MoE --> wider CI

200

True or False: The interpretation of 𝛽₀ is that “for a 1 unit increase in x, y_hat increases by 𝛽₀.”

False. 𝛽₀ is the intercept (the value of y_hat when x = 0). Even if the problem did say 𝛽₁, the correct interpretation would be “a 1 unit increase in x is associated with an increase/decrease of 𝛽₁ in y_hat, on average.”

300

Consider these events to be independent:

1. You roll a six-sided die

2. You flip a coin

What is the probability of rolling a 6 followed by a heads?

1/6 * 1/2 = 1/12

300

Imagine you are the line manager at a very large factory. Assume each product is either defective or not defective. The non-defective rate for each product is 88%. The probability that the first defective product is found between the 5th and 9th products (inclusive) is _____.

0.2832

p = 1 - 0.88 = 0.12

P(X<=9) = 1 - (1 - 0.12)^9 = 0.6835

P(X<=4) = 1 - (1 - 0.12)^4 = 0.4003

P(5<=X<=9) = 0.6835 - 0.4003 = 0.2832
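The geometric CDF calculation above can be sketched in code. The helper `geometric_cdf` is a hypothetical name introduced only for this illustration.

```python
def geometric_cdf(k: int, p: float) -> float:
    """P(X <= k): the first success occurs on or before trial k."""
    return 1 - (1 - p) ** k

p_defective = 1 - 0.88  # "success" here means finding a defective product

# P(5 <= X <= 9) = P(X <= 9) - P(X <= 4)
prob = geometric_cdf(9, p_defective) - geometric_cdf(4, p_defective)
print(round(prob, 4))
```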

300

Let X ~ N(15, sigma) and P(X < 18.75) = 0.9985

Sigma = ?

1.25

z = 3, since 0.9985 is the 99.85th percentile, which the empirical rule places 3 standard deviations above the mean

3 = (18.75 – 15) / sigma

Sigma = 1.25
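One way to sanity-check the answer is to plug sigma back into the normal CDF. This sketch assumes z = 3 from the empirical rule, as on the card; the variable names are illustrative.

```python
from statistics import NormalDist

mu, x, z = 15, 18.75, 3   # z = 3 comes from the empirical rule (0.9985)

# Rearranging z = (x - mu) / sigma gives sigma = (x - mu) / z
sigma = (x - mu) / z
print(sigma)

# Check: P(X < 18.75) should be close to 0.9985 with this sigma
check = NormalDist(mu=mu, sigma=sigma).cdf(x)
print(round(check, 4))
```

The check lands slightly above 0.9985 because the empirical rule rounds the 3-sigma tail area.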

300

A company produces lightbulbs. A random sample of 100 of the company's bulbs has a mean lifespan of 1500 hours with a standard deviation of 200 hours. A 98% confidence interval for the true population mean lifespan of the bulbs is _____.

t.inv(0.025, 100) = 2.23

t.inv(0.02, 99) = 1.98

t.inv(0.01, 99) = -2.36

(1452.71, 1547.29)

1500 +/- 2.36 * (200 / sqrt(100)) = (1452.71, 1547.29).

300

The coefficient of determination for your linear regression model is 0.74. If your regression line is positive (i.e., upwards-sloping), what is your model's correlation coefficient?

0.86

sqrt(0.74) --> works only because r is positive (positive relationship indicated by the upwards-sloping regression line)

400

The probability that Matt hits a dartboard is 0.21. The probability Mikey hits the dartboard is 0.68. The probability that either Matt or Mikey hits the dartboard is 0.80. What is the probability that Matt hits the dartboard given that Mikey hits the dartboard?

0.1324

0.80 = 0.21 + 0.68 - P(M intersect K)

P(M intersect K) = 0.09

P(M | K) = 0.09 / 0.68 = 0.1324
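The two steps (addition rule, then conditional probability) can be sketched as follows. Variable names are illustrative only.

```python
p_matt = 0.21     # P(M)
p_mikey = 0.68    # P(K)
p_either = 0.80   # P(M or K)

# Addition rule rearranged: P(M and K) = P(M) + P(K) - P(M or K)
p_both = p_matt + p_mikey - p_either

# Conditional probability: P(M | K) = P(M and K) / P(K)
p_matt_given_mikey = p_both / p_mikey
print(round(p_matt_given_mikey, 4))
```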

400

In a quality control test, a factory has a 90% success rate in producing defect-free products. A batch contains 12 products. Let X represent the number of defect-free products in a batch.

a) What is P(X = 9)?

a) (12 choose 9) * (0.9)^9 * (0.1)^3 = 0.085
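The binomial probability P(X = 9) with n = 12 and p = 0.9 can be verified with `math.comb`. A minimal sketch, with illustrative variable names:

```python
import math

n, p, k = 12, 0.9, 9   # batch size, defect-free rate, target count

# Binomial pmf: C(n, k) * p^k * (1 - p)^(n - k)
prob = math.comb(n, k) * p**k * (1 - p) ** (n - k)
print(round(prob, 3))
```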

400

Salaries are normally distributed with a mean of $60,000 and standard deviation of $8,000

What salary marks the top 10%? [Choose to report either the Excel formula or the exact Empirical Rule probability]

*top 10% is the same as the 90th percentile

= norm.inv(0.90, 60000, 8000)
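The Excel call above has a standard-library analogue in Python, `NormalDist.inv_cdf`. A short sketch, with illustrative variable names:

```python
from statistics import NormalDist

# Salaries from the question: mean $60,000, standard deviation $8,000
salaries = NormalDist(mu=60_000, sigma=8_000)

# The top 10% begins at the 90th percentile
cutoff = salaries.inv_cdf(0.90)
print(round(cutoff))
```

The cutoff comes out at roughly $70,250.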

400

In a study, 120 out of 200 men and 90 out of 180 women support a new policy. For simplicity, let p_hat1 be the proportion of men who support the policy and let p_hat2 be the proportion of women who support the policy. A 95% confidence interval for the difference in proportions of support for the policy is _____.

(0.0003, 0.1997)

p_hat1 = 0.60 | n1 = 200 | p_hat2 = 0.50 | n2 = 180 | p_hat1 - p_hat2 = 0.10

standard error = sqrt(0.6 * 0.4 / 200 + 0.5 * 0.5 / 180) = 0.0509

CI: 0.10 +/- 1.96 * 0.0509 = (0.0003, 0.1997)
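The interval can be reproduced step by step. This sketch assumes the z critical value 1.96 used on the card; the variable names are illustrative.

```python
import math

x1, n1 = 120, 200   # men who support the policy, men sampled
x2, n2 = 90, 180    # women who support the policy, women sampled
z_star = 1.96       # critical value for 95% confidence, as used on the card

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Unpooled standard error for a difference in proportions
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

lower, upper = diff - z_star * se, diff + z_star * se
print(f"({lower:.4f}, {upper:.4f})")
```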

400

The slope of a regression line is 4, the standard deviation of X is sx = 2, and the standard deviation of Y is sy = 10. Find the correlation coefficient, r.

4 * 2 / 10 = 0.8

500

The probability that George gets an A in a class is 0.5, whereas the probability Brayton gets an A in that class is 0.6. The probability they both get an A is 0.3. What is the probability that neither of them gets an A?

0.2

= 1 - P(only George) - P(both) - P(only Brayton) = 1 - 0.2 - 0.3 - 0.3
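An equivalent route uses the addition rule and the complement. A minimal sketch, with illustrative variable names:

```python
p_george = 0.5    # P(G)
p_brayton = 0.6   # P(B)
p_both = 0.3      # P(G and B)

# Addition rule: P(G or B) = P(G) + P(B) - P(G and B)
p_at_least_one = p_george + p_brayton - p_both

# Complement: P(neither) = 1 - P(G or B)
p_neither = 1 - p_at_least_one
print(round(p_neither, 2))
```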

500

Test whether the two variables are independent at alpha = 0.05.

            | Good | Poor
Exercise    | 45   | 15
No Exercise | 25   | 25

Χ²_0.05,1 = 3.84 | Χ²_0.025,2 = 2.71 | Χ²_0.05,4 = 6.63

Expected Counts: EG = 38.18, EP = 21.82, NoExG = 31.82, NoExP = 18.18

Test Stat = 7.37

Critical value = 3.84

Since the test stat (7.37) is more extreme than the critical value (3.84), we reject H0. There is evidence that the two variables (exercise and sleep quality) are associated.
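The expected counts and the chi-square statistic can be recomputed from the observed table. This is a sketch only; the critical value 3.84 is taken from the card rather than computed, and the variable names are illustrative.

```python
# Observed counts: rows = Exercise / No Exercise, columns = Good / Poor
observed = [[45, 15],
            [25, 25]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.84  # chi-square critical value at alpha = 0.05, df = 1 (from the card)

print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {chi_sq > critical_value}")
```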

500

A nutritionist wants to determine if there is a difference in the average daily calorie intake between two groups. A sample of 15 vegans has a mean intake of 1800 kcal with a standard deviation of 200 kcal. A sample of 20 non-vegans has a mean intake of 2100 kcal with a standard deviation of 250 kcal.

At alpha = 0.05, test whether there is a significant difference in the mean daily calorie intake between these two groups.

t.inv(0.025; 14) = 2.145

t.inv(0.05; 14) = 1.23

t.inv(0.025; 15) = 0.88

Reject the null hypothesis

x_bar1 = 1800 | n1 = 15 | x_bar2 = 2100 | n2 = 20 | CV = 2.145

standard error = sqrt(200^2 / 15 + 250^2 / 20) = 76.103

test statistic = (1800 - 2100) / 76.103 = -3.94

Since | -3.94 | > 2.145, reject the null.
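The test statistic can be recomputed as below. This sketch takes the critical value 2.145 from the card, which corresponds to the conservative choice of df = min(n1, n2) - 1 = 14; the variable names are illustrative.

```python
import math

x_bar1, s1, n1 = 1800, 200, 15   # vegans: mean, standard deviation, sample size
x_bar2, s2, n2 = 2100, 250, 20   # non-vegans: mean, standard deviation, sample size

# Unpooled standard error for a difference in means
se = math.sqrt(s1**2 / n1 + s2**2 / n2)

t_stat = (x_bar1 - x_bar2) / se

critical_value = 2.145           # two-sided, alpha = 0.05, df = 14 (from the card)
reject_null = abs(t_stat) > critical_value

print(f"SE = {se:.3f}, t = {t_stat:.2f}, reject H0: {reject_null}")
```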