A statistic describes a sample, whereas a ______ describes a population.
Parameter
In a quality control test, a factory has a 90% success rate in producing defect-free products. A batch contains 12 products. Let X represent the number of defect-free products in a batch. What is the expected number of defect-free products in the batch?
10.8
Binomial: E(X) = n * p = 12 * 0.9 = 10.8
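The shortcut E(X) = n * p can be checked against the definition of expected value, the sum of k * P(X = k) over every possible count k. A minimal Python sketch, assuming only the standard library (`math.comb` needs Python 3.8+):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 12, 0.9

# Shortcut formula for the mean of a binomial variable.
expected_shortcut = n * p

# Definition of expected value: sum of k * P(X = k) over k = 0..n.
expected_by_definition = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

print(round(expected_shortcut, 4), round(expected_by_definition, 4))  # 10.8 10.8
```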
The weights of apples from a farm are normally distributed with a mean of 200 grams and a standard deviation of 15 grams. Using the empirical rule, estimate the percentage of apples that weigh between 215 and 245 grams.
15.85%
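215 = 200 + 1(15), so about 16% of apples weigh more than 215 g (to the right of +1 SD)
245 = 200 + 3(15), so about 0.15% weigh more than 245 g (to the right of +3 SD)
16% - 0.15% = 15.85%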
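To see how close the empirical-rule estimate is, the sketch below computes it alongside the exact normal probability. It assumes Python 3.8+ for `statistics.NormalDist`; the exact value is only for comparison and is not what the question asks for.

```python
from statistics import NormalDist

mu, sigma = 200, 15
lo, hi = 215, 245

# Empirical-rule estimate: 215 is mu + 1 sigma, 245 is mu + 3 sigma.
# About 16% lies above +1 sigma and about 0.15% lies above +3 sigma.
empirical_estimate = 0.16 - 0.0015

# Exact normal probability for comparison: P(215 < X < 245).
weights = NormalDist(mu, sigma)
exact = weights.cdf(hi) - weights.cdf(lo)

print(f"{empirical_estimate:.2%}  {exact:.2%}")  # 15.85%  15.73%
```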
A sample of 50 people has a mean of 35 with a standard deviation of 10. If you were to construct a 95% confidence interval, the margin of error would be _____.
2.77
= 1.96 * 10 / sqrt(50) = 2.77
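The margin of error z* * s / sqrt(n) can be sketched as a small function. This follows the z-based approach used above (z* = 1.96 for 95%); because s is a sample standard deviation, some courses would use a t critical value instead, which gives a slightly larger margin. Assumes Python 3.8+ for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(s: float, n: int, confidence: float = 0.95) -> float:
    """z-based margin of error for a mean: z* * s / sqrt(n)."""
    # Two-sided critical value: leave (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z_star * s / sqrt(n)

print(round(margin_of_error(10, 50), 2))  # 2.77
```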
You are interested in predicting the annual sales of a grocery store based on the store's square footage. The LSQ equation that results from your regression is:
y_hat = 235,00 + 124.40x
If the actual sales for a grocery store of 10,000 square feet is $1,350,000, what is the residual and is your model over or underestimating?
$102,650 and Underestimating
residual = actual - predicted = 1,350,000 - 1,247,350 = 102,650
Since the residual is positive, the actual value is higher than the predicted.
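The sign rule for residuals can be written out directly. This sketch takes the predicted value of 1,247,350 from the worked answer as given; the helper names `residual` and `model_direction` are hypothetical, chosen for illustration.

```python
def residual(actual: float, predicted: float) -> float:
    """Residual = observed y minus predicted y_hat."""
    return actual - predicted

def model_direction(res: float) -> str:
    # Positive residual: the actual value is above the line, so the model predicted too low.
    if res > 0:
        return "underestimating"
    # Negative residual: the actual value is below the line, so the model predicted too high.
    if res < 0:
        return "overestimating"
    return "exact"

r = residual(1_350_000, 1_247_350)
print(r, model_direction(r))  # 102650 underestimating
```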
A large high school has 1,200 students evenly distributed across four grade levels. A researcher decides to randomly select 50 students from each grade level and survey them.
1. What type of sampling method is being used?
2. If the researcher instead randomly selects 8 entire homeroom classes and surveys every student in those homerooms, what type of sampling method is being used?
1. Stratified (strata = grades, randomly select from each grade)
2. Cluster (clusters = homerooms, randomly select entire homerooms)
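The difference between the two designs is where the randomness happens: within every stratum, or in choosing whole clusters. The sketch below uses a hypothetical roster; grades 9 to 12 and homerooms of exactly 30 students (40 homerooms) are assumptions for illustration, not facts from the problem.

```python
import random

# Hypothetical roster: 1,200 students, 300 per grade (9-12), 30 per homeroom (assumed).
students = [{"id": i, "grade": 9 + i // 300, "homeroom": i // 30} for i in range(1200)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def stratified_sample(pop, key, per_stratum):
    """Simple random sample of `per_stratum` units from every stratum."""
    strata = {}
    for s in pop:
        strata.setdefault(s[key], []).append(s)
    return [s for group in strata.values() for s in rng.sample(group, per_stratum)]

def cluster_sample(pop, key, n_clusters):
    """Randomly choose whole clusters, then keep every unit in them."""
    clusters = sorted({s[key] for s in pop})
    chosen = set(rng.sample(clusters, n_clusters))
    return [s for s in pop if s[key] in chosen]

strat = stratified_sample(students, "grade", 50)   # 50 from each of 4 grades
clus = cluster_sample(students, "homeroom", 8)     # 8 homerooms x 30 students
print(len(strat), len(clus))  # 200 240
```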
In a game, the probability of getting a perfect score on a level is p = 0.2. Let X be the number of attempts needed to get the first perfect score.
Calculate the expected number of attempts until the first perfect score.
Calculate Var(X).
E(X) = 1 / 0.2 = 5 attempts
Var(X) = (1 - 0.2) / 0.2^2 = 0.8 / 0.04 = 20
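The geometric formulas E(X) = 1/p and Var(X) = (1 - p)/p^2 can be sanity-checked with a quick simulation that counts attempts up to and including the first success. The seed and number of simulated runs are arbitrary choices for this sketch.

```python
import random

p = 0.2
mean = 1 / p              # E(X) = 1/p
var = (1 - p) / p**2      # Var(X) = (1 - p)/p^2

def attempts_until_success(rng: random.Random, p: float) -> int:
    """Number of trials up to and including the first success."""
    k = 1
    while rng.random() >= p:  # each attempt succeeds with probability p
        k += 1
    return k

rng = random.Random(42)
draws = [attempts_until_success(rng, p) for _ in range(100_000)]
sim_mean = sum(draws) / len(draws)

print(round(mean, 2), round(var, 2), round(sim_mean, 2))
```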
The population of annual incomes of a certain city is normally distributed with a mean of $45,000 and a standard deviation of $12,000.
What is the probability that a randomly selected resident of the city makes more than $58,000?
1 - norm.dist(58000, 45000, 12000, TRUE) ≈ 0.1393
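Outside Excel, the same upper-tail probability can be computed with Python's `statistics.NormalDist` (Python 3.8+), whose `cdf` plays the role of NORM.DIST with cumulative = TRUE.

```python
from statistics import NormalDist

incomes = NormalDist(mu=45_000, sigma=12_000)

# Complement of the CDF gives P(X > 58,000),
# matching 1 - NORM.DIST(58000, 45000, 12000, TRUE).
p_above = 1 - incomes.cdf(58_000)
print(round(p_above, 4))
```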
1) If you increase the sample size when constructing a confidence interval, the width of the confidence interval ______.
2) If you increase the confidence level when creating a confidence interval, the width of the confidence interval ______.
1) Decreases
higher n --> lower MoE --> narrower CI
2) Increases
higher confidence --> higher critical value (CV) --> higher MoE --> wider CI
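Both effects can be seen by computing the width of a z-based interval, 2 * z* * s / sqrt(n), while varying one input at a time. The sample standard deviation s = 10 and the particular n and confidence values are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(s: float, n: int, confidence: float) -> float:
    """Full width (2 * margin of error) of a z-based CI for a mean."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z_star * s / sqrt(n)

s = 10  # illustrative sample standard deviation (assumed)

# Larger n -> smaller standard error -> narrower interval.
for n in (25, 50, 100):
    print("n =", n, "width =", round(ci_width(s, n, 0.95), 2))

# Higher confidence -> larger critical value -> wider interval.
for conf in (0.90, 0.95, 0.99):
    print("conf =", conf, "width =", round(ci_width(s, 50, conf), 2))
```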
True or False: The interpretation of 𝛽0 is that “for a 1 unit increase in x, y_hat increases by 𝛽0.”
False. 𝛽0 is the intercept (the value of y_hat when x = 0). Even if the statement referred to 𝛽1, the correct interpretation would be “a 1-unit increase in x is associated with an average increase (or decrease) of 𝛽1 in y,” which describes an association, not a cause.
Consider these events to be independent:
1. You roll a six-sided die
2. You flip a coin
What is the probability of rolling a 6 followed by flipping heads?
1/6 * 1/2 = 1/12
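The multiplication rule for independent events can be confirmed by listing all 12 equally likely (die, coin) outcomes and counting the favorable one. This sketch uses exact fractions to avoid rounding.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)
coin = ("H", "T")

outcomes = list(product(die, coin))  # 12 equally likely (die, coin) pairs
favorable = [o for o in outcomes if o == (6, "H")]
p_enum = Fraction(len(favorable), len(outcomes))

# Multiplication rule for independent events.
p_mult = Fraction(1, 6) * Fraction(1, 2)
print(p_enum, p_mult)  # 1/12 1/12
```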
Imagine you are the line manager at a very large factory. Assume each product is either defective or not defective. The non-defective rate for each product is 88%. The probability that the first defective product is found between the 5th and 9th products (inclusive) is _____.
0.2832
p = 1 - 0.88 = 0.12
P(X<=9) = 1 - (1 - 0.12)^9 = 0.6835
P(X<=4) = 1 - (1 - 0.12)^4 = 0.4003
P(5<=X<=9) = 0.6835 - 0.4003 = 0.2832
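The CDF-difference approach above can be cross-checked by summing the geometric PMF P(X = k) = (1 - p)^(k - 1) * p directly for k = 5 through 9. A minimal sketch:

```python
p = 1 - 0.88  # probability that a product is defective

def geom_cdf(k: int, p: float) -> float:
    """P(X <= k): the first defective appears within the first k products."""
    return 1 - (1 - p) ** k

# P(5 <= X <= 9) as a difference of CDF values.
prob = geom_cdf(9, p) - geom_cdf(4, p)

# Same probability by summing the PMF over k = 5..9.
prob_by_sum = sum((1 - p) ** (k - 1) * p for k in range(5, 10))

print(round(prob, 4), round(prob_by_sum, 4))  # 0.2832 0.2832
```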
Let X ~ N(15, sigma) and P(X < 18.75) = 0.9985
Sigma = ?
1.25
z = 3, because by the empirical rule about 0.15% of values lie above mean + 3 sigma, so 18.75 sits at the 99.85th percentile (3 standard deviations above the mean)
3 = (18.75 – 15) / sigma
Sigma = 1.25
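Solving z = (x - mu) / sigma for sigma is a one-line rearrangement. The sketch below uses the empirical-rule value z = 3 as in the answer, and also shows the exact normal quantile for 0.9985 (about 2.97) to indicate how much rounding the empirical rule introduces. Assumes Python 3.8+ for `statistics.NormalDist`.

```python
from statistics import NormalDist

mu, x, prob = 15, 18.75, 0.9985

# Empirical-rule value: 0.9985 corresponds to 3 standard deviations above the mean.
z_empirical = 3
sigma_empirical = (x - mu) / z_empirical

# Exact standard-normal quantile, for comparison only.
z_exact = NormalDist().inv_cdf(prob)
sigma_exact = (x - mu) / z_exact

print(sigma_empirical, round(z_exact, 3), round(sigma_exact, 3))
```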
A company produces lightbulbs. A random sample of 100 of the company's bulbs has a mean lifespan of 1500 hours with a standard deviation of 200 hours. A 98% confidence interval for the true population mean lifespan of the bulbs is _____.
t.inv(0.025, 100) = 2.23
t.inv(0.02, 99) = 1.98
t.inv(0.01, 99) = -2.36
(1452.71, 1547.29)
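1500 +/- 2.3646 * (200 / sqrt(100)) = 1500 +/- 47.29 = (1452.71, 1547.29), where 2.3646 = |t.inv(0.01, 99)| (98% confidence leaves 1% in each tail; df = n - 1 = 99).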
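The t interval x_bar +/- t* * s / sqrt(n) is sketched below. Python's standard library has no t distribution, so the critical value is passed in by hand; 2.3646 is assumed here as |T.INV(0.01, 99)| for 98% confidence with df = 99, which reproduces the stated interval.

```python
from math import sqrt

def t_interval(x_bar: float, s: float, n: int, t_star: float) -> tuple:
    """Return (lower, upper) for x_bar +/- t* * s / sqrt(n); t_star must be positive."""
    moe = t_star * s / sqrt(n)
    return x_bar - moe, x_bar + moe

# Assumed critical value: |T.INV(0.01, 99)| is about 2.3646.
lo, hi = t_interval(1500, 200, 100, 2.3646)
print(round(lo, 2), round(hi, 2))  # 1452.71 1547.29
```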
The coefficient of determination for your linear regression model is 0.74. If your regression line is positive (i.e., upwards-sloping), what is your model's correlation coefficient?
0.86
sqrt(0.74) --> works only because r is positive (positive relationship indicated by the upwards-sloping regression line)
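In simple linear regression r^2 equals the coefficient of determination, and r carries the same sign as the slope. A small sketch that encodes both facts; the slope values passed in are placeholders, since only their sign matters.

```python
from math import copysign, sqrt

def r_from_r_squared(r_squared: float, slope: float) -> float:
    """Recover r from R^2 in simple linear regression; r takes the slope's sign."""
    return copysign(sqrt(r_squared), slope)

print(round(r_from_r_squared(0.74, slope=1.0), 2))   # 0.86 (upward-sloping line)
print(round(r_from_r_squared(0.74, slope=-1.0), 2))  # -0.86 (downward-sloping line)
```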
The probability that Matt hits a dartboard is 0.21. The probability Mikey hits the dartboard is 0.68. The probability that either Matt or Mikey hits the dartboard is 0.80. What is the probability that Matt hits the dartboard given that Mikey hits the dartboard?
0.1324
Let M = Matt hits and K = Mikey hits. By the addition rule, 0.80 = 0.21 + 0.68 - P(M intersect K)
P(M intersect K) = 0.09
P(M | K) = 0.09 / 0.68 = 0.1324
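The two steps, rearranging the addition rule and then dividing by P(K), are sketched below.

```python
p_matt, p_mikey, p_either = 0.21, 0.68, 0.80

# Addition rule rearranged: P(M and K) = P(M) + P(K) - P(M or K)
p_both = p_matt + p_mikey - p_either

# Conditional probability: P(M | K) = P(M and K) / P(K)
p_matt_given_mikey = p_both / p_mikey

print(round(p_both, 2), round(p_matt_given_mikey, 4))  # 0.09 0.1324
```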
In a quality control test, a factory has a 90% success rate in producing defect-free products. A batch contains 12 products. Let X represent the number of defect-free products in a batch.
a) What is P(X = 9)?
a) (12 choose 9) * (0.9)^9 * (0.1)^3 = 220 * 0.3874 * 0.001 = 0.085
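The binomial PMF for P(X = 9) multiplies the number of arrangements, C(12, 9) = 220, by the probability of any one arrangement of 9 successes and 3 failures. A minimal sketch (Python 3.8+ for `math.comb`):

```python
from math import comb

n, p, k = 12, 0.9, 9

# C(12, 9) ways to place 9 defect-free products; 9 successes at 0.9, 3 failures at 0.1.
p_nine = comb(n, k) * p**k * (1 - p) ** (n - k)

print(comb(n, k), round(p_nine, 4))  # 220 0.0852
```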
Salaries are normally distributed with a mean of $60,000 and standard deviation of $8,000.
What salary marks the top 10%? [Choose to report either the Excel formula or the exact Empirical Rule probability]
*top 10% is the same as the 90th percentile
= norm.inv(0.90, 60000, 8000) ≈ $70,252
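The 90th-percentile cutoff can also be found with `statistics.NormalDist.inv_cdf` (Python 3.8+), which plays the same role as Excel's NORM.INV.

```python
from statistics import NormalDist

salaries = NormalDist(mu=60_000, sigma=8_000)

# The top 10% starts at the 90th percentile.
cutoff = salaries.inv_cdf(0.90)
print(round(cutoff))  # 70252
```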
In a study, 120 out of 200 men and 90 out of 180 women support a new policy. For simplicity, let p_hat1 be the proportion of men who support the policy and let p_hat2 be the proportion of women who support the policy. A 95% confidence interval for the difference in proportions of support for the policy is _____.
(0.0003, 0.1997)
p_hat1 = 0.60 | n1 = 200 | p_hat2 = 0.50 | n2 = 180 | p_hat1 - p_hat2 = 0.10
standard error = sqrt(0.6 * 0.4 / 200 + 0.5 * 0.5 / 180) = 0.0509
CI: 0.10 +/- 1.96 * 0.0509 = (0.0003, 0.1997)
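The unpooled standard error and the resulting interval can be computed step by step. This sketch uses z* = 1.96 as in the worked solution.

```python
from math import sqrt

x1, n1 = 120, 200   # men who support, men surveyed
x2, n2 = 90, 180    # women who support, women surveyed
p1, p2 = x1 / n1, x2 / n2

# Unpooled standard error for a CI on p1 - p2.
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

diff = p1 - p2
z_star = 1.96
lo, hi = diff - z_star * se, diff + z_star * se

print(round(se, 4), round(lo, 4), round(hi, 4))  # 0.0509 0.0003 0.1997
```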
The slope of a regression line is 4, the standard deviation of X is sx = 2, and the standard deviation of Y is sy = 10. Find the correlation coefficient, r.
r = b1 * sx / sy = 4 * 2 / 10 = 0.8 (rearranging b1 = r * sy / sx)
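Since the least-squares slope is b1 = r * sy / sx, r follows by multiplying the slope by sx / sy. The helper name `r_from_slope` is hypothetical.

```python
def r_from_slope(b1: float, s_x: float, s_y: float) -> float:
    """Invert b1 = r * s_y / s_x to get r = b1 * s_x / s_y."""
    return b1 * s_x / s_y

print(r_from_slope(4, 2, 10))  # 0.8
```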
The probability that George gets an A in a class is 0.5, whereas the probability Brayton gets an A in that class is 0.6. The probability they both get an A is 0.3. What is the probability that neither of them gets an A?
0.2
= 1 - P(only George) - P(only Brayton) - P(both) = 1 - 0.2 - 0.3 - 0.3 = 0.2
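Two equivalent routes give the same answer: the complement of the union, or 1 minus the three disjoint pieces (only George, only Brayton, both). A short sketch:

```python
p_g, p_b, p_both = 0.5, 0.6, 0.3

# Route 1: complement of the union, using the addition rule.
p_union = p_g + p_b - p_both        # P(G or B) = 0.8
p_neither = 1 - p_union

# Route 2: subtract the disjoint pieces from 1.
p_g_only = p_g - p_both             # 0.2
p_b_only = p_b - p_both             # 0.3
p_neither_alt = 1 - p_g_only - p_b_only - p_both

print(round(p_neither, 2), round(p_neither_alt, 2))  # 0.2 0.2
```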
Test whether the two variables are independent at alpha = 0.05.
Sleep quality:  Good | Poor
Exercise          45 |   15
No Exercise       25 |   25
Χ²(0.05, 1) = 3.84 | Χ²(0.025, 2) = 2.71 | Χ²(0.05, 4) = 6.63
Expected counts (row total * column total / 110): Exercise-Good = 38.18, Exercise-Poor = 21.82, No Exercise-Good = 31.82, No Exercise-Poor = 18.18
Test stat = sum of (O - E)^2 / E = 7.37
Critical value = 3.84 (df = (2 - 1)(2 - 1) = 1)
Since the test stat (7.37) is greater than the critical value (3.84), we reject H0. There is evidence that the two variables (exercise and sleep quality) are associated.
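The chi-square test of independence can be computed by hand: expected counts come from row total * column total / grand total, and the statistic sums (O - E)^2 / E over all cells. The critical value 3.84 (alpha = 0.05, df = 1) is taken from the values provided with the question.

```python
observed = [
    [45, 15],   # Exercise: good sleep, poor sleep
    [25, 25],   # No exercise: good sleep, poor sleep
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over every cell.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical = 3.84  # chi-square critical value for alpha = 0.05, df = 1

print(round(chi_sq, 2), df, chi_sq > critical)  # 7.37 1 True
```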
A nutritionist wants to determine if there is a difference in the average daily calorie intake between two groups. A sample of 15 vegans has a mean intake of 1800 kcal with a standard deviation of 200 kcal. A sample of 20 non-vegans has a mean intake of 2100 kcal with a standard deviation of 250 kcal.
At alpha = 0.05, test whether there is a significant difference in the mean daily calorie intake between these two groups.
t.inv(0.025; 14) = 2.145
t.inv(0.05; 14) = 1.23
t.inv(0.025; 15) = 0.88
Reject the null hypothesis
x_bar1 = 1800 | n1 = 15 | x_bar2 = 2100 | n2 = 20 | df = min(15, 20) - 1 = 14 | CV = 2.145
standard error = sqrt(200^2 / 15 + 250^2 / 20) = 76.103
test statistic = (1800 - 2100) / 76.103 = -3.94
Since | -3.94 | > 2.145, reject the null.
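The two-sample t statistic with an unpooled standard error is sketched below. It assumes the conservative df = min(n1, n2) - 1 = 14, which matches the critical value 2.145 given with the question; a Welch-Satterthwaite df would be larger and give a slightly smaller critical value.

```python
from math import sqrt

x1, s1, n1 = 1800, 200, 15   # vegans
x2, s2, n2 = 2100, 250, 20   # non-vegans

# Unpooled standard error of the difference in means.
se = sqrt(s1**2 / n1 + s2**2 / n2)
t_stat = (x1 - x2) / se

df = min(n1, n2) - 1         # conservative df choice (assumed)
t_crit = 2.145               # |T.INV(0.025, 14)|, two-sided alpha = 0.05

print(round(se, 3), round(t_stat, 2), df, abs(t_stat) > t_crit)  # 76.103 -3.94 14 True
```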