A researcher is interested in the sample proportion of students who have taken a specific class in Tippie. She conducts a SRS(7, P) and finds p = 0.13.
She thought p_hat would be characterized by
N(p, sqrt(p(1 - p)/n)) due to the CLT; however, the CLT doesn't seem to hold. Why?
0.0019
P_hat ~ N(0.4, sqrt(0.4*(0.6)/200))
P(N(0.4, 0.0346) < 0.3) = 0.0019
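A quick Python check of the normal approximation above, as a sketch. It assumes the claimed p = 0.40 is true and uses only the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.40      # claimed proportion of old computers
n = 200        # sample size
cutoff = 0.30  # sample proportion of interest

# Standard deviation of p_hat under the claim
se = sqrt(p0 * (1 - p0) / n)

# CLT approximation: p_hat ~ N(p0, se)
prob_below = NormalDist(mu=p0, sigma=se).cdf(cutoff)
print(round(se, 4), round(prob_below, 4))
```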
What does the p-value represent?
The probability of obtaining a test statistic at least as extreme as the observed value, assuming the null hypothesis is true.
Describe two ways to decrease the width of a confidence interval.
1. Increasing the sample size
2. Decreasing the confidence level
A study finds that students who eat breakfast before an exam tend to score higher on that exam. Based on this result, the school administration decides to implement a policy requiring all students to eat breakfast in the school cafeteria before their exams.
What is the fallacy?
Association implying causation - just because there is a correlation b/w eating breakfast and higher exam scores, that does not prove that eating breakfast CAUSES higher scores. Other factors such as sleep quality, study habits, or overall health could be responsible.
It is claimed that 40% of a company's computers are old, where "old" means the computer is over 2 years old. Under this claim, answer the following:
For a SRS of 200 computers, what are the approximate (use CLT) chances that less than 30% of the sample computers are old?
n isn't large enough!
Requirements: n*p >= 5 AND n(1 - p) >= 5
7 * 0.13 < 5; therefore, CLT does not hold.
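The rule of thumb above can be sketched as a small Python helper. The function name and the default threshold of 5 are illustrative, taken from the requirement stated in these notes (some textbooks use 10 instead).

```python
def clt_ok_for_proportion(n, p, threshold=5):
    """Return True when both n*p and n*(1 - p) meet the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# The researcher's sample: n = 7, p = 0.13
print(clt_ok_for_proportion(7, 0.13))
# A larger sample for comparison: n = 200, p = 0.40
print(clt_ok_for_proportion(200, 0.40))
```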
A sample of 30 students from a university has a mean of 15 hours per week and a standard deviation of 4.3 hours. Test whether the average weekly study time at the university is significantly different from 16 hours.
(15 – 16) / (4.3 / sqrt(30)) = -1 / 0.785 = -1.27
t(30 – 1 degrees of freedom)
p-value = 2 * P(t < -1.27) = 2 * 0.10709 = 0.21418 (two-sided test)
Since 0.21 > 0.05, we fail to reject the null hypothesis.
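A minimal Python sketch of the same test using the critical-value approach. The standard library has no t distribution, so the two-sided 5% critical value for 29 df (2.045, from a t table) is hard-coded as an assumption; variable names are illustrative.

```python
from math import sqrt

x_bar, mu0, s, n = 15, 16, 4.3, 30

se = s / sqrt(n)              # standard error of the sample mean
t_stat = (x_bar - mu0) / se   # one-sample t statistic

t_crit = 2.045                # assumed table value: t, 29 df, two-sided alpha = 0.05
reject = abs(t_stat) > t_crit

print(round(t_stat, 2), reject)
```

Comparing |t| to the critical value gives the same decision as comparing the two-sided p-value to 0.05.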
A researcher is studying the average amount of time university students spend studying each week. A random sample of 40 students is selected, and the average study time is found to be 15 hours with a sample standard deviation of 4.5 hours.
Calculate a 95% confidence interval for the average amount of time university students spend studying each week.
Step 1: find df → 40 – 1 = 39
Step 2: find SE → 4.5 / sqrt(40) = 0.712
Step 3: find pivot percentile → t(39 df, 0.975 quantile) = 2.022
Step 4: find ME: 2.022 * 0.712 = 1.44
Step 5: find CI: 15 +/- 1.44 = [13.56, 16.44]
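The five steps can be sketched in Python as follows. The t pivot 2.022 is taken from the notes rather than computed (the standard library has no t quantile function), and the variable names are illustrative.

```python
from math import sqrt

x_bar, s, n = 15, 4.5, 40

df = n - 1              # Step 1: degrees of freedom
se = s / sqrt(n)        # Step 2: standard error
t_pivot = 2.022         # Step 3: assumed table value for 39 df, 95% confidence
me = t_pivot * se       # Step 4: margin of error
lower, upper = x_bar - me, x_bar + me   # Step 5: confidence interval

print(df, round(me, 2), round(lower, 2), round(upper, 2))
```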
A pharmaceutical company runs multiple tests to find out which of three drugs (A, B, or C) is most effective for treating hypertension. After conducting 20 tests on each drug (testing combinations of dosages, timing, and patient groups), the company reports Drug A is significantly more effective than the other two.
What is the fallacy?
Multiple testing bias - by running so many tests on different combinations of dosages, timings, and groups, the company increases the chances of finding a result by random chance. The claim that Drug A is significantly better might be a result of the multiple testing and not a true effect.
Suppose the population mean and standard deviation of the X values are mean(X:P) = 150 and sd(X:P) = 25, respectively. A simple random sample of 100 from P will be taken and the sample average X_bar value will be recorded. Which of the following gives the approximate chances the X_bar value will be greater than 154?
A. < 10^-6
B. 0.4367
C. 0.0548
D. 0.5636
E. Cannot be approximated
C
Given population mean/std dev --> StaMa
(154 - 150) / (25 / sqrt(100)) = 1.6
P(N(0,1) > 1.6) = 0.0548
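The standardization above, sketched in Python with the standard library's NormalDist. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 150, 25, 100
threshold = 154

se = sigma / sqrt(n)                 # sd of the sampling distribution of X_bar
z = (threshold - mu) / se            # standardized value
prob_above = 1 - NormalDist().cdf(z) # upper-tail probability

print(z, round(prob_above, 4))
```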
A factory produces light bulbs, and a sample of 50 light bulbs is selected. Of the sample, 12 of the lightbulbs were defective. A researcher claims that the company’s lightbulb defect rate is greater than 20%. Test the researcher’s claim.
Sample proportion = 12 / 50 = 0.24 Ho: p = 0.20 Ha: p > 0.20
Test stat = (0.24 – 0.20) / sqrt(0.20 * (1 – 0.20) / 50) = 0.707
P(Z > 0.707) = 0.2398
Since 0.2398 > 0.05, we fail to reject the null hypothesis. There is not enough evidence to suggest the lightbulb defect rate is greater than 0.20.
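A sketch of this one-sided proportion test in Python. It assumes a 0.05 significance level, as in the answer above; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

defective, n = 12, 50
p0 = 0.20                      # null-hypothesis defect rate
alpha = 0.05                   # assumed significance level

p_hat = defective / n
se0 = sqrt(p0 * (1 - p0) / n)  # standard error under Ho (uses p0, not p_hat)
z = (p_hat - p0) / se0
p_value = 1 - NormalDist().cdf(z)   # upper tail, since Ha: p > 0.20

print(round(z, 3), round(p_value, 4), p_value < alpha)
```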
A company wants to evaluate both the satisfaction rate and the average wait time at their support center. In a survey of 150 customers, 105 report being satisfied. A separate sample of 25 customers shows an average wait time of 4.8 minutes with a standard deviation of 1.2 minutes.
Construct a 95% confidence interval for the true proportion of customers satisfied.
p_hat = 0.70 SE = sqrt(0.70 * 0.30 / 150) = 0.0374
N(0.025 quantile) = 1.96
CI = 0.70 +/- 1.96 * 0.0374 = [0.627, 0.773]
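The same interval, sketched in Python. Here the 1.96 multiplier is computed from the standard normal quantile rather than hard-coded; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

satisfied, n = 105, 150
conf_level = 0.95

p_hat = satisfied / n
se = sqrt(p_hat * (1 - p_hat) / n)
z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)  # about 1.96
me = z_star * se

lower, upper = p_hat - me, p_hat + me
print(round(lower, 3), round(upper, 3))
```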
At two hospitals, the success rates for a certain medical treatment are as follows:
Hospital 1: 80% success rate for men, 90% success rate for women.
Hospital 2: 85% success rate for men, 90% success rate for women.
However, when the data from both hospitals are combined, the overall success rate was 75% for men and 70% for women. How could this be?
Simpson's paradox - when the data from both hospitals are combined, it appears that men have a better treatment success rate than women. However, this reversal of the observed trend occurs because the hospitals differ in the # of male and female patients and, when analyzed separately, the data show women actually have a slightly better success rate.
If you increase your sample size from n = 25 to n = 100, what happens to the standard deviation of the sampling distribution of the mean?
Shrinks by half
std dev of X_bar = population std dev / sqrt(n)
sqrt(100) = 2 * sqrt(25)
You are dividing by a number twice as large, so the standard deviation shrinks by half.
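A tiny Python illustration of the halving. The population standard deviation of 1.0 is a hypothetical placeholder; the ratio does not depend on it.

```python
from math import sqrt

sigma = 1.0  # hypothetical population std dev; any positive value gives the same ratio

se_25 = sigma / sqrt(25)
se_100 = sigma / sqrt(100)
ratio = se_100 / se_25

print(ratio)
```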
Researchers compare reaction times (in ms) between two interface designs:
Group A: n = 22 x_bar = 248 sample std dev = 32
Group B: n = 18 x_bar = 267 sample std dev = 45
Test whether the mean reaction time for Group A is less than that for Group B.
Ho: mu1 = mu2 Ha: mu1 < mu2
Test stat = (248 – 267) / sqrt(32^2 / 22 + 45^2 / 18) = -19 / 12.61 = -1.51
t(min(n1, n2) – 1 degrees of freedom) = t(17 df)
P(t < -1.51) ≈ 0.075 … since 0.075 > 0.05, fail to reject Ho
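A minimal Python sketch that recomputes the unpooled two-sample t statistic from the summary numbers, using the conservative df = min(n1, n2) - 1 as in these notes. Variable names are illustrative; the p-value is left to a t table or software, since the standard library has no t distribution.

```python
from math import sqrt

n1, x_bar1, s1 = 22, 248, 32   # Group A
n2, x_bar2, s2 = 18, 267, 45   # Group B

# Standard error of the difference in sample means (unpooled)
se_diff = sqrt(s1**2 / n1 + s2**2 / n2)

t_stat = (x_bar1 - x_bar2) / se_diff
df = min(n1, n2) - 1           # conservative degrees of freedom

print(round(se_diff, 2), round(t_stat, 2), df)
```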
A researcher compares test scores from two groups:
Group A: n = 30, x_bar = 78, sample std dev = 10
Group B: n = 35, x_bar = 74, sample std dev = 12
A 95% confidence interval for the difference in population means is _____.
t.inv(0.025, 35) = 1.88
t.inv(0.025, 34) = 1.97
t.inv(0.025, 29) = 2.045
t.inv(0.025, 30) = 2.123
[-1.58, 9.58]
pivot percentile = 2.045
standard error = sqrt(10^2 / 30 + 12^2 / 35) = 2.729
CI: (78 - 74) +/- 2.045 * 2.729 = [-1.58, 9.58]
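The interval can be reproduced with a short Python sketch. The pivot 2.045 (29 df) is taken from the notes rather than computed, and the 10 and 12 are treated as the sample standard deviations; variable names are illustrative.

```python
from math import sqrt

n1, x_bar1, s1 = 30, 78, 10   # Group A
n2, x_bar2, s2 = 35, 74, 12   # Group B

se_diff = sqrt(s1**2 / n1 + s2**2 / n2)
t_pivot = 2.045               # assumed table value for min(n1, n2) - 1 = 29 df

diff = x_bar1 - x_bar2
lower = diff - t_pivot * se_diff
upper = diff + t_pivot * se_diff

print(round(se_diff, 3), round(lower, 2), round(upper, 2))
```

Because the interval contains 0, the data are consistent with no difference in population means at the 95% level.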
A university researcher tests 10 different interventions for improving student performance, such as changes to class size, study materials, sleep habits, and exercise routines. After many statistical tests, one of the interventions (increasing class time) shows a significant improvement in performance. The researcher then attributes this improvement solely to the intervention, recommending it as a permanent solution.
What is the fallacy?
Multiple testing bias - by testing so many different interventions so many times, the researcher increases the probability of finding a statistically significant result by chance, even if no intervention has a true effect.
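To see how fast the risk grows, here is a rough Python sketch of the chance of at least one false positive. It assumes 10 independent tests, each at a 0.05 significance level, with no true effects; neither the level nor independence is stated in the question.

```python
alpha = 0.05   # assumed per-test significance level
k = 10         # number of tests (one per intervention, assumed independent)

# P(at least one false positive) = 1 - P(no false positives)
familywise_error = 1 - (1 - alpha) ** k

print(round(familywise_error, 4))
```

Under these assumptions, roughly 4 in 10 such studies would report at least one "significant" intervention even when none works.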
A factory produces light bulbs, and historically 3% of them are defective. A quality control inspector randomly selects 40 bulbs for inspection.
Let X be the number of defective bulbs in the sample.
What is the probability that no more than 2 bulbs are defective?
X ~ Binomial(n = 40, p = 0.03)
P(X <= 2) = 0.882
P(X = x) = (n choose x) * p^x * (1 - p)^(n - x), summed over x = 0, 1, 2
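The binomial sum can be checked with a few lines of Python. The function name is illustrative; math.comb needs Python 3.8 or later.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by summing the pmf."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

prob = binom_cdf(2, 40, 0.03)
print(round(prob, 3))
```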