A model that shows the behavior of a given statistic over all possible samples of the same size n
What are sampling distribution models?
In a large class of introductory Statistics students, the professor has each person toss a fair coin 12 times and calculate the proportion of his or her tosses that were heads. The students then report their results, and the professor plots a histogram of these several proportions.
a) What shape would you expect this histogram to be? Why?
b) Where do you expect the histogram to be centered?
c) How much variability would you expect among these proportions?
d) Explain why a Normal model should not be used here.
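a) Symmetric, because heads and tails are equally likely (p = 0.5)
b) 0.5
c) SD(p̂) = sqrt(pq/n) = sqrt((0.5)(0.5)/12) ≈ 0.144
d) The Success/Failure Condition is not met: np = nq = 12(0.5) = 6, which is less than 10.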
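A quick way to check the spread and the Success/Failure Condition for the coin-toss card is sketched below. It is only an arithmetic check; the variable names are illustrative, and the cutoff of 10 is the Success/Failure threshold used on the other cards in this set.

```python
import math

n = 12   # tosses per student
p = 0.5  # chance of heads for a fair coin

# Standard deviation of the sample proportion: sqrt(pq/n)
sd = math.sqrt(p * (1 - p) / n)

# Success/Failure Condition: expect at least 10 successes and 10 failures
expected_heads = n * p
expected_tails = n * (1 - p)
normal_model_ok = expected_heads >= 10 and expected_tails >= 10

print(f"SD of the proportions: {sd:.3f}")
print(f"Expected heads and tails: {expected_heads}, {expected_tails}")
print(f"Normal model appropriate: {normal_model_ok}")
```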
Modeled by a Normal model with mean = p and standard deviation = sqrt(pq/n), provided the independence, randomization, and success/failure conditions are met
What is the Sampling Distribution Model for Proportions?
Assume that 30% of students at a university wear contact lenses. We randomly pick 100 students. Let p̂ represent the proportion of students in this sample who wear contacts.
a) What's the appropriate model for the distribution of p̂? Specify the name of the distribution, the mean, and the standard deviation. Be sure to verify that the conditions are met.
b) What's the approximate probability that more than one third of this sample wear contacts?
a) Conditions:
The students are picked at random, and the sample size must be large enough: the expected numbers of successes (students wearing contacts) and failures (students not wearing contacts) should both be at least 10.
We calculate the expected number of successes and failures:
- Expected successes (contacts): n * p = 100 * 0.30 = 30
- Expected failures (not contacts): n * (1 - p) = 100 * 0.70 = 70
Both expected values are at least 10, so the conditions are satisfied.
Now we can model the distribution of p̂ with a Normal model. The mean (μ) and standard deviation (σ) of the sampling distribution of p̂ are:
Mean (μ) = p = 0.30
Standard deviation (σ) = sqrt(p * (1 - p) / n) = sqrt(0.30 * 0.70 / 100) = sqrt(0.21 / 100) = sqrt(0.0021) ≈ 0.04583
So the appropriate model for the distribution of p̂ is:
- Name of the distribution: Normal distribution
- Mean: 0.30
- Standard deviation: approximately 0.04583
b) Now we need the approximate probability that more than one third of the sample wear contacts. As a proportion, one third is 1/3 ≈ 0.3333.
To find this probability, we standardize this value using the z-score formula:
z = (x - μ) / σ
Where:
- x = 0.3333
- μ = 0.30
- σ ≈ 0.04583
Calculating the z-score:
z = (0.3333 - 0.30) / 0.04583 ≈ 0.0333 / 0.04583 ≈ 0.73
Now we look up the z-score in a standard Normal table or use a calculator. The area to the left of z = 0.73 is about 0.7673, so the area to the right is 1 - 0.7673 = 0.2327.
Therefore, the approximate probability that more than one third of this sample wear contacts is about 0.23.
a) The distribution is a Normal distribution with mean 0.30 and standard deviation approximately 0.04583.
b) The approximate probability that more than one third of this sample wear contacts is about 0.23.
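The contact-lens calculation can be checked with a short sketch like the one below, which assumes Python 3.8+ for `statistics.NormalDist`. The variable names are illustrative, and the cutoff is taken as exactly 1/3 rather than a rounded value.

```python
from math import sqrt
from statistics import NormalDist

p = 0.30   # assumed population proportion wearing contacts
n = 100    # sample size

# Sampling distribution model for the sample proportion
sd = sqrt(p * (1 - p) / n)
model = NormalDist(mu=p, sigma=sd)

cutoff = 1 / 3
z = (cutoff - p) / sd
prob_more_than_third = 1 - model.cdf(cutoff)

print(f"SD = {sd:.5f}, z = {z:.2f}, P(p-hat > 1/3) = {prob_more_than_third:.4f}")
```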
A representative subset of a given number of observations from a population
An entire group of instances that is studied
1. What is a sample?
2. What is a population?
The expected variability between two given random samples.
What is Sampling Variability (or Error)?
The Centers for Disease Control and Prevention report that 22% of 18-year-old women in the United States have a body mass index (BMI) of 25 or more, a value considered by the National Heart, Lung, and Blood Institute to be associated with increased health risk. As part of a routine health check at a large college, the physical education department usually requires students to come in to be measured and weighed. This year, the department decided to try out a self-report system. It asked 200 randomly selected female students to report their heights and weights (from which their BMIs could be calculated). Only 31 of these students had BMIs greater than 25.
Question: Is this proportion of high-BMI students unusually small?
Check Conditions:
Randomization Condition: The department drew a random sample, so the respondents should be independent and randomly selected from the population.
10% Condition: 200 respondents is less than 10% of all the female students at a "large college."
Success/Failure Condition: The department expected np = 200(0.22) = 44 "successes" and nq = 200(0.78) = 156 "failures," both at least 10.
It's okay to use a Normal model to describe the sampling distribution of the proportion of respondents with BMIs above 25.
The Phys Ed department observed p̂ = 31/200 = 0.155.
The department expected E(p̂) = p = 0.22, with SD(p̂) = sqrt(pq/n) = sqrt((0.22)(0.78)/200) ≈ 0.029,
so z = (p̂ - p)/SD(p̂) = (0.155 - 0.22)/0.029 = -2.24
By the 68-95-99.7 Rule, I know that values more than 2 standard deviations below the mean of a Normal model show up less than 2.5% of the time. Perhaps women at this college differ from the general population, or self-reporting may not provide accurate heights and weights.
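As a rough check of the BMI example, the sketch below recomputes the z-score and the lower-tail probability without rounding the standard deviation first, so its z comes out slightly different from the hand calculation that uses SD ≈ 0.029. Variable names are illustrative, and `statistics.NormalDist` requires Python 3.8+.

```python
from math import sqrt
from statistics import NormalDist

p = 0.22        # national proportion with BMI of 25 or more
n = 200         # students surveyed
observed = 31   # students reporting a high BMI

p_hat = observed / n
sd = sqrt(p * (1 - p) / n)   # SD(p-hat) = sqrt(pq/n)
z = (p_hat - p) / sd

# Probability of a sample proportion this small or smaller
lower_tail = NormalDist().cdf(z)

print(f"p-hat = {p_hat:.3f}, SD = {sd:.4f}, z = {z:.2f}, P = {lower_tail:.4f}")
```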
Modeled by a Normal model with mean = μ and standard deviation = σ/sqrt(n), provided the independence, randomization, and sufficiently large sample size conditions are met.
What is the Sampling Distribution Model for Means?
State police believe that 70% of the drivers traveling on a major interstate highway exceed the speed limit. They plan to set up a radar trap and check the speeds of 80 cars.
Do you think the appropriate conditions necessary for your analysis are met? Explain.
Apply the CLT to find the sampling distribution of the sample proportion.
1. Population proportion: 70% of drivers exceed the speed limit, so p = 0.70.
2. Sample size (n): the police will check the speeds of 80 cars, so n = 80.
Check the conditions for applying the CLT:
The sample size must be large enough: both np and n(1-p) should be at least 10.
Calculating np: 80 * 0.70 = 56
Calculating n(1-p): 80 * 0.30 = 24
Find the standard deviation of the sampling distribution of the sample proportion (σp):
σp = sqrt((p * (1 - p)) / n)
= sqrt((0.70 * 0.30) / 80)
= sqrt(0.21 / 80)
= sqrt(0.002625)
= 0.0512 (approximately)
Using the 68-95-99.7 Rule, we can describe the distribution:
- Mean (μp) = p = 0.70
- Standard deviation (σp) = 0.0512
b) The appropriate conditions for the analysis are met because both np and n(1-p) are greater than 10, indicating that the sample size is large enough for the Central Limit Theorem to apply. This means we can assume that the sampling distribution of the sample proportion is approximately normal.
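The 68-95-99.7 Rule description for the speeding example can be spelled out as intervals around p. The helper below is a sketch; `empirical_rule_intervals` is a hypothetical name, not something from the source.

```python
from math import sqrt

def empirical_rule_intervals(p, n):
    """Return {percent: (low, high)} for p plus or minus 1, 2, and 3 SDs of p-hat."""
    sd = sqrt(p * (1 - p) / n)
    return {
        pct: (p - k * sd, p + k * sd)
        for k, pct in ((1, 68), (2, 95), (3, 99.7))
    }

intervals = empirical_rule_intervals(p=0.70, n=80)
for pct, (low, high) in intervals.items():
    print(f"About {pct}% of sample proportions fall in ({low:.3f}, {high:.3f})")
```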
Unimodal, roughly symmetric bell curves that show the distribution of a given quantitative variable.
What is a Normal Distribution?
A condition that lets independence still be assumed when sampling without replacement: the sample must be no more than 10% of the population.
What is the 10% Condition?
Herpetologists (snake specialists) found that a certain species of reticulated python has an average length of 20.5 feet with a standard deviation of 2.3 feet. The scientists collect a random sample of 30 adult pythons and measure their lengths. In their sample the mean length was 19.5 feet. One of the herpetologists fears that pollution might be affecting the natural growth of the pythons. Do you think this sample result is unusually small? Explain.
We have a random sample of adult pythons drawn from a much larger population. With a sample size of 30, the CLT says that the approximate sampling model for sample means will be N(20.5, 0.42), since 2.3/sqrt(30) ≈ 0.42. A sample mean of only 19.5 feet is about 2.38 standard deviations below what we expect, so the sample mean of 19.5 feet is unusually small.
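For the python example, the standard deviation of the sample mean and the z-score can be verified with a few lines. This is a sketch with illustrative variable names; the tail probability assumes the Normal sampling model given by the CLT.

```python
from math import sqrt
from statistics import NormalDist

mu = 20.5       # population mean length (feet)
sigma = 2.3     # population standard deviation (feet)
n = 30          # pythons in the sample
y_bar = 19.5    # observed sample mean (feet)

se = sigma / sqrt(n)       # SD of the sample mean: sigma / sqrt(n)
z = (y_bar - mu) / se

# Chance of a sample mean this small or smaller under the model
lower_tail = NormalDist().cdf(z)

print(f"SD of sample mean = {se:.2f}, z = {z:.2f}, P = {lower_tail:.4f}")
```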
An assumption such that sampled values are independent of each other in order to use a normal approximation for sampling models.
What is the Independence Assumption?
Public health statistics indicate that 26.4% of American adults smoke cigarettes. Using the 68-95-99.7 Rule, describe the sampling distribution model for the proportion of smokers among a randomly selected group of 50 adults. Be sure to discuss your assumptions and conditions.
To describe the sampling distribution model for the proportion of smokers among a randomly selected group of 50 adults, we can use the CLT, which states that the sampling distribution of the sample proportion will be approximately normally distributed if certain conditions are met.
First, let's define the population proportion of smokers (p) as 0.264 (or 26.4%). The sample size (n) is 50. The conditions for using the normal approximation are:
1. Random Sampling: The sample should be randomly selected from the population. This ensures that each individual has an equal chance of being included in the sample.
2. Independence: The sampled individuals should be independent of each other. Given the sample size of 50, this condition is met if the population of adults is much larger than 50.
3. Sample Size: We need to check that both np and n(1-p) are at least 10:
- np = 50 * 0.264 = 13.2
- n(1-p) = 50 * (1 - 0.264) = 50 * 0.736 = 36.8
Since both values are at least 10, we can proceed with the normal approximation.
Now, we can calculate the mean (μ) and standard deviation (σ) of the sampling distribution of the sample proportion (p̂):
- The mean of the sampling distribution (μ) is equal to the population proportion:
μ = p = 0.264
- The standard deviation (σ) of the sampling distribution is σ = sqrt[(p(1-p)/n)] = sqrt[(0.264 * 0.736) / 50] = sqrt[0.003886] ≈ 0.062
According to the 68-95-99.7 Rule (Empirical Rule), we can make the following predictions about the sampling distribution:
- Approximately 68% of sample proportions will fall within one standard deviation of the mean (μ ± σ):
- From 0.264 - 0.062 to 0.264 + 0.062, which is approximately (0.202, 0.326).
- Approximately 95% of sample proportions will fall within two standard deviations of the mean (μ ± 2σ):
- From 0.264 - 2(0.062) to 0.264 + 2(0.062), which is approximately (0.140, 0.388).
- Approximately 99.7% of sample proportions will fall within three standard deviations of the mean (μ ± 3σ):
- From 0.264 - 3(0.062) to 0.264 + 3(0.062), which is approximately (0.078, 0.450).
The sampling distribution of the proportion of smokers among a randomly selected group of 50 adults would be approximately normally distributed with a mean of 0.264 and a standard deviation of about 0.062, assuming the conditions for using the normal approximation are met.
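One way to sanity-check the model for the smokers example is a small simulation: draw many groups of 50, record each group's proportion, and compare the spread of those proportions with sqrt(pq/n). The sketch below is illustrative only; the number of simulated groups and the seed are arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import stdev

random.seed(1)  # fixed seed so the run is repeatable

p, n = 0.264, 50
num_samples = 10_000  # arbitrary number of simulated groups

def sample_proportion(p, n):
    """Proportion of 'smokers' in one simulated group of n adults."""
    return sum(random.random() < p for _ in range(n)) / n

props = [sample_proportion(p, n) for _ in range(num_samples)]

theory_sd = sqrt(p * (1 - p) / n)   # SD predicted by the model
sim_sd = stdev(props)               # SD seen in the simulation

# Share of simulated proportions within 2 SDs of p (the rule says about 95%)
within_2sd = sum(abs(x - p) <= 2 * theory_sd for x in props) / num_samples

print(f"theory SD = {theory_sd:.4f}, simulated SD = {sim_sd:.4f}")
print(f"share within 2 SDs = {within_2sd:.3f}")
```

Because a group of 50 can only produce proportions in steps of 0.02, the simulated share within 2 SDs will not match 95% exactly.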
A standardization of data that allows for the calculation of probability with a standard normal model.
What is a Z-score?
A condition for a Normal model to be a good approximation of a binomial model; expectation of at least 10 successes and at least 10 failures
What is the Success/Failure Condition?
The average composite ACT score for Ohio students who took the test in 2003 was 21.4. Assume that the standard deviation is 1.05. In a random sample of 25 students who took the exam in 2003, what is the probability that the average composite ACT score is 22 or more? (Make sure to identify the sampling distribution you use and check all necessary conditions.)
Conditions: Random sampling condition: We have been told that this is a random sample. Independence assumption: It's reasonable to think that the scores of the 25 students are mutually independent. 10% condition: 25 students are certainly less than 10% of all students who took the exam.
We're assuming that the model for composite ACT scores has a mean of μ = 21.4 and a standard deviation of σ = 1.05. With the conditions met, the sampling distribution model for the sample mean is approximately N(21.4, 0.21), since 1.05/sqrt(25) = 0.21. Then z = (22 - 21.4)/0.21 ≈ 2.86 and P(ȳ ≥ 22) = P(z > 2.86) ≈ 0.0021, so the probability that the average composite ACT score for a sample of 25 randomly selected students is 22 or more is about 0.0021.
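The ACT probability can be reproduced by building the sampling distribution of the mean directly as a Normal model. This is a sketch with illustrative names, assuming Python 3.8+ for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 21.4, 1.05   # population mean and SD of composite scores
n = 25                   # students in the random sample

# Sampling distribution model for the sample mean: N(mu, sigma / sqrt(n))
sampling_model = NormalDist(mu=mu, sigma=sigma / sqrt(n))

z = (22 - mu) / sampling_model.stdev
prob_22_or_more = 1 - sampling_model.cdf(22)

print(f"SD of sample mean = {sampling_model.stdev:.2f}")
print(f"z = {z:.2f}, P(mean >= 22) = {prob_22_or_more:.4f}")
```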
The assumption such that the sample size, n, must be large enough in order to use a normal approximation for sampling models.
What is the Sample Size Assumption?
Based on past experience, a bank believes that 7% of the people who receive loans will not make payments on time. The bank has recently approved 200 loans.
a) What are the mean and standard deviation of the proportion of clients in this group who may not make timely payments?
b) What assumptions underlie your model? Are the conditions met? Explain.
c) What's the probability that over 10% of these clients will not make timely payments?
a) First, let's calculate the mean and standard deviation of the proportion of clients who may not make timely payments.
1. Mean (μ): The mean proportion of clients who do not make timely payments is equal to the probability of a client not making timely payments, which is 7% or 0.07.
2. Standard Deviation (σ): The standard deviation of the proportion can be calculated using the formula:
σ = sqrt[p(1 - p) / n]
Where:
- p = proportion of clients not making timely payments = 0.07
- n = number of loans = 200
Plugging in these values:
σ = sqrt[0.07 * (1 - 0.07) / 200]
= sqrt[0.07 * 0.93 / 200]
= sqrt[0.0651 / 200]
= sqrt[0.0003255]
≈ 0.0180
So, the mean proportion is 0.07, and the standard deviation is approximately 0.0180.
b) Assumptions underlying the model:
- The loans are independent of each other.
- The probability of not making timely payments is constant for each loan (7%).
- The sample size (200) is large enough for the normal approximation to be valid.
Are the conditions met?:
- Given that the sample size is 200 and the expected number of clients not making timely payments is n * p = 200 * 0.07 = 14 (which is greater than 10), and the expected number making timely payments is n * (1 - p) = 200 * 0.93 = 186 (also greater than 10), the conditions for using the normal approximation are satisfied.
c) Now, let's find the probability that over 10% of these clients will not make timely payments.
To find this probability, we need to calculate the z-score for 10%.
10% of 200 loans = 0.10 * 200 = 20 loans.
Calculating Z-score where:
- X = number of clients not making timely payments = 20
- μ = mean number of clients not making timely payments = n * p = 200 * 0.07 = 14
- σ = standard deviation = n * σ_p = 200 * 0.0180 ≈ 3.6
Plugging in the values:
z = (20 - 14) / 3.6
= 6 / 3.6
≈ 1.67
Look up the z-score of 1.67 in the standard normal distribution table to find the probability. A z-score of 1.67 corresponds to a cumulative probability of approximately 0.9525. This means that the probability of 20 or fewer clients not making timely payments is about 95.25%.
To find the probability of more than 10% (20 clients), we subtract this from 1:
P(X > 20) = 1 - P(X ≤ 20)
= 1 - 0.9525
≈ 0.0475.
Therefore, the probability that over 10% of these clients will not make timely payments is approximately 0.0475 or 4.75%.
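The bank example works on the count scale (20 loans out of 200), but the same z-score comes out on the proportion scale (0.10 versus 0.07). The sketch below checks that equivalence without intermediate rounding, so its probability differs slightly from the table-based value; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.07, 200

# Proportion scale: mean p, SD sqrt(pq/n)
sd_prop = sqrt(p * (1 - p) / n)
z_prop = (0.10 - p) / sd_prop

# Count scale: mean np, SD sqrt(npq), which equals n * sd_prop
sd_count = sqrt(n * p * (1 - p))
z_count = (0.10 * n - n * p) / sd_count

prob_over_10_percent = 1 - NormalDist().cdf(z_prop)

print(f"z (proportions) = {z_prop:.3f}, z (counts) = {z_count:.3f}")
print(f"P(more than 10% late) = {prob_over_10_percent:.4f}")
```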
The rule which states that, in a Normal model, about 68% of values are within 1 standard deviation of the mean, about 95% are within 2 standard deviations, and about 99.7% are within 3 standard deviations.
What is the Empirical Rule?
The sampling distribution model of the sample mean (and proportion) from a random sample is approximately Normal for large n, regardless of the population distribution, provided the observations are independent.
What is the Central Limit Theorem?
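The Central Limit Theorem can be illustrated with a simulation: draw repeated samples from a strongly skewed population and look at the sample means. The sketch below uses an exponential population with mean 1 and SD 1 as an assumed example; the sample size, number of samples, and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2)  # fixed seed so the run is repeatable

n = 40               # observations per sample
num_samples = 5_000  # arbitrary number of simulated samples

# Exponential population with rate 1: mean 1, SD 1, strongly right-skewed
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theory_sd = 1.0 / sqrt(n)   # sigma / sqrt(n)

print(f"mean of sample means = {mean_of_means:.3f} (population mean is 1)")
print(f"SD of sample means = {sd_of_means:.3f}, sigma/sqrt(n) = {theory_sd:.3f}")
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual observations are skewed.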
According to Gallup, about 33% of Americans polled said they frequently experience stress in their daily lives. Suppose you are in a class of 45 students.
a. What is the probability that no more than 12 students in the class will say that they frequently experience stress in their daily lives? (Make sure to identify the sampling distribution you use and check all necessary conditions.)
b. If 20 students in the class said they frequently experience stress in their daily lives, would you be surprised? Explain, and use statistics to support your answer.
a. We want to find the probability that no more than 12 students in the class will say that they frequently experience stress. Since 12/45 ≈ 0.267, this is the same as asking for the probability of finding 26.7% or fewer "stressed" students in a class of 45 students.
Check the conditions:
1. 10% condition: 45 students is less than 10% of all students who could take the class
2. Success/Failure Condition: np = 45(0.33) = 14.85 and nq = 45(0.67) = 30.15, which both exceed 10
We need to standardize the 26.7% and then find the probability of getting a z-score less than or equal to the one we find. With SD(p̂) = sqrt((0.33)(0.67)/45) ≈ 0.070, z = (0.267 - 0.33)/0.070 = -0.90
P(p̂ < 0.267) = P(z < -0.90) = 0.1841, so the probability is about 18.4% that no more than 12 students will say that they frequently experience stress in their daily lives.
b. From part a, we can use N( 0.33, 0.070) to model the sampling distribution. Twenty students is about 44.4% of the class. This is about 1.63 standard deviations above what we would expect, which is not a surprising result.
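As a cross-check on the stress example, the sketch below computes the Normal-model probability for part a, the z-score for part b, and, for comparison, the exact binomial probability of at most 12 successes. The binomial comparison is an addition for illustration and assumes students' answers are independent with p = 0.33; the two probabilities are not expected to agree exactly, because the Normal model is only an approximation.

```python
from math import comb, sqrt
from statistics import NormalDist

p, n = 0.33, 45
sd = sqrt(p * (1 - p) / n)   # SD(p-hat)

# Part a: Normal model for P(p-hat <= 12/45)
normal_approx = NormalDist(mu=p, sigma=sd).cdf(12 / n)

# Exact binomial P(X <= 12), summing the binomial probabilities
exact_binomial = sum(
    comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(13)
)

# Part b: how unusual are 20 "stressed" students?
z_twenty = (20 / n - p) / sd

print(f"Normal model: {normal_approx:.4f}, exact binomial: {exact_binomial:.4f}")
print(f"z for 20 students = {z_twenty:.2f}")
```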
The condition of randomized subjects/sampling in order to use a normal approximation for sampling models.
What is the Randomization Condition?
Information on a packet of seeds claim that the germination rate is 92%. What's the probability that more than 95% of the 160 seeds in the packet will germinate? Be sure to discuss your assumptions and check the conditions that support your model.
To determine the probability that more than 95% of the 160 seeds in the packet will germinate, we can use the normal approximation to the binomial distribution. Here are the steps to solve the problem:
1. Define the parameters:
- The population proportion of seeds that germinate (p) is 0.92 (or 92%).
- The sample size (n) is 160.
2. Check the conditions for using the normal approximation:
- We need to ensure that both np and n(1-p) are at least 10:
- np = 160 * 0.92 = 147.2
- n(1-p) = 160 * (1 - 0.92) = 160 * 0.08 = 12.8
Since both values are at least 10, we can use the normal approximation.
3. Calculate the mean (μ) and standard deviation (σ) of the sampling distribution of the sample proportion (p-hat):
- The mean (μ) is given by:
μ = p = 0.92
- The standard deviation (σ) is calculated using the formula:
σ = sqrt[(p(1-p)/n)] = sqrt[(0.92 * 0.08) / 160] = sqrt[0.0736 / 160] = sqrt[0.00046] ≈ 0.0214
4. Determine the threshold for more than 95% germination:
- More than 95% of 160 seeds means we need to find the probability that more than 152 seeds germinate (since 95% of 160 is 152).
5. Convert the count of germinated seeds to a proportion:
- The proportion corresponding to 152 seeds is:
p̂ = 152 / 160 = 0.95
6. Calculate the z-score for p̂ = 0.95:
z = (0.95 - 0.92) / 0.0214 ≈ 1.40
7. Find the probability associated with the z-score:
- Using the standard normal distribution table or a calculator, we can find the probability for z = 1.40. This gives us the area to the left of the z-score.
- The area to the left of z = 1.40 is approximately 0.9192.
8. Calculate the probability of more than 95% germination:
- To find the probability of more than 95% germination, we take the complement:
- Probability = 1 - 0.9192 = 0.0808.
Therefore, the probability that more than 95% of the 160 seeds in the packet will germinate is approximately 0.081, or about 8.1%.
In conclusion, under the assumptions and conditions checked, the likelihood of more than 95% of the seeds germinating is fairly low.
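The steps of the germination example can be bundled into one small helper, sketched below. `upper_tail_for_proportion` is a hypothetical name, and the threshold of 10 in the condition check follows the Success/Failure Condition as defined elsewhere in this set.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_for_proportion(p, n, cutoff):
    """Return (conditions_ok, sd, z, P(p-hat > cutoff)) under a Normal model."""
    # Success/Failure Condition: at least 10 expected successes and failures
    conditions_ok = n * p >= 10 and n * (1 - p) >= 10
    sd = sqrt(p * (1 - p) / n)
    z = (cutoff - p) / sd
    prob = 1 - NormalDist().cdf(z)
    return conditions_ok, sd, z, prob

ok, sd, z, prob = upper_tail_for_proportion(p=0.92, n=160, cutoff=0.95)
print(f"conditions met: {ok}")
print(f"SD = {sd:.4f}, z = {z:.2f}, P(p-hat > 0.95) = {prob:.4f}")
```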
The population parameter for proportions; the sample statistic for proportions
The population parameter for mean; the sample statistic for mean
A measure of center, given by ∑x/n
A measure of spread, given by sqrt(∑(x - x̄)²/(n - 1)), where x̄ is the mean
1. What are p and p̂?
2. What are μ and ȳ?
3. What is a mean?
4. What is Standard Deviation?
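The mean and standard deviation formulas on these cards can be checked against Python's `statistics` module. The data values below are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical sample values
n = len(data)

# Mean: sum of the values divided by n
x_bar = sum(data) / n

# Sample standard deviation: sqrt of summed squared deviations over (n - 1)
s = sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

print(f"mean = {x_bar:.2f} (library: {mean(data):.2f})")
print(f"SD = {s:.3f} (library: {stdev(data):.3f})")
```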