The science and art of collecting, analyzing, and drawing conclusions from data
Statistics
In a __________ experiment, either the subjects don't know which treatment they are receiving or the people who interact with them and measure the response variable don't know which subjects are receiving the treatment.
In a single-blind experiment, either the subjects don't know which treatment they are receiving or the people who interact with them and measure the response variable don't know which subjects are receiving the treatment.
The long-run relative frequency of an outcome after many repetitions of a chance process is its ___________
The long-run relative frequency of an outcome after many repetitions of a chance process is its probability.
When constructing confidence intervals, the _________ is a statistic that provides an estimate of a population parameter.
When constructing confidence intervals, the point estimator is a statistic that provides an estimate of a population parameter.
The _______ _______ _______ ________ 4-step process makes inference much faster.
The State, Plan, Do, Conclude 4-step process makes inference much faster.
A ___________ assigns labels that place each individual into a particular group, called a category. A ___________ takes number values that are quantities: counts or measurements.
A categorical variable assigns labels that place each individual into a particular group, called a category. A quantitative variable takes number values that are quantities: counts or measurements.
In an experiment, a _________ is used to provide a baseline for comparing the effects of other treatments.
In an experiment, a control group is used to provide a baseline for comparing the effects of other treatments.
What is the addition rule for mutually exclusive events?
P(A or B) = P(A) + P(B)
A ________ is a multiplier that makes the interval wide enough to have the stated capture rate.
A critical value is a multiplier that makes the interval wide enough to have the stated capture rate.
"Because the p-value of 0.007 is less than the significance level of 0.05, we have convincing evidence that the true proportion of students from Basis who get a 5 on their AP test is greater than 0.8."
Assume the above statement is correct. Explain why it's incomplete.
It never explicitly states the decision: we reject the null hypothesis.
The _______ of a variable tells us what values the variable takes and how often it takes those values.
The distribution of a variable tells us what values the variable takes and how often it takes those values.
Explain the concept of confounding and how it limits the ability to make cause-and-effect conclusions.
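A confounding variable is a hidden or unaccounted-for variable that influences both the explanatory and response variables.
Because it influences both variables, it can create a misleading relationship between them: we can't tell whether changes in the response are caused by the explanatory variable or by the confounding variable, so we can't draw a cause-and-effect conclusion.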
What is the general addition rule?
P(A or B) = P(A) + P(B) - P(A and B)
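As a quick check of the general addition rule, here is a minimal sketch using a made-up example (one roll of a fair six-sided die; the events A and B are assumptions chosen for illustration, not part of the original card). It compares the rule against counting the outcomes in "A or B" directly.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event "roll is even"
B = {4, 5, 6}   # event "roll is greater than 3"

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_union_by_rule = prob(A) + prob(B) - prob(A & B)

# Direct count of the outcomes in "A or B" for comparison
p_union_direct = prob(A | B)

print(p_union_by_rule, p_union_direct)
```

Subtracting P(A and B) removes the outcomes (4 and 6 here) that would otherwise be counted twice. When A and B are mutually exclusive, P(A and B) = 0 and the rule reduces to P(A) + P(B).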
If we want to reduce our margin of error, we must _________________ or _______________
If we want to reduce our margin of error, we must decrease the confidence level or increase the sample size.
What is the difference between calculation of the Standard Error for confidence intervals for proportions and significance tests for proportions?
CI: use p-hat, so SE = sqrt(p-hat(1 - p-hat) / n)
significance test: use the null value p0, so the standard deviation is sqrt(p0(1 - p0) / n)
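The difference can be sketched in a few lines of Python. The function names and the numbers (n = 100, p-hat = 0.86, p0 = 0.80) are hypothetical, chosen only to show that the two calculations use the same formula with different inputs.

```python
import math

def se_confidence_interval(p_hat, n):
    """Standard error for a CI: the true p is unknown, so plug in p-hat."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def sd_significance_test(p0, n):
    """Standard deviation for a test: assume the null value p0 is true."""
    return math.sqrt(p0 * (1 - p0) / n)

# Hypothetical numbers for illustration only
n = 100
p_hat = 0.86   # sample proportion
p0 = 0.80      # null hypothesis value

se_ci = se_confidence_interval(p_hat, n)
sd_test = sd_significance_test(p0, n)
print(round(se_ci, 4), round(sd_test, 4))
```

The test uses p0 because the p-value is computed assuming the null hypothesis is true; the interval has no null value to assume, so it falls back on the sample proportion.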
If knowing the value of one variable helps us predict the value of another variable, we say there is ________ between the two variables.
If knowing the value of one variable helps us predict the value of another variable, we say there is (an) association between the two variables.
Florence felt sad there was no Statistics homework on Thanksgiving, so she decided to collect data on the number of hours students sleep the night before an exam and their scores on that exam. Her data resulted in a LSRL with a slope of 3.8 and y-intercept of 52.4. Interpret the slope and y-intercept in context.
slope: For each additional hour of sleep, the model predicts the exam score will increase by 3.8 points on average.
y-intercept: A student who sleeps 0 hours the night before the exam is predicted to get a score of 52.4.
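To see both interpretations numerically, here is a small sketch of Florence's line as a Python function. The function name and the example sleep values (7 and 8 hours) are assumptions for illustration.

```python
SLOPE = 3.8        # predicted change in exam score per extra hour of sleep
INTERCEPT = 52.4   # predicted exam score at 0 hours of sleep

def predicted_score(hours_of_sleep):
    """Predicted exam score from the LSRL: y-hat = 52.4 + 3.8x."""
    return INTERCEPT + SLOPE * hours_of_sleep

# The y-intercept is the prediction at x = 0
score_at_zero = predicted_score(0)

# The slope is the change in the prediction for one more hour of sleep
change_per_hour = predicted_score(8) - predicted_score(7)

print(score_at_zero, round(change_per_hour, 1))
```

Note that 0 hours of sleep is probably outside the range of Florence's data, so the y-intercept prediction is likely an extrapolation and should be treated with caution.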
What conditions must be met before we treat something as a binomial random variable?
Binary (2 outcomes)
Independent (trial outcomes don't affect each other)
Number of trials fixed
Same probability each trial
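When all four conditions hold, binomial probabilities follow the formula P(X = k) = C(n, k) * p^k * (1 - p)^(n - k). A minimal sketch, using hypothetical values n = 10 and p = 0.4 that are not from the card above:

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical setting: 10 independent trials, each with a 0.4 chance of success
n, p = 10, 0.4

p_exactly_4 = binomial_pmf(4, n, p)

# The probabilities over all possible counts 0..n should add up to 1
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(round(p_exactly_4, 4), round(total, 4))
```

The formula relies on every condition: a fixed n, two outcomes per trial, the same p on each trial, and independence so the trial probabilities can be multiplied.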
Phoebe noticed one of her classmates practicing interpreting confidence intervals. Her classmate constructed a 95% confidence interval for the mean time (in minutes) it takes students to complete a puzzle, which was calculated to be (12.4, 15.8). They interpreted this as "There is a 95% chance that the true mean time is between 12.4 and 15.8 minutes". Phoebe warned them that this is incorrect. Why?
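The true population mean is a fixed value: it's either in the interval or it's not. The 95% is not a probability about the parameter; it describes the method used to generate the interval. If we took many samples and built an interval from each, about 95% of those intervals would capture the true mean.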
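The "method" idea can be illustrated with a simulation. This is only a sketch under stated assumptions: the population is taken to be Normal with a made-up true mean of 14 minutes and a known standard deviation of 3, and each interval is a z-interval (x-bar plus or minus 1.96 * sigma / sqrt(n)) rather than the t-interval a real analysis would likely use.

```python
import math
import random

rng = random.Random(2024)  # fixed seed so the run is repeatable

# All of these values are hypothetical
TRUE_MEAN = 14.0   # the fixed (normally unknown) population mean
SIGMA = 3.0        # population standard deviation, assumed known here
N = 40             # sample size for each simulated study
REPS = 2000        # number of simulated samples
Z_STAR = 1.96      # critical value for 95% confidence

captured = 0
for _ in range(REPS):
    sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    margin = Z_STAR * SIGMA / math.sqrt(N)
    # Each individual interval either captures the true mean or it doesn't
    if x_bar - margin <= TRUE_MEAN <= x_bar + margin:
        captured += 1

coverage = captured / REPS
print(coverage)
```

The printed proportion should land near 0.95: the true mean never moves, but the intervals change from sample to sample, and roughly 95% of them capture it.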
Before going home, Diego gives the following warning to all his classmates: "Beware of multiple analyses!"
Why did he do this?
Performing the same significance test multiple times results in a higher chance of getting a false positive by random chance.
During the holiday, Tommy became very curious about the study habits of his classmates. He surveyed 150 students in the school about whether they preferred studying alone or with others. He also asked them if they consider themselves introverted or extroverted.
            | Study Alone | Study with Others | Total
Introverted | 54          | 21                | 75
Extroverted | 18          | 57                | 75
Total       | 72          | 78                | 150
Is there an association between personality type and study preference? Justify your answer.
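Yes, there is a strong association. Support the answer by comparing conditional probabilities: for example, 54/75 = 72% of introverted students prefer studying alone, compared with only 18/75 = 24% of extroverted students.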
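The comparison of conditional probabilities can be sketched from Tommy's table as follows. The dictionary layout and function name are illustrative choices, not part of the original problem.

```python
# Counts from Tommy's two-way table
counts = {
    "Introverted": {"alone": 54, "others": 21},
    "Extroverted": {"alone": 18, "others": 57},
}

def p_alone_given(personality):
    """P(prefers studying alone | personality type)."""
    row = counts[personality]
    return row["alone"] / sum(row.values())

p_alone_introvert = p_alone_given("Introverted")   # 54 / 75
p_alone_extrovert = p_alone_given("Extroverted")   # 18 / 75

# Overall proportion who prefer studying alone, ignoring personality type
total_alone = sum(row["alone"] for row in counts.values())
total_students = sum(sum(row.values()) for row in counts.values())
p_alone_overall = total_alone / total_students     # 72 / 150

print(p_alone_introvert, p_alone_extrovert, p_alone_overall)
```

If there were no association, both conditional probabilities would be close to the overall proportion (72/150 = 0.48). Because knowing the personality type changes the probability so much, the variables are associated.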
Andy felt determined to practice his Statistics skills outside of class by modeling the relationship between the mass of an animal (kg) and its metabolic rate (watts). His model predicted the metabolic rate from mass, with slope of 6.45, y-intercept of 18.2, s of 12.7, and r^2 of 0.82.
a) write the equation of the LSRL
y-hat = 18.2 + 6.45x, where y-hat = predicted metabolic rate (in watts) and x = mass (in kg)
b) interpret r^2
About 82% of the variation in metabolic rate among these animals is explained by the linear relationship with mass.
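Here is a short sketch of how Andy's summary values fit together. The 10 kg example animal is hypothetical, and the function name is an illustrative choice.

```python
import math

INTERCEPT = 18.2   # predicted metabolic rate (watts) at a mass of 0 kg
SLOPE = 6.45       # predicted change in watts per additional kg
S = 12.7           # standard deviation of the residuals (watts)
R_SQUARED = 0.82   # proportion of variation explained by the line

def predicted_metabolic_rate(mass_kg):
    """LSRL prediction: y-hat = 18.2 + 6.45x."""
    return INTERCEPT + SLOPE * mass_kg

# Hypothetical animal with a mass of 10 kg
prediction_10kg = predicted_metabolic_rate(10)

# r has the same sign as the slope, so take the positive square root here
r = math.sqrt(R_SQUARED)

print(round(prediction_10kg, 1), round(r, 3))
```

The value s = 12.7 says that actual metabolic rates typically differ from the rates predicted by the line by about 12.7 watts.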
Jerry is using his knowledge of statistics and business to start a light bulb company. 40% of his light bulbs last more than 1,000 hours. His company is new, so they've only produced 200 light bulbs. Yinching wants to take a sample of 30 randomly selected light bulbs and model the number of long-lasting bulbs in the sample as a binomial random variable. But Jerry warns Yinching that this is not a good use of statistics. Why?
The bulbs are sampled without replacement, and n = 30 is more than 10% of the population (0.10 * 200 = 20). The 10% condition fails, so the Independent condition from BINS is violated.
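The 10% condition check can be written as a one-line comparison. The function name is hypothetical; integer arithmetic is used to avoid floating-point rounding at the boundary.

```python
def ten_percent_condition_met(n, population_size):
    """True when the sample is at most 10% of the population (n <= N / 10)."""
    return 10 * n <= population_size

# Yinching's plan: 30 bulbs sampled from only 200 produced
yinching_ok = ten_percent_condition_met(30, 200)

# Largest sample size that would still satisfy the condition for N = 200
largest_allowed = 200 // 10

print(yinching_ok, largest_allowed)
```

With a sample of 20 bulbs or fewer, treating the draws as independent would be a reasonable approximation even though the sampling is done without replacement.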
Anson decides to poll students at Basis to see how many believe AP Statistics should be a required class for graduating. He wants to create a 95% confidence interval with a margin of error no bigger than 4%. What is the minimum sample size Anson needs to achieve his goal?
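With no prior estimate of the proportion, use p-hat = 0.5 (the most conservative guess) and z* = 1.96 for 95% confidence:
0.04 = 1.96*sqrt([0.5*0.5] / n)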
0.0016 = 3.8416(0.25/n)
n = (3.8416 * 0.25) / 0.0016
n = 600.25 -> round up -> n = 601 students
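The same calculation can be sketched as a reusable function. The function name and default arguments are illustrative assumptions; the formula is the one solved above, n = z*^2 * p(1 - p) / ME^2, rounded up.

```python
import math

def min_sample_size(margin_of_error, z_star=1.96, p_guess=0.5):
    """Smallest n whose margin of error for a proportion is at most margin_of_error.

    p_guess = 0.5 is the conservative choice when no prior estimate exists.
    """
    n_exact = z_star**2 * p_guess * (1 - p_guess) / margin_of_error**2
    return math.ceil(n_exact)  # always round up to guarantee the margin

anson_n = min_sample_size(0.04)
print(anson_n)
```

Rounding up matters: 600 students would give a margin of error slightly larger than 4%, so the minimum is 601.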
Karina is creating a company designed to monitor students' biometric signs so that their teachers always know exactly how much time they spent studying.
Before the company launches, her null hypothesis is that mean time (mins) studying Statistics per night equals 60, and the alternative hypothesis is that the mean time is less than 60. She chooses a significance level of 0.05.
a) Describe a Type I error in context
Finding convincing evidence that mean study time for Stats is less than 60 mins/night when it's actually 60 mins.
b) Describe a Type II error in context
Failing to find convincing evidence that mean study time for Stats is less than 60 mins/night when it actually is less than 60 mins.