True or False? Association measures the direction and strength of the linear relationship between two variables.
False; this is the definition of correlation.
What is the importance of random sampling?
It allows us to generalize results from the sample to the population we are trying to understand.
What are three diagrams one can draw to help visualize a probability problem?
Answers may vary: Venn diagram, tree diagram, and two-way table.
The p-value is the area to the ______ of the χ² test statistic under the chi-square density curve.
Right
A student wants to determine whether the true proportion of teachers at their school who have social media is higher than 0.3. They gather an SRS of 20 teachers at their school and ask each one whether they have social media. Which inference procedure is most appropriate?
One-sample z-test for a population proportion
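A minimal Python sketch of this test using only the standard library. The result of 9 teachers with social media is hypothetical (the problem gives no data), and `one_prop_z_test` is our own helper. The sketch also checks the Large Counts condition, which this design does not meet: n·p0 = 20 × 0.3 = 6 is below 10, so the normal approximation, and hence the p-value, is rough here.

```python
import math
from statistics import NormalDist


def one_prop_z_test(successes, n, p0):
    """One-sample z-test for H0: p = p0 versus Ha: p > p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)    # standard error uses the null value p0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)    # upper tail because Ha: p > p0
    large_counts_ok = n * p0 >= 10 and n * (1 - p0) >= 10
    return z, p_value, large_counts_ok


# Hypothetical result: 9 of the 20 sampled teachers say they have social media.
z, p_value, ok = one_prop_z_test(successes=9, n=20, p0=0.3)
print(round(z, 3), round(p_value, 3), ok)
```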
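Using the regression equation ŷ = 1 + 1.9x, is it appropriate to find the predicted y-value for x = 35? Below is a table of observed values.
x | 1 | 2 | 3 | 4 |
y | 1.9 | 3.8 | 5.7 | 7.6 |
No. The observed x-values only range from 1 to 4, so predicting at x = 35 would be extrapolation, and there is no evidence that the linear pattern continues that far.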
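A tiny Python check of the extrapolation idea; the helper names are ours, and the observed x-values are taken from the table.

```python
def predict(x):
    """Predicted y from the regression line y-hat = 1 + 1.9x."""
    return 1 + 1.9 * x


def is_extrapolation(x, observed_xs):
    """True when x lies outside the range of x-values used to fit the line."""
    return not (min(observed_xs) <= x <= max(observed_xs))


observed_x = [1, 2, 3, 4]
print(predict(35), is_extrapolation(35, observed_x))
```

The line happily returns a number (about 67.5), which is exactly why the range check matters: nothing in the data says the linear pattern holds at x = 35.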
What is the difference between a cluster random sample and a stratified random sample?
Cluster random sample: Divide the population into groups (clusters), ideally each a heterogeneous mix that resembles the whole population, then randomly select some clusters and include every individual in the chosen clusters in the sample.
Stratified random sample: Divide the population into groups (strata) of individuals who share a similar (homogeneous) characteristic, then take a separate simple random sample from each stratum and combine them into one sample. The strata do not need to be the same size.
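A minimal Python sketch contrasting the two designs on a made-up school, with grades as strata and homerooms as clusters. The population, the 10% allocation per stratum, and the helper names are all hypothetical.

```python
import random


def stratified_sample(strata, fraction, rng):
    """Take a separate SRS from every stratum (here, the same fraction of each) and combine them."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(fraction * len(members))))
    return sample


def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and include every member of each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [person for name in chosen for person in clusters[name]]


# Hypothetical population: strata by grade (unequal sizes), clusters by homeroom.
grade_sizes = {"9th": 120, "10th": 100, "11th": 90, "12th": 90}
strata = {g: [f"{g}-{i}" for i in range(size)] for g, size in grade_sizes.items()}
clusters = {f"room{r:02d}": [f"room{r:02d}-s{i}" for i in range(25)] for r in range(16)}

rng = random.Random(1)
strat = stratified_sample(strata, 0.10, rng)   # 12 + 10 + 9 + 9 students
clus = cluster_sample(clusters, 2, rng)        # every student in 2 homerooms
print(len(strat), len(clus))                   # 40 50
```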
Choose whether the following scenario would be binomial, geometric, or neither.
A high school principal goes to 10 different classrooms and randomly selects one student from each class. X = the number of female students in his group of 10 students.
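Neither. The probability of selecting a female student changes from classroom to classroom, so the binomial condition of the same probability of success on every trial is not met. It is not geometric either, because X counts successes in a fixed number of trials rather than trials until the first success.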
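Because the success probability changes from trial to trial, X is not binomial, but its distribution can still be built one classroom at a time (this is sometimes called a Poisson binomial distribution). A minimal Python sketch; the ten classroom proportions are made up for illustration.

```python
def count_distribution(probs):
    """Return [P(X = 0), ..., P(X = n)] for independent trials with unequal success probabilities."""
    dist = [1.0]                      # before any trials, P(X = 0) = 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, pk in enumerate(dist):
            new[k] += pk * (1 - p)    # this classroom's student is not female
            new[k + 1] += pk * p      # this classroom's student is female
        dist = new
    return dist


# Hypothetical proportion of female students in each of the 10 classrooms.
props = [0.30, 0.45, 0.50, 0.55, 0.60, 0.40, 0.65, 0.50, 0.35, 0.70]
dist = count_distribution(props)
mean = sum(k * pk for k, pk in enumerate(dist))
print(round(mean, 2))  # 5.0, the sum of the ten proportions
```

A Binomial(10, 0.5) count also has mean 5, but its variance is 2.5, while here the variance is the sum of p(1 - p) over the classrooms, 2.35, so the two distributions differ.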
A hairdresser wants to see if more people are coloring their hair this year compared to last year. 200 people were observed this year and last year to see if they got their hair colored or not. Which chi-square test would be most appropriate?
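Chi-square test for homogeneity, because there are two separate samples (this year and last year) and we are comparing the distribution of one categorical variable (colored hair or not) across them. A goodness-of-fit test would only apply to a single sample compared with a hypothesized distribution.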
A bake shop owner wants to know whether the average number of donut sales per day is the same as it was last year. The current average of sales in the bake shop is about 30 sales a day. She takes an SRS of 40 days and performs a test at the 5% significance level. Which inference procedure is most appropriate?
One sample t-test for population mean
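A minimal Python sketch of the test statistic. The problem gives no sample results, so the sample mean and standard deviation below are hypothetical, and treating 30 sales per day as last year's mean (the null value) is our assumption. Python's standard library has no t distribution, so the p-value would come from a t table or the calculator's tcdf with df = 39.

```python
import math


def one_sample_t(xbar, s, n, mu0):
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    se = s / math.sqrt(n)          # standard error of the sample mean
    return (xbar - mu0) / se, n - 1


# Hypothetical summary statistics for the SRS of 40 days.
t, df = one_sample_t(xbar=32.0, s=6.0, n=40, mu0=30.0)
print(round(t, 3), df)  # 2.108 39
```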
A scatterplot shows the relationship between the size of a diamond (in carats) and its price (in US dollars). The regression line is ŷ = 0 + 1300x, where x = size in carats and ŷ = predicted price in dollars. How would we interpret a residual of 0.9 for a diamond that is 0.1 carats?
The actual price of the 0.1-carat diamond was $0.90 higher than the price predicted by the regression line (ŷ = $130).
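The arithmetic behind that interpretation, as a short Python sketch: a residual is actual minus predicted, so the actual price is the predicted price plus the residual.

```python
def predicted_price(carats):
    """Regression line from the problem: y-hat = 0 + 1300x."""
    return 0 + 1300 * carats


x = 0.1
y_hat = predicted_price(x)    # predicted price: 130 dollars
residual = 0.9                # residual = y - y-hat
actual = y_hat + residual     # so the actual price is y-hat + residual
print(round(y_hat, 2), round(actual, 2))  # 130.0 130.9
```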
A scientist wants to collect a simple random sample of individuals from the population he is studying: everyone in New York. He wants to find the average number of songs New Yorkers have in their playlists. To do this, he interviews every 3rd person entering Times Square and asks how many songs they have in their playlist. Is the sample biased?
Yes. Even though he selects every 3rd person, he only samples people who happen to enter Times Square, which makes this a convenience sample. New Yorkers who do not pass through Times Square have no chance of being selected, so the sample is unlikely to represent the population.
Describe the difference between mutually exclusive and independent.
Mutually exclusive events cannot happen at the same time, while independent events do not affect each other's probability.
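A small Python check on one roll of a fair die (our own example, not from the original) makes the distinction concrete: two mutually exclusive events with nonzero probabilities can never be independent, because P(A and B) = 0 while P(A)P(B) > 0.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # one roll of a fair six-sided die


def prob(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), 6)


def both(a, b):
    return lambda o: a(o) and b(o)


def mutually_exclusive(a, b):
    return prob(both(a, b)) == 0


def independent(a, b):
    return prob(both(a, b)) == prob(a) * prob(b)


def even(o):
    return o % 2 == 0


def odd(o):
    return o % 2 == 1


def at_most_two(o):
    return o <= 2


print(mutually_exclusive(even, odd), independent(even, odd))                  # True False
print(mutually_exclusive(even, at_most_two), independent(even, at_most_two))  # False True
```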
What is the template for writing the hypotheses for a chi-square test for homogeneity?
H0: There is NO difference in the distribution of the categorical variable across the populations/treatments.
Ha: There is a difference in the distribution of the categorical variable across the populations/treatments.
A scientist takes an SRS of 60 staff members at company A and asks each how many calories they eat per day, on average. He does the same with a separate random sample of 60 staff members at company B. He suspects that the average number of calories eaten per day is higher at company A than at company B. Which inference procedure should he use to test his hypothesis?
Two-sample t-test for a difference in population means
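A minimal Python sketch of the two-sample t statistic. The means and standard deviations are hypothetical (the problem gives only the sample sizes). The degrees of freedom use the Welch approximation that calculators report; a conservative hand alternative is df = min(n1, n2) - 1 = 59.

```python
import math


def two_sample_t(mean1, s1, n1, mean2, s2, n2):
    """Welch two-sample t statistic for H0: mu1 - mu2 = 0, with its approximate df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2      # squared standard errors of each mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summaries: company A mean 2300 (sd 400), company B mean 2150 (sd 350).
t, df = two_sample_t(2300, 400, 60, 2150, 350, 60)
print(round(t, 3), round(df, 1))
```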
Which point would be considered an outlier? What type of outlier is it, and how would it affect the LSRL?
x | 1 | 2 | 3 | 17 |
y | 5 | 10 | 15 | 24 |
The point (17, 24) is a high-leverage point because its x-value is far from the other x-values. It is also influential: the other three points lie exactly on the line y = 5x, and including (17, 24) pulls the slope down, lowers r, and raises the y-intercept.
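Fitting the least-squares line with and without (17, 24) shows the influence directly. A minimal Python sketch using the usual formulas b = Sxy/Sxx, a = ȳ - b·x̄, and r = Sxy/√(Sxx·Syy); the helper name `lsrl` is ours.

```python
import math


def lsrl(xs, ys):
    """Return (slope, intercept, r) for the least-squares regression line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ybar - slope * xbar, sxy / math.sqrt(sxx * syy)


with_point = lsrl([1, 2, 3, 17], [5, 10, 15, 24])
without_point = lsrl([1, 2, 3], [5, 10, 15])
print([round(v, 3) for v in with_point])     # [0.981, 7.859, 0.913]
print([round(v, 3) for v in without_point])  # [5.0, 0.0, 1.0]
```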
A student at Walpole High School wants to find the proportion of Walpole residents who prefer meat over vegetarian options. She plans to ask a simple random sample of 30 Walpole residents which they prefer. She gathers her sample by going to every grocery store in Walpole and interviewing every 3rd person she sees in the produce section until she has 30 responses. What type of bias is this?
Undercoverage bias. Residents who are not in a grocery store's produce section, likely including many who prefer meat, have no chance of being selected. As a result, the sample would likely overestimate the proportion of Walpole residents who prefer vegetarian options.
The probability that a visitor goes to the zoo and sees the giraffes is 0.74, and the probability that they see the lions is 0.63. The probability that they see either the giraffes or the lions is 0.95. What is the probability that they see both animals?
P(giraffes and lions) = P(giraffes) + P(lions) - P(giraffes or lions)
= 0.74 + 0.63 - 0.95 = 0.42
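The same rearranged addition rule as a one-line Python function (the name `p_both` is ours):

```python
def p_both(p_a, p_b, p_a_or_b):
    """Addition rule solved for the overlap: P(A and B) = P(A) + P(B) - P(A or B)."""
    return p_a + p_b - p_a_or_b


print(round(p_both(0.74, 0.63, 0.95), 2))  # 0.42
```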
Two samples from a large high school were taken to investigate whether grade level and taking statistics are independent. What are the expected counts for this test?
Observed counts | Jr. | Sr. |
Takes Statistics | 180 | 300 |
Doesn't take Stats | 20 | 100 |
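Expected count = (row total × column total) / table total, using row totals 480 and 120, column totals 200 and 400, and table total 600:
Expected counts | Jr. | Sr. |
Takes Statistics | 160 | 320 |
Doesn't take Stats | 40 | 80 |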
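A minimal Python sketch of the expected-count calculation for this table, continuing on to the chi-square statistic. Since the table is 2 × 2, df = 1, and for one degree of freedom the upper-tail area equals erfc(√(χ²/2)) because χ² is then the square of a standard normal z; that shortcut does not apply to larger tables.

```python
import math

observed = [[180, 300],   # takes statistics: Jr., Sr.
            [20, 100]]    # doesn't take stats: Jr., Sr.

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count for each cell = row total * column total / table total.
expected = [[r * c / grand for c in col_totals] for r in row_totals]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.erfc(math.sqrt(chi2 / 2))   # upper-tail area; valid only because df = 1
print(expected, chi2, df)  # [[160.0, 320.0], [40.0, 80.0]] 18.75 1
```

All expected counts are at least 5, so the usual condition for the chi-square test is met.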
Does a certain medication reduce the risk of heart attack? 450 patients at a medical center volunteered for a study to find out. Researchers randomly assigned 320 volunteers to a treatment group and the remaining 130 to a control group. Patients in the treatment group were given the new medication; those in the control group stayed on their regular medications. In the 2-year period after the study began, 160 of the people in the treatment group had a heart attack, compared with 110 of the people in the control group.
Do these results provide convincing evidence at the α = 0.05 level that the new medication reduces the proportion of patients who have heart attacks within 2 years, for people similar to those in the study?
Two-sample z-test for a difference in population proportions
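A minimal Python sketch of this test with the counts from the problem, using the pooled proportion under H0. The Large Counts condition holds here (160, 160, 110, and 20 are all at least 10). The function name is ours.

```python
import math
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2):
    """z-test for H0: p1 = p2 versus Ha: p1 < p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, NormalDist().cdf(z)   # lower tail because Ha: p1 < p2


# Treatment: 160 heart attacks out of 320; control: 110 out of 130.
z, p_value = two_prop_z_test(160, 320, 110, 130)
print(round(z, 2), p_value)
```

The result is z ≈ -6.79 with a p-value far below 0.05, so these data would give convincing evidence that the new medication reduces the proportion of heart attacks for people similar to those in the study.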
A random sample of 12 dandelions is gathered to study the relationship between how much light they receive (measured with a light meter) and their height (in cm). Suppose the researchers also gathered various other kinds of flowers besides dandelions. How would this affect the r² value?
It would likely decrease r², because different kinds of flowers grow to different heights regardless of light, adding variability in height that light does not explain.
A scientist collected an SRS of 20 students at a high school to find out how the amount of sleep teenagers get and the foods they eat affect their GPA. He randomly assigns each student in the sample to 6, 8, or 10 hours of sleep for 2 weeks, and then randomly assigns each student to eat chicken, pizza, or pasta for dinner during those 2 weeks. What are the explanatory variables, and how many are there?
There are two explanatory variables (factors): hours of sleep (6, 8, or 10) and dinner food (chicken, pizza, or pasta). Together they form 3 × 3 = 9 treatments.
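A minimal Python sketch that lists the 3 × 3 treatment combinations and randomly assigns the 20 students to them. The student labels and seed are hypothetical, and assigning each student a whole combination in one step is our simplification of the two-step assignment in the problem; dealing out a shuffled list keeps the group sizes within one of each other.

```python
import itertools
import random
from collections import Counter

sleep_hours = [6, 8, 10]
dinners = ["chicken", "pizza", "pasta"]
treatments = list(itertools.product(sleep_hours, dinners))  # 3 x 3 = 9 combinations

students = [f"student{i}" for i in range(1, 21)]  # hypothetical labels for the 20 students
rng = random.Random(2024)
rng.shuffle(students)

# Deal shuffled students to treatments in turn: 2 treatments get 3 students, the rest get 2.
assignment = {s: treatments[i % len(treatments)] for i, s in enumerate(students)}
group_sizes = Counter(assignment.values())
print(len(treatments), sorted(group_sizes.values()))  # 9 [2, 2, 2, 2, 2, 2, 2, 3, 3]
```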
Sarah, a professional pet groomer, finds that each walk-in has a 70% chance of being a dog. One day she has 10 walk-ins. Assume that the walk-ins are independent.
Find the probability that at least four of her walk-ins are dogs.
P(X ≥ 4) = 1 - P(X ≤ 3) = 1 - binomcdf(trials: 10, p: 0.7, x value: 3)
= 1 - 0.0106 = 0.9894
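The calculator's binomcdf can be mirrored with a short Python sketch using `math.comb` (Python 3.8+); the helper name `binom_cdf` is ours.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), the same quantity binomcdf returns."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


at_least_four = 1 - binom_cdf(3, 10, 0.7)   # complement rule: P(X >= 4) = 1 - P(X <= 3)
print(round(at_least_four, 4))  # 0.9894
```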
A hypothesis test was conducted to see whether there is an association between a person’s income level and his or her education level. A random sample of 225 people was selected, and the appropriate hypothesis test was conducted. The chi-square test statistic and corresponding p-value were approximately 13.36 and 0.01, respectively. Interpret the p-value in context.
Assuming that income level and education level are independent (no association) in the population, there is about a 1% probability of getting a chi-square test statistic of 13.36 or larger by chance alone in a random sample of 225 people.
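The p-value is an upper-tail area under a chi-square curve, which depends on the degrees of freedom. The problem does not give the table's dimensions, so the sketch below assumes df = 4 (for example, a 3 × 3 table); under that assumption the p-value comes out near 0.01, consistent with the reported value. The closed form used here is exact only for even df.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) for an EVEN number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum over i < df/2 of (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this shortcut needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# df = 4 is an assumption; the problem does not state the table size.
print(round(chi2_upper_tail_even_df(13.36, 4), 4))  # 0.0096
```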
A scientist takes a random sample of 40 students in America and 40 students in the UK and asks each student whether they like school. He constructs a 90% confidence interval to estimate the true difference in the proportions of students in America and in the UK who like school. Which inference procedure is appropriate?
Two-sample z-interval for a difference in population proportions
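A minimal Python sketch of the interval. The counts of students who like school (24 of 40 and 18 of 40) are hypothetical; the critical value z* comes from `statistics.NormalDist` (Python 3.8+).

```python
import math
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, confidence=0.90):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.645 for 90%
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Hypothetical counts: 24 of 40 American students and 18 of 40 UK students like school.
low, high = two_prop_z_interval(24, 40, 18, 40)
print(round(low, 3), round(high, 3))
```

With these made-up counts the interval is roughly (-0.032, 0.332); because it contains 0, it would not give convincing evidence of a difference.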