How to describe a distribution
SOCS + context?
Shape (Symmetry)
Center (Mean/Median)
Spread (Range, Standard Deviation, IQR)
Outliers (Are there outliers?)
Place all the above in context.
Convenience Sample
Voluntary Response Sample
Simple Random Sample
Convenience Sample: Simply selecting people that are available at the time.
Voluntary Response Sample: Individuals choose to be involved in the study.
Simple Random Sample: Every member of the population has an equal chance of being selected, and every possible sample of the chosen size is equally likely.
What are the 3 types of visuals you can use to help with probability?
Tree diagram, Venn diagram, Tables
What is P.A.N.I.C.?
P - parameter statement (sentence telling reader what you are trying to find)
A - assumptions and conditions
N - name the interval
I - interval (do the math)
C - conclusion
What is Type 1 error? Type 2 error?
and what is the probability of Type 1 error?
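A Type 1 error is when the null hypothesis is true, but you reject it. The probability of a Type 1 error is α, the significance level.
A Type 2 error is when the alternative hypothesis is true, but you fail to reject the null hypothesis.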
A high school statistics teacher recorded her students' final exam grades (measured in points) for fun. Describe the distribution.
The distribution of the students' exam grades is fairly symmetrical. The median exam grade is between 70 and 80 points. The range of the exam grades is 60 points, and there seem to be no outliers.
Undercoverage bias
Nonresponse bias
Response bias
Undercoverage bias is when some members of the population are left out of the sampling process, so they cannot respond.
Nonresponse bias is when people do not respond.
Response bias is when people are untruthful about their responses.
Describe the following rules:
Complement Rule
Addition Rule
Multiplication Rule
Complement Rule: The probability of an event occurring is 1 minus the probability that it doesn't occur.
Addition Rule: For 2 disjoint events A & B, the probability that one or the other occurs is the sum of the probabilities of the 2 events.
Multiplication Rule: For 2 independent events A & B, the probability that both A & B occur is the product of the probabilities of the 2 events.
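The three rules above can be sketched as small functions. This is a minimal illustration, not part of the original cards: the function names and the fair-die example are assumptions chosen for the demonstration.

```python
import math

def complement(p_event):
    """P(not A) = 1 - P(A)."""
    return 1 - p_event

def add_disjoint(p_a, p_b):
    """P(A or B) = P(A) + P(B); valid only when A and B are disjoint."""
    return p_a + p_b

def multiply_independent(p_a, p_b):
    """P(A and B) = P(A) * P(B); valid only when A and B are independent."""
    return p_a * p_b

# Hypothetical example: rolling a fair six-sided die.
p_six = 1 / 6
p_not_six = complement(p_six)                      # 5/6
p_one_or_two = add_disjoint(1 / 6, 1 / 6)          # 1/3 (a roll cannot be both 1 and 2)
p_two_sixes = multiply_independent(p_six, p_six)   # 1/36 (two separate rolls)
```

The functions do not check disjointness or independence; that judgment comes from the context of the problem.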
What is the parameter, statistic, shape, center, variability, and z-score formula for a sample proportion?
parameter - p - population proportion
statistic - p̂ "p hat" - sample proportion
shape - approximately normal if passes success/failure condition (np and nq are greater than or equal to 10)
center - μp̂ = p
variability - σp̂ = √(pq/n)
z score formula - (p̂ - p)/√(pq/n)
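As a rough sketch of the sample-proportion facts above, the functions below compute the center, the standard deviation √(pq/n), the success/failure check, and the z-score. The function names and the values p = 0.5, n = 100, p̂ = 0.56 are hypothetical, picked only for illustration.

```python
import math

def proportion_sampling_dist(p, n):
    """Return (center, standard deviation, normal_ok) for the sample proportion."""
    q = 1 - p
    center = p                                   # mean of p-hat equals p
    sd = math.sqrt(p * q / n)                    # sqrt(pq/n)
    normal_ok = n * p >= 10 and n * q >= 10      # success/failure condition
    return center, sd, normal_ok

def proportion_z(p_hat, p, n):
    """z = (p-hat - p) / sqrt(pq/n)."""
    _, sd, _ = proportion_sampling_dist(p, n)
    return (p_hat - p) / sd

# Hypothetical values: p = 0.5, n = 100, observed p-hat = 0.56.
center, sd, normal_ok = proportion_sampling_dist(0.5, 100)
z = proportion_z(0.56, 0.5, 100)   # about 1.2
```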
How do you decide between a 2-sample for difference of means or 1-sample mean of differences?
Ask yourself: "Does it make sense to find the difference for the original data?"
What six values can you determine from a boxplot?
The minimum value, the first quartile, the median, the third quartile, the maximum, and the IQR (interquartile range)
What are the elements of an experiment?
An experiment requires...
random assignment of subjects
must identify one explanatory factor to manipulate
must identify response variable to measure
must identify subjects
AND MOST IMPORTANTLY include a treatment
What are the conditions for a geometric distribution?
Binary (there is only success and failure)
Independent
First Success (It ends with the first success "until he lands it")
Same Probability of Success in Each Trial
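When the four conditions above hold, the chance that the first success lands on trial k is (1 - p)^(k - 1) · p. The sketch below is an illustration under that standard formula; the function names and the success probability 0.2 are hypothetical.

```python
import math

def geometric_pmf(k, p):
    """P(first success occurs on trial k) = (1 - p)^(k - 1) * p, for k = 1, 2, ..."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1 - p) ** (k - 1) * p

def geometric_expected_trials(p):
    """Expected number of trials until the first success is 1/p."""
    return 1 / p

# Hypothetical example: each attempt succeeds with probability 0.2.
p_first_success_on_third = geometric_pmf(3, 0.2)     # 0.8 * 0.8 * 0.2, about 0.128
expected_trials = geometric_expected_trials(0.2)     # 5 trials on average
```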
What is the parameter, statistic, shape, center, variability, and z-score formula for a sample mean?
parameter - μ - population mean
statistic - x̄ - sample mean
shape - approximately normal if...
(1) population is approx. normal
(2) n is at least 30 (by the Central Limit Theorem)
(3) sample shows no strong skew
center - μx̄ = μ
variability - σx̄ = σ/√n
z score formula - (x̄ - μ)/(σ/√ n)
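The sample-mean formulas above can be checked with a short sketch. The function names and the values μ = 100, σ = 15, n = 36, x̄ = 104 are hypothetical and used only to show the arithmetic.

```python
import math

def sample_mean_sd(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def sample_mean_z(x_bar, mu, sigma, n):
    """z = (x-bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / sample_mean_sd(sigma, n)

# Hypothetical values: mu = 100, sigma = 15, n = 36, observed x-bar = 104.
sd_x_bar = sample_mean_sd(15, 36)          # 15 / 6 = 2.5
z = sample_mean_z(104, 100, 15, 36)        # 4 / 2.5 = 1.6
```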
What is the key wording for you to know to conduct a significance test?
"Do the data provide convincing evidence?" Significance tests that ask if there is a 'difference' in parameters are two-sided tests.
How to describe a distribution (pt.2)
DUFS + context
Direction (positive/negative)
Unusual Features (outliers or clusters)
Form (linear or nonlinear)
Strength (weak, moderate, strong)
What is a confounding variable?
A variable that influences both the dependent variable and independent variable, causing a false association
How are the mean, standard deviation, and variance affected by multiplying/dividing? adding/subtracting?
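The usual answer: adding or subtracting a constant shifts the mean by that constant and leaves the standard deviation and variance unchanged. Multiplying or dividing by a constant b scales the mean by b, the standard deviation by |b|, and the variance by b². The sketch below checks this on a small hypothetical data set (the data, the shift of 10, and the factor of 3 are all assumptions for illustration).

```python
from statistics import mean, pstdev, pvariance

# Hypothetical data set, chosen only so the numbers come out evenly.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def summarize(values):
    """Return (mean, population SD, population variance)."""
    return mean(values), pstdev(values), pvariance(values)

original = summarize(data)                    # mean 5, SD 2, variance 4
shifted = summarize([x + 10 for x in data])   # mean 15, SD 2, variance 4
scaled = summarize([3 * x for x in data])     # mean 15, SD 6, variance 36
```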
A polling agency showed the following two statements to a random sample of 1,048 adults in the US.
Environmental statement: Protection of the environment should be given priority over economic growth.
Economy Statement: Economic growth should be given priority over protection of the environment.
The order in which the statements were shown was randomly selected for each person in the sample. After reading the statements, each person was asked to choose a statement that was most consistent with their opinion. The results are shown below:
Environmental Statement - 58%
Economy Statement - 37%
No Preference - 5%
Assume the conditions for inference have been met. Construct and interpret a 95% confidence interval for the proportion of all adults who would have chosen the economy statement.
I will construct a one sample z interval for p.
p=the true proportion of US adults who would choose the economy statement
population: all US adults
show your work, but it should be set up like this
0.37 ± 1.96 √((0.37 × 0.63)/1048)
=(0.3408, 0.3992)
I am 95% confident that the true proportion of US adults that would have chosen the economy statement is between 34.08% and 39.92%
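To double-check the interval above, here is a minimal sketch of the one-sample z interval for a proportion. The function name is hypothetical, and the critical value comes from Python's standard normal rather than the rounded 1.96, which does not change the answer to four decimal places.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(p_hat, n, confidence=0.95):
    """p-hat ± z* · sqrt(p-hat(1 - p-hat)/n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lower, upper = one_prop_z_interval(0.37, 1048)
print(round(lower, 4), round(upper, 4))   # 0.3408 0.3992
```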
Tumbleweed, commonly found in the western United States, is the dried structure of certain plants that are blown by the wind. Kochia, a type of plant that turns into tumbleweed at the end of summer, is a problem for farmers because it takes nutrients away from soil that would otherwise go to more beneficial plants. Scientists are concerned that kochia plants are becoming resistant to the commonly used herbicide, glyphosate. In 2014, 19.7 percent of 61 randomly selected kochia plants were resistant to glyphosate. In 2017, 38.5 percent of 52 randomly selected kochia plants were resistant to glyphosate. Do the data provide convincing statistical evidence, at alpha level 0.05, that there has been an increase in the proportion of all kochia plants that are resistant to glyphosate?
Because the p-value is less than α = 0.05, there is convincing statistical evidence to conclude that the proportion of resistant plants in the 2017 population of kochia plants is greater than the proportion of resistant plants in the 2014 population of kochia plants.
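The conclusion above relies on a p-value that the card does not show. A rough sketch of a one-sided two-proportion z-test with a pooled standard error is given below. The counts 12 of 61 and 20 of 52 are assumptions obtained by rounding 19.7% and 38.5% of the sample sizes, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """One-sided test of H0: p2 = p1 against Ha: p2 > p1, using the pooled SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2_hat - p1_hat) / se
    p_value = 1 - NormalDist().cdf(z)   # area to the right of z
    return z, p_value

# Assumed counts: 12 of 61 resistant in 2014, 20 of 52 resistant in 2017.
z, p_value = two_prop_z_test(12, 61, 20, 52)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Under these assumed counts, z is roughly 2.2 and the p-value is roughly 0.014, which is below 0.05 and agrees with the stated conclusion.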
A scientist wanted to compare viper length and weight to see if there was a relationship. Describe the distribution.
There is a strong, positive, linear relationship between viper length and viper weight with no outliers. This distribution seems to indicate that as viper length increases, the viper weight also increases.
Describe the difference(s) between a stratified and cluster sampling method
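A stratified sampling method splits the population into homogeneous groups (strata) and conducts an SRS within each of the groups. A cluster sampling method splits the population into heterogeneous groups (clusters), randomly selects some of the clusters, and includes every individual in the selected clusters.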
The number of daily views of a TikTok video follows an approximately normal distribution with a mean of 15,000 and standard deviation of 4,000.
(a) Find the probability that a randomly selected day has more than 25,000 views.
(b) What number of views marks the cutoff for the bottom 5% of all days?
(a) 0.0062096
Draw a normal curve and label it. Find the z-score: (25000 - 15000)/4000 = 2.5. Then use your calculator or table to find the area to the left of z, which is 0.993790. Remember to subtract it from 1 to get the area to the right.
(b) 8420 views
Draw a normal curve and label it. Use the z-score formula, but input the given values.
-1.645= (x-15000)/4000
Solve for x
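Both parts can be checked with Python's `statistics.NormalDist` in place of a calculator or table. This is a sketch for verification only; the variable names are hypothetical.

```python
from statistics import NormalDist

# Daily views: approximately normal, mean 15,000 and standard deviation 4,000.
views = NormalDist(mu=15000, sigma=4000)

# (a) P(views > 25,000) = 1 - area to the left of 25,000.
p_more_than_25000 = 1 - views.cdf(25000)

# (b) The value with 5% of days below it (the 5th percentile).
cutoff_bottom_5 = views.inv_cdf(0.05)

print(round(p_more_than_25000, 4))   # 0.0062
print(round(cutoff_bottom_5, 1))     # about 8420.6
```

The cutoff comes out slightly above 8420 because the code uses an unrounded z-score, while the hand calculation rounds z to -1.645.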
Patients with heart-attack symptoms arrive at an emergency room either by ambulance or self-transportation provided by themselves, family, or friends. When a patient arrives at the emergency room, the time of arrival is recorded. The time when the patient's diagnostic treatment begins is also recorded.
An administrator of a large hospital wanted to determine whether the mean wait time (time between arrival and diagnostic treatment) for patients with heart-attack symptoms differs according to the mode of transportation. A random sample of 150 patients with heart-attack symptoms who had reported to the emergency room was selected. For each patient, the mode of transportation and wait time were recorded. Summary statistics for each mode of transportation are shown in the table below.
Use a 99% confidence interval to estimate the difference between the mean wait times for ambulance-transported patients and self-transported patients at this emergency room.
I will construct a 2-sample t interval for μA - μS, the difference in mean wait times (ambulance - self).
Random sample: given in question
Independence: assume
Nearly Normal: sample sizes for both are greater than 30 (Central Limit Theorem)
10% Condition - 77 and 73 patients are less than 10% of all patients
Do the math. It should look like this:
We are 99% confident that the true difference in the populations' mean wait times (ambulance - self) is between -4.3177 minutes and -0.2023 minutes. You can also say that the true mean wait time for those who arrive by ambulance is shorter than for those who are self-transported by somewhere between 0.2 and 4.3 minutes.