Describing distributions
What is SOCS + context
S- Shape (symmetrical, unimodal, bimodal, skew...)
O- Outliers
C- Center (median, mean)
S- Spread (range, IQR, standard deviation)
Conclusion for an inference
What is because our p-value of __ is _(greater than/less than)_ the alpha of _(usually 0.05)_, we _reject/fail to reject_ Ho. We _(have/don't have)_ convincing evidence for Ha.
Random condition
What is allowing generalizations to the population of interest when taking random samples from a population
The question "in the last hour, how many times did you check your phone" is an example of this kind of question
What is quantitative
bonus if adds it is discrete
Combining distributions process
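What is adding the means to find the new mean, and adding the variances (not the standard deviations) and then taking the square root to find the new standard deviation (the variables must be independent)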
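A minimal Python sketch of this rule, assuming the two variables are independent. The function name and the numbers are made up for illustration.

```python
import math

def combine_independent(mean_x, sd_x, mean_y, sd_y, difference=False):
    """Mean and SD of X + Y (or X - Y) for independent random variables."""
    mean = mean_x - mean_y if difference else mean_x + mean_y
    # Variances add for both sums and differences; SDs never add directly.
    variance = sd_x ** 2 + sd_y ** 2
    return mean, math.sqrt(variance)

print(combine_independent(10, 3, 20, 4))  # (30, 5.0)
```

Note that the standard deviations 3 and 4 combine to 5, not 7.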
LSRL regression
What is DUFS + context
D- direction (+/-)
U- unusual features (outliers)
F- form (linear/non-linear)
S- strength (weak/moderate/strong)
bonus if adds how to determine strength (looking at computer output)
|r| (absolute value of the correlation coefficient) of 0.8 or greater = strong
0.5 < |r| < 0.8 = moderate
|r| < 0.5 = weak
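These cutoffs are a rule of thumb. A small sketch of how they could be applied, assuming strength is judged on the absolute value of r and that a value of exactly 0.5 (which the cutoffs leave open) counts as moderate. The function name is hypothetical.

```python
def describe_strength(r):
    """Label the strength of a correlation using rule-of-thumb cutoffs."""
    size = abs(r)  # strength ignores the sign; the sign gives the direction
    if size >= 0.8:
        return "strong"
    if size >= 0.5:  # assumption: exactly 0.5 is treated as moderate
        return "moderate"
    return "weak"

print(describe_strength(-0.91))  # strong
print(describe_strength(0.62))   # moderate
print(describe_strength(0.2))    # weak
```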
Slope
What is for each additional 1 _(unit of the x variable)_, the predicted _(y variable)_ _(increases/decreases)_ by _(slope)_ _(units)_.
10% condition
What is shows that observations can be treated as independent when sampling without replacement, as long as the sample size is no more than 10% of the population
bonus if adds no need if random assignment is used
test-statistic calculation
What is test statistic = (statistic - parameter) / standard deviation of the statistic
bonus if says all types of test statistics: z, t, χ²
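As one worked instance of the formula, here is a sketch of a z statistic for a single proportion. The function name and the sample numbers are made up, and the standard deviation uses the hypothesized value of p.

```python
import math

def z_one_proportion(p_hat, p0, n):
    """(statistic - parameter) / standard deviation of the statistic."""
    sd = math.sqrt(p0 * (1 - p0) / n)  # computed from the null value p0
    return (p_hat - p0) / sd

# Sample proportion 0.56 from n = 100, testing Ho: p = 0.50
print(round(z_one_proportion(0.56, 0.50, 100), 2))  # 1.2
```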
How to increase power and why
What is increasing the sample size, because it lowers the variability of the sampling distributions, which decreases the overlap between the null and alternative distributions and so increases the probability that the test statistic will be statistically significant; and increasing the significance level (alpha), because it raises the probability of rejecting Ho.
bonus if they add what power is (the probability of rejecting Ho when Ha is true)
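Both effects can be checked numerically. The sketch below assumes a one-sided z test for a mean with a known population standard deviation, and takes power to be the probability of rejecting Ho when a specific alternative is true. The function name and all numbers are illustrative.

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """P(reject Ho | true mean is mu_true) for Ho: mu = mu0 vs Ha: mu > mu0."""
    se = sigma / n ** 0.5
    # Smallest sample mean that leads to rejecting Ho at level alpha.
    cutoff = NormalDist(mu0, se).inv_cdf(1 - alpha)
    # Chance the sample mean lands past the cutoff when Ha is true.
    return 1 - NormalDist(mu_true, se).cdf(cutoff)

small_n = power_one_sided_z(100, 103, 10, n=25)
large_n = power_one_sided_z(100, 103, 10, n=50)
big_alpha = power_one_sided_z(100, 103, 10, n=25, alpha=0.10)
print(small_n < large_n, small_n < big_alpha)  # True True
```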
Binomial distribution
What is BINS + context
B- binary (only have success or failure)
I- independent events
N- number of trials is fixed in advance
S- same probability of success on each trial
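When all four BINS conditions hold, probabilities come from the binomial formula. A short sketch (the function name is made up):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials, each with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 10 fair coin flips
print(round(binomial_pmf(3, 10, 0.5), 4))  # 0.1172
```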
Standard deviation
What is the _(context)_ typically varies by about _(standard deviation+units)_ from the mean of _(mean+units)_
Increasing sample size
What is decreasing variability
bonus if includes that it takes more resources and time
A resistant statistic
What are medians and IQRs
bonus if adds both or states that resistant statistics are not as affected by skew and outliers as compared to nonresistant statistics (mean and standard deviation)
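A quick illustration with made-up data: adding one extreme value moves the mean a lot and the median only slightly.

```python
from statistics import mean, median

data = [2, 3, 4, 5, 6]
with_outlier = data + [60]  # one made-up extreme value

# Change in each statistic caused by the single outlier
mean_shift = mean(with_outlier) - mean(data)
median_shift = median(with_outlier) - median(data)
print(mean_shift > median_shift)  # True
```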
Linearly transforming data process
What is addition/subtraction and multiplication/division affect the mean, only multiplication/division affects the standard deviation
bonus if adds that the shape stays the same
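A small sketch of this rule using a Celsius to Fahrenheit conversion. The readings are invented and the helper name is hypothetical.

```python
from statistics import mean, pstdev

def transform(data, a, b):
    """Apply y = a + b * x to every value."""
    return [a + b * x for x in data]

temps_c = [10, 15, 20, 25, 30]         # made-up Celsius readings
temps_f = transform(temps_c, 32, 1.8)  # convert to Fahrenheit

# The mean follows the whole transformation (multiply, then add).
print(mean(temps_f), 32 + 1.8 * mean(temps_c))
# The standard deviation is only multiplied; adding 32 has no effect.
print(pstdev(temps_f), 1.8 * pstdev(temps_c))
```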
Test inference
What is PHANTOMS
P- Parameter
H- Hypothesis
A- Assess conditions
N- Name
T- Test statistic
O- Obtain p-value
M- Make a decision
S- State conclusion
bonus if adds HANTOMS for χ²
p-value
What is assuming _(Ho in context)_ is true, there is a _(p-value)_ probability of getting a _(statistic in context)_ of _(statistic value)_ or _(more/less/more extreme)_
Large counts condition
What is allows us to assume the sampling distribution for our statistic is approximately normal
bonus if states the different types for each inference
proportions- np ≥ 10 and n(1-p) ≥ 10
means- n ≥ 30
χ²- all expected counts ≥ 5
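The proportion and chi-square checks can be sketched as simple helper functions. The names are hypothetical, and the minimums are the ones listed above.

```python
def large_counts_proportion(n, p, minimum=10):
    """True when expected successes and failures both reach the minimum."""
    return n * p >= minimum and n * (1 - p) >= minimum

def large_counts_chi_square(expected_counts, minimum=5):
    """True when every expected count reaches the minimum."""
    return all(count >= minimum for count in expected_counts)

print(large_counts_proportion(100, 0.5))  # True
print(large_counts_proportion(50, 0.1))   # False
```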
Residual plot
What is used to determine if the LSRL is a good fit based on whether there is a leftover pattern or not (random scatter = LSRL is a good fit)
Unusual z-score
What is more than 3 standard deviations above or below the mean (z > 3 or z < -3)
Conditions for LSRL inference
What is LINER
L- linear (scatterplot shows a linear relationship or the residual plot has no leftover pattern)
I- independent (10% condition)
N- Normal (residual dotplot has no strong skew or outliers)
E- equal standard deviations (the residual plot shows no sideways Christmas tree pattern)
R- Random (random samples or random assignment)
Confidence level
What is if we take many, many similar samples and calculate a confidence interval for each, about _(confidence level)_% of them will capture the true _(population parameter)_
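This interpretation can be checked by simulation. The sketch below assumes a normal population with a known standard deviation and uses z intervals. The function name, population values, and seed are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist

def coverage_rate(mu=50, sigma=10, n=40, level=0.95, trials=2000, seed=1):
    """Fraction of simulated confidence intervals that capture the true mean."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

print(coverage_rate())  # should land close to 0.95
```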
CLT (Central Limit Theorem)
What is showing that the sampling distribution is approximately normal for inferences for means when all sample sizes are greater than or equal to 30
bonus if states there is no need to check when the population distribution is given as normal, or that if the condition is not met, a dotplot of the sample data with no outliers and no strong skew also lets us assume the sampling distribution is approximately normal
What is an example of cluster sampling
bonus if adds that it is more convenient and cost-effective as compared to simple random sampling
DAILY DOUBLE!!
The difference between the effect of a horizontal outlier and a vertical outlier on the LSRL
What is horizontal outliers tilt the LSRL while vertical outliers shift the LSRL up/down
bonus if adds that horizontal outliers are high leverage points or that outliers are influential points because if removed, there is a large change to slope, y-intercept and/or correlation
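A sketch with invented points that shows both effects. It assumes the vertical outlier sits at the mean of x, which is the case where the line shifts without tilting.

```python
def lsrl(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]                   # points exactly on y = 2x

base = lsrl(xs, ys)                     # slope 2, intercept 0
vertical = lsrl(xs + [3], ys + [18])    # extreme y at the mean of x
horizontal = lsrl(xs + [15], ys + [6])  # extreme x, far from the pattern

print(base, vertical, horizontal)
```

With these points the vertical outlier keeps the slope at 2 and raises the intercept to 2, while the horizontal outlier drops the slope to about 0.15.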