Which levels of measurement can be used with continuous variables?
Interval and ratio
What are the 5 steps of the NHST?
1. State null and research hypotheses
2. Set the alpha level
3. Identify sample characteristics
4. Find location of sample mean within null sampling distribution
5. Reject/retain the null by comparing p to alpha or z to critical value
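A minimal sketch of these five steps as a one-sample z-test in Python (all numbers are made up for illustration):

```python
from scipy import stats

# Step 1: H0: mu = 100; H1: mu != 100 (two-tailed)
# Step 2: set the alpha level
alpha = 0.05
# Step 3: sample characteristics (hypothetical values)
mu0, sigma, n, sample_mean = 100, 15, 36, 106
# Step 4: locate the sample mean in the null sampling distribution
se = sigma / n ** 0.5               # standard error
z = (sample_mean - mu0) / se        # z = 2.4
p = 2 * stats.norm.sf(abs(z))       # two-tailed p ~= .016
# Step 5: compare p to alpha (or z to the critical value, 1.96)
print("reject H0" if p < alpha else "retain H0")
```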
How do we check for the normality assumption in grouped and ungrouped data?
Grouped: satisfied if N>30 or population is normal (CLT)
Ungrouped: use skewness/kurtosis, p-p/q-q plots
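A quick sketch of the ungrouped checks with scipy, on hypothetical scores (skewness/kurtosis near 0 suggest normality; a Q-Q plot should hug the diagonal):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(50, 10, size=200)       # hypothetical ungrouped scores

# Skewness and (excess) kurtosis: values near 0 suggest normality
print(stats.skew(scores), stats.kurtosis(scores))
# Q-Q plot: points should fall along the diagonal
# stats.probplot(scores, dist="norm", plot=plt)   # with matplotlib's plt
```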
What is the z-score when we are 1, 2, and 3 SDs from the mean?
1 SD = 1, 2 SD = 2, 3 SD = 3 (a z-score just counts SDs from the mean)
The commonly memorized critical values sit near these: z = 1.00 covers ~68% of scores, z = 1.96 covers 95%, and z = 2.58 covers 99%
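These values fall straight out of the normal curve; a quick scipy check of the area between ±z (illustrative only):

```python
from scipy import stats

# Area within +/- z SDs of the mean on a standard normal curve
for z in (1.0, 1.96, 2.58):
    print(z, stats.norm.cdf(z) - stats.norm.cdf(-z))
# prints ~0.68, ~0.95, ~0.99
```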
What is the unstandardized regression coefficient?
The predicted change in Y (in Y's raw units) for a one-unit change in X
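A tiny sketch with made-up scores, using scipy's linregress:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 8, 9])       # hypothetical raw scores
b = stats.linregress(x, y).slope    # unstandardized coefficient (1.8 here)
print(b)  # predicted change in Y per one-unit change in X
```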
What is a variance, and how is it related to standard deviation?
How spread out scores are: the average squared deviation from the mean. Variance is the SD squared (equivalently, SD is the square root of the variance)
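For example, with a small made-up set of scores:

```python
import numpy as np

scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])
var = np.var(scores)        # mean squared deviation from the mean (ddof=1 for a sample)
sd = np.std(scores)         # square root of the variance
print(var, sd, sd ** 2)     # 4.0, 2.0, 4.0
```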
Explain the confusion matrix.
If we falsely reject a true null, that's a Type I error (probability = alpha).
If we falsely retain a false null, that's a Type II error (probability = beta).
If we correctly reject a false null, the probability of doing so is called power (1 − beta).
If we correctly retain a true null, that's the remaining correct decision (probability = 1 − alpha).
How do we test for the equal variance assumption?
Levene's test, but if that's too sensitive (it often is with large N), then Fmax: the ratio of the largest to the smallest group variance, which should be less than 10:1
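A sketch of both checks with hypothetical groups (Levene via scipy; Fmax computed by hand):

```python
import numpy as np
from scipy import stats

g1 = [4, 5, 6, 6, 7]             # hypothetical groups
g2 = [2, 4, 6, 8, 10]

stat, p = stats.levene(g1, g2)   # p < .05 suggests unequal variances

fmax = max(np.var(g1, ddof=1), np.var(g2, ddof=1)) / \
       min(np.var(g1, ddof=1), np.var(g2, ddof=1))
print(p, fmax)                   # Fmax under ~10:1 is usually acceptable
```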
How do we treat outliers? How about dropped cases?
Outliers: drop the case, Winsorizing, transform the data (e.g., take the square root of all scores in a variable)
Dropped cases: person-mean substitution, casewise deletion
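A minimal Winsorizing sketch with made-up scores, using scipy's winsorize (the 10% limits are arbitrary):

```python
import numpy as np
from scipy.stats.mstats import winsorize

scores = np.array([3, 4, 4, 5, 5, 5, 6, 6, 7, 48])   # 48 is an outlier
# Pull the top and bottom 10% of scores in to the nearest retained value
clean = winsorize(scores, limits=[0.10, 0.10])
print(clean)   # 48 is pulled down to the next-highest score (7); 3 is pulled up to 4
```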
What 4 requirements satisfy a causal inference (extra points if you define all of them)?
Temporal precedence - cause comes before effect
Correlation/Co-occurrence - cause and effect happen together
Non-spuriousness - there aren't other causes explaining the observed relationship
Causal mechanism - a process that transmits the influence of the cause to the effect (i.e., a rational explanation for the observed relationship)
What are 4 ways we can increase power in a sample?
Increase the sample size, increase size of the effect, use a within-groups design, increase alpha
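For instance, holding the effect and alpha fixed, power rises with N; a sketch using statsmodels (the effect size d = .5 is arbitrary):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Power of a two-group t-test for a medium effect (d = .5), alpha = .05
for n in (20, 50, 100):
    print(n, analysis.power(effect_size=0.5, nobs1=n, alpha=0.05))
# power climbs toward 1 as the per-group N grows
```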
What are the differences between t- and z-tests?
T-tests: the population SD is unknown, so we estimate it from the sample SD/variance and use the t distribution with degrees of freedom (df = N − 1); critical values aren't rigid - they shift with df because we don't know the population parameters; the sample can be smaller (<30) because the test is designed to be robust when the CLT threshold isn't met; uses the sample SD/variance to calculate the test statistic
Z-tests: Population SD is known, distribution is normal, large sample (>30), CVs are rigid; uses normal distribution to calculate test statistic
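A side-by-side sketch on hypothetical scores (the assumed population SD of 10 for the z-test is made up):

```python
import numpy as np
from scipy import stats

sample = np.array([102, 98, 110, 105, 95, 108, 100, 107])  # hypothetical

# t-test: population SD unknown, estimated from the sample (df = N - 1)
t, p_t = stats.ttest_1samp(sample, popmean=100)

# z-test: population SD assumed known (sigma = 10), normal distribution used
z = (sample.mean() - 100) / (10 / np.sqrt(len(sample)))
p_z = 2 * stats.norm.sf(abs(z))
print(p_t, p_z)
```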
What is the independence of errors assumption, and what does it tell us?
It assumes that all data points are generated independently and are not related to each other. If scores are redundant, we know there's a research design issue (multiple submissions, people discussing their responses, etc.)
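In a regression context, one common numeric check is the Durbin-Watson statistic on the residuals (values near 2 suggest independent errors); a sketch on simulated data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)            # hypothetical data
model = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(model.resid))           # ~2 means errors look independent
```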
How are one-sample and paired t-tests alike and different?
One-sample: looks at whether a sample mean is different from a hypothesized value when the population SD is unknown (so we use the sample SD instead); e.g., comparing one group's mean to a known benchmark or population value
Paired sample: 2 sets of paired scores converted into 1 set of difference scores --> can then compare those difference scores to a reference score (usually 0); we're basically turning it into a one-sample t-test
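The equivalence is easy to see in code (made-up paired scores):

```python
import numpy as np
from scipy import stats

pre  = np.array([10, 12, 9, 14, 11])   # hypothetical paired scores
post = np.array([12, 15, 10, 15, 14])

# A paired t-test...
t1, p1 = stats.ttest_rel(post, pre)
# ...matches a one-sample t-test on the difference scores against 0
t2, p2 = stats.ttest_1samp(post - pre, popmean=0)
print(np.isclose(t1, t2), np.isclose(p1, p2))   # True True
```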
What is the difference between regression and correlation?
Correlation - summarizes the relationship between 2 variables, but not how or why it happens; descriptive
Regression - uses the IV to predict the DV; attempts to predict/explain the relationship between variables; implies a direction (IV → DV), but on its own it cannot establish cause and effect - that still requires the four causal-inference conditions
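The two are mathematically linked (b = r × SDy / SDx); a sketch with made-up scores:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 8, 9])              # hypothetical scores

r = stats.pearsonr(x, y)[0]                # correlation: strength and direction
b = stats.linregress(x, y).slope           # regression: predicting Y from X
print(b, r * y.std(ddof=1) / x.std(ddof=1))   # same number both ways
```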
When this value gets smaller, we have more chances to reject the null hypothesis (hint: it's not the p-value - a smaller p helps too, but it's not the answer here)
critical value
Critical values are the values that mark the boundary beyond which a test statistic is unlikely under the null hypothesis (so if the statistic falls outside the CV, we can reject the null).
Confidence intervals are critical values in raw-score form. They define a region in which the population parameter is likely to be. So a 95% confidence interval says that if we built many such intervals, about 95% of them would contain the parameter of interest.
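A sketch of a 95% CI around a sample mean, using scipy's t distribution (the scores are made up):

```python
import numpy as np
from scipy import stats

scores = np.array([4, 5, 5, 6, 6, 7, 8, 9])    # hypothetical sample
m, se = scores.mean(), stats.sem(scores)
lo, hi = stats.t.interval(0.95, df=len(scores) - 1, loc=m, scale=se)
print(lo, hi)   # 95% of intervals built this way capture the population mean
```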
What's the difference between homogeneity of variance and homoscedasticity?
Homogeneity of variance deals with grouped data, homoscedasticity with ungrouped. Both assume that points are randomly selected from the same population, so the variance across groups/individual data points should also be the same
What is effect coding?
A coding scheme (e.g., 1/0/−1) in which the intercept equals the grand mean and each coefficient equals the mean difference between a specific group and the grand mean. It's useful when we don't have a specific hypothesis about a reference group or when groups have different sample sizes.
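A minimal sketch with three hypothetical balanced groups, solving the effect-coded model by least squares:

```python
import numpy as np

# Effect codes for 3 groups (A, B, C); C is the reference row (-1, -1)
#            intercept  e1  e2
X = np.array([[1,  1,  0],     # A
              [1,  1,  0],
              [1,  0,  1],     # B
              [1,  0,  1],
              [1, -1, -1],     # C
              [1, -1, -1]], dtype=float)
y = np.array([4., 6., 7., 9., 2., 4.])     # hypothetical scores

b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # intercept = grand mean (~5.33); e1, e2 = group deviations from it
```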
Why do we use the GLM?
Used to look at the relationship between 1+ IV(s) and a DV. Provides a foundation for understanding and applying different statistical tests, can handle categorical and continuous IVs, and allows for predicting outcomes based on an identified relationship between variables.
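For example, a sketch of one GLM handling a categorical and a continuous IV at once, using statsmodels' formula API (the data frame is made up):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one categorical IV (group) and one continuous IV (hours)
df = pd.DataFrame({
    "score": [70, 75, 80, 85, 60, 68, 72, 78],
    "group": ["A", "A", "B", "B", "A", "A", "B", "B"],
    "hours": [2, 3, 4, 5, 1, 2, 3, 4],
})
model = smf.ols("score ~ C(group) + hours", data=df).fit()
print(model.params)   # one framework, both predictor types, usable for prediction
```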
What are the 5 tenets of the Central Limit Theorem (CLT)?
1. Sampling distribution will be normally distributed regardless of raw score distribution
2. As N increases (>30), samp.dist is near perfectly normal
3. If the raw score distribution is normal, the sampling distribution will be too, regardless of N
4. Mean of samp.dist = Population mean
5. SD of samp.dist (the standard error) = SD of pop ÷ √N, so it is smaller than the SD of the population
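A simulation sketch of tenets 1, 2, 4, and 5, drawing sample means from a skewed population (parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
pop = rng.exponential(scale=2.0, size=100_000)   # heavily skewed raw scores

# Sampling distribution of the mean for N = 50
means = np.array([rng.choice(pop, 50).mean() for _ in range(5_000)])
print(means.mean(), pop.mean())                  # tenet 4: the means match
print(means.std(), pop.std() / np.sqrt(50))      # tenet 5: SE = pop SD / sqrt(N)
# means itself is roughly normal despite the skewed population (tenets 1-2)
```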
Tell me literally everything you know about the p-value (definition and relation to effect size).
It's the probability of obtaining the sample statistic when only chance is at play (i.e., the probability of getting a result at least as extreme as yours if the null hypothesis is true). It is NOT an indicator of effect size - rather, effect size (together with N) influences the p-value, not the other way around. It tells us nothing about the probability of obtaining the sample mean in a non-random world where there's a real hypothesized effect. A lower p-value means there is a low probability of getting the sample statistic in a random world.
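A quick simulated illustration that p depends on N, not just on the effect: the same underlying effect (about half an SD) yields very different p-values at different sample sizes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Same effect size, very different Ns
for n in (10, 1000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.5, 1.0, n)
    print(n, stats.ttest_ind(a, b).pvalue)
# p shrinks as N grows even though the effect itself hasn't changed
```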
What are the 6 assumptions of multiple regression?
Linearity, homoscedasticity, normality of residuals, independence of errors, no multicollinearity, no measurement error
Why do we center continuous variables when coding?
It reduces multicollinearity (which inflates coefficients' standard errors), especially when models include interaction or polynomial terms, and it moves the intercept to the mean of X instead of X = 0 (which may lie outside the observed range), making results easier to interpret in the context of the data.
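A sketch on made-up data: centering X leaves the slope untouched but moves the intercept to a value inside the data:

```python
import numpy as np
from scipy import stats

x = np.array([35., 40., 45., 50., 55.])    # hypothetical ages
y = np.array([3., 4., 6., 7., 9.])

raw = stats.linregress(x, y)
ctr = stats.linregress(x - x.mean(), y)
print(np.isclose(raw.slope, ctr.slope))    # True: the slope is unchanged
print(raw.intercept, ctr.intercept)        # -7.7 (at age 0) vs 5.8 (at the mean age)
```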
Why do only positive values exist in a one-way ANOVA?
For one-way ANOVAs, we use the variance instead of the SD: the F statistic is a ratio of variances. Since a variance is the SD squared, it can never be negative, so only positive values exist in the F distribution and it is positively skewed.
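A tiny demonstration with hypothetical groups (F is between-groups variance over within-groups variance, hence never negative):

```python
from scipy import stats

g1, g2, g3 = [4, 5, 6], [6, 7, 8], [9, 10, 11]   # hypothetical groups
F, p = stats.f_oneway(g1, g2, g3)
print(F, p)   # F is a ratio of two variances, so it is always >= 0
```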