Which levels of measurement can be used with continuous variables?
Interval and ratio
What are the 5 steps of the NHST?
1. State null and research hypotheses
2. Set the alpha level
3. Identify sample characteristics
4. Find location of sample mean within null sampling distribution
5. Reject/retain the null by comparing p to alpha or z to critical value
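A minimal sketch of these five steps as a one-sample z-test in Python (all numbers are made up for illustration):

```python
from scipy import stats

# Step 1: H0: mu = 100; H1: mu != 100 (two-tailed)
# Step 2: set the alpha level
alpha = 0.05
# Step 3: sample characteristics (hypothetical values)
mu0, sigma, n, sample_mean = 100, 15, 36, 106
# Step 4: locate the sample mean in the null sampling distribution
se = sigma / n ** 0.5               # standard error
z = (sample_mean - mu0) / se        # z = 2.4
p = 2 * stats.norm.sf(abs(z))       # two-tailed p ~= .016
# Step 5: compare p to alpha (or z to the critical value, 1.96)
print("reject H0" if p < alpha else "retain H0")
```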
How do we check for the normality assumption in grouped and ungrouped data?
Grouped: satisfied if N>30 or population is normal (CLT)
Ungrouped: use skewness/kurtosis, p-p/q-q plots
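A quick sketch of the ungrouped checks with scipy, on hypothetical scores (skewness/kurtosis near 0 suggest normality; a Q-Q plot should hug the diagonal):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(50, 10, size=200)       # hypothetical ungrouped scores

# Skewness and (excess) kurtosis: values near 0 suggest normality
print(stats.skew(scores), stats.kurtosis(scores))
# Q-Q plot: points should fall along the diagonal
# stats.probplot(scores, dist="norm", plot=plt)   # with matplotlib's plt
```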
What is the z-score when we are 1, 2, and 3 SDs from the mean?
1 SD = 1, 2 SD = 2, 3 SD = 3 (a z-score just counts SDs from the mean)
The commonly memorized critical values sit near these: z = 1.00 covers ~68% of scores, z = 1.96 covers 95%, and z = 2.58 covers 99%
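These values fall straight out of the normal curve; a quick scipy check of the area between ±z (illustrative only):

```python
from scipy import stats

# Area within +/- z SDs of the mean on a standard normal curve
for z in (1.0, 1.96, 2.58):
    print(z, stats.norm.cdf(z) - stats.norm.cdf(-z))
# prints ~0.68, ~0.95, ~0.99
```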
What is the unstandardized regression coefficient?
The predicted change in Y (in Y's raw units) for a one-unit change in X
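A tiny sketch with made-up scores, using scipy's linregress:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 8, 9])       # hypothetical raw scores
b = stats.linregress(x, y).slope    # unstandardized coefficient (1.8 here)
print(b)  # predicted change in Y per one-unit change in X
```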
What is a variance, and how is it related to standard deviation?
How spread out scores are: the average squared deviation from the mean. Variance is the SD squared (equivalently, SD is the square root of the variance)
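For example, with a small made-up set of scores:

```python
import numpy as np

scores = np.array([2, 4, 4, 4, 5, 5, 7, 9])
var = np.var(scores)        # mean squared deviation from the mean (ddof=1 for a sample)
sd = np.std(scores)         # square root of the variance
print(var, sd, sd ** 2)     # 4.0, 2.0, 4.0
```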
Explain the confusion matrix.
If we falsely reject a true null, that's a Type I error (probability = alpha).
If we falsely retain a false null, that's a Type II error (probability = beta).
If we correctly reject a false null, the probability of doing so is called power (1 − beta).
If we correctly retain a true null, that's the remaining correct decision (probability = 1 − alpha).
How do we test for the equal variance assumption?
Levene's test, but if that's too sensitive (it often is with large N), then Fmax: the ratio of the largest to the smallest group variance, which should be less than 10:1
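A sketch of both checks with hypothetical groups (Levene via scipy; Fmax computed by hand):

```python
import numpy as np
from scipy import stats

g1 = [4, 5, 6, 6, 7]             # hypothetical groups
g2 = [2, 4, 6, 8, 10]

stat, p = stats.levene(g1, g2)   # p < .05 suggests unequal variances

fmax = max(np.var(g1, ddof=1), np.var(g2, ddof=1)) / \
       min(np.var(g1, ddof=1), np.var(g2, ddof=1))
print(p, fmax)                   # Fmax under ~10:1 is usually acceptable
```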
How do we treat outliers? How about dropped cases?
Outliers: drop the case, Winsorizing, transform the data (e.g., take the square root of all scores in a variable)
Dropped cases: person-mean substitution, casewise deletion
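A minimal Winsorizing sketch with made-up scores, using scipy's winsorize (the 10% limits are arbitrary):

```python
import numpy as np
from scipy.stats.mstats import winsorize

scores = np.array([3, 4, 4, 5, 5, 5, 6, 6, 7, 48])   # 48 is an outlier
# Pull the top and bottom 10% of scores in to the nearest retained value
clean = winsorize(scores, limits=[0.10, 0.10])
print(clean)   # 48 is pulled down to the next-highest score (7); 3 is pulled up to 4
```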
What 4 requirements satisfy a causal inference (extra points if you define all of them)?
Temporal precedence - cause comes before effect
Correlation/Co-occurrence - cause and effect happen together
Non-spuriousness - there aren't other causes explaining the observed relationship
Causal mechanism - a process that transmits the influence of the cause to the effect (i.e., a rational explanation for the observed relationship)
What are 4 ways we can increase power in a sample?
Increase the sample size, increase size of the effect, use a within-groups design, increase alpha
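For instance, holding the effect and alpha fixed, power rises with N; a sketch using statsmodels (the effect size d = .5 is arbitrary):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Power of a two-group t-test for a medium effect (d = .5), alpha = .05
for n in (20, 50, 100):
    print(n, analysis.power(effect_size=0.5, nobs1=n, alpha=0.05))
# power climbs toward 1 as the per-group N grows
```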
What are the differences between t- and z-tests?
T-tests: the population SD is unknown, so we estimate it from the sample SD/variance and use the t distribution with degrees of freedom (df = N − 1); critical values aren't rigid - they shift with df because we don't know the population parameters; the sample can be smaller (<30) because the test is designed to be robust when the CLT threshold isn't met; uses the sample SD/variance to calculate the test statistic
Z-tests: Population SD is known, distribution is normal, large sample (>30), CVs are rigid; uses normal distribution to calculate test statistic
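A side-by-side sketch on hypothetical scores (the assumed population SD of 10 for the z-test is made up):

```python
import numpy as np
from scipy import stats

sample = np.array([102, 98, 110, 105, 95, 108, 100, 107])  # hypothetical

# t-test: population SD unknown, estimated from the sample (df = N - 1)
t, p_t = stats.ttest_1samp(sample, popmean=100)

# z-test: population SD assumed known (sigma = 10), normal distribution used
z = (sample.mean() - 100) / (10 / np.sqrt(len(sample)))
p_z = 2 * stats.norm.sf(abs(z))
print(p_t, p_z)
```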
What is the independence of errors assumption, and what does it tell us?
It assumes that all data points are generated independently and are not related to each other. If scores are redundant, we know there's a research design issue (multiple submissions, people discussing their responses, etc.)
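In a regression context, one common numeric check is the Durbin-Watson statistic on the residuals (values near 2 suggest independent errors); a sketch on simulated data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(size=100)            # hypothetical data
model = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(model.resid))           # ~2 means errors look independent
```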
How are one-sample and paired t-tests alike and different?
One-sample: looks at whether a sample mean is different from a hypothesized value when the population SD is unknown (so we use the sample SD instead); e.g., comparing one group's mean to a known benchmark or population value
Paired sample: 2 sets of paired scores converted into 1 set of difference scores --> can then compare those difference scores to a reference score (usually 0); we're basically turning it into a one-sample t-test
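The equivalence is easy to see in code (made-up paired scores):

```python
import numpy as np
from scipy import stats

pre  = np.array([10, 12, 9, 14, 11])   # hypothetical paired scores
post = np.array([12, 15, 10, 15, 14])

# A paired t-test...
t1, p1 = stats.ttest_rel(post, pre)
# ...matches a one-sample t-test on the difference scores against 0
t2, p2 = stats.ttest_1samp(post - pre, popmean=0)
print(np.isclose(t1, t2), np.isclose(p1, p2))   # True True
```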
What is the difference between regression and correlation?
Correlation - summarizes the relationship between 2 variables, but not how or why it happens; descriptive
Regression - uses the IV to predict the DV; attempts to predict/explain the relationship between variables; implies a direction (IV → DV), but on its own it cannot establish cause and effect - that still requires the four causal-inference conditions
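The two are mathematically linked (b = r × SDy / SDx); a sketch with made-up scores:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 8, 9])              # hypothetical scores

r = stats.pearsonr(x, y)[0]                # correlation: strength and direction
b = stats.linregress(x, y).slope           # regression: predicting Y from X
print(b, r * y.std(ddof=1) / x.std(ddof=1))   # same number both ways
```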
When this value gets smaller, we have more chances to reject the null hypothesis (hint: it's not the p-value - a smaller p helps too, but it's not the answer here)
critical value
Critical values are the values that mark the boundary beyond which a test statistic is unlikely under the null hypothesis (so if the statistic falls outside the CV, we can reject the null).
Confidence intervals are critical values in raw-score form. They define a region in which the population parameter is likely to be. So a 95% confidence interval says that if we built many such intervals, about 95% of them would contain the parameter of interest.
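A sketch of a 95% CI around a sample mean, using scipy's t distribution (the scores are made up):

```python
import numpy as np
from scipy import stats

scores = np.array([4, 5, 5, 6, 6, 7, 8, 9])    # hypothetical sample
m, se = scores.mean(), stats.sem(scores)
lo, hi = stats.t.interval(0.95, df=len(scores) - 1, loc=m, scale=se)
print(lo, hi)   # 95% of intervals built this way capture the population mean
```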
What's the difference between homogeneity of variance and homoscedasticity?
Homogeneity of variance deals with grouped data, homoscedasticity with ungrouped. Both assume that points are randomly selected from the same population, so the variance across groups/individual data points should also be the same
What is effect coding?
A coding scheme (e.g., 1/0/−1) in which the intercept equals the grand mean and each coefficient equals the mean difference between a specific group and the grand mean. It's useful when we don't have a specific hypothesis about a reference group or when groups have different sample sizes.
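A minimal sketch with three hypothetical balanced groups, solving the effect-coded model by least squares:

```python
import numpy as np

# Effect codes for 3 groups (A, B, C); C is the reference row (-1, -1)
#            intercept  e1  e2
X = np.array([[1,  1,  0],     # A
              [1,  1,  0],
              [1,  0,  1],     # B
              [1,  0,  1],
              [1, -1, -1],     # C
              [1, -1, -1]], dtype=float)
y = np.array([4., 6., 7., 9., 2., 4.])     # hypothetical scores

b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # intercept = grand mean (~5.33); e1, e2 = group deviations from it
```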
Why do we use the GLM?
Used to look at the relationship between 1+ IV(s) and a DV. Provides a foundation for understanding and applying different statistical tests, can handle categorical and continuous IVs, and allows for predicting outcomes based on an identified relationship between variables.
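For example, a sketch of one GLM handling a categorical and a continuous IV at once, using statsmodels' formula API (the data frame is made up):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one categorical IV (group) and one continuous IV (hours)
df = pd.DataFrame({
    "score": [70, 75, 80, 85, 60, 68, 72, 78],
    "group": ["A", "A", "B", "B", "A", "A", "B", "B"],
    "hours": [2, 3, 4, 5, 1, 2, 3, 4],
})
model = smf.ols("score ~ C(group) + hours", data=df).fit()
print(model.params)   # one framework, both predictor types, usable for prediction
```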
What are the 5 tenets of the Central Limit Theorem (CLT)?
1. Sampling distribution will be normally distributed regardless of raw score distribution
2. As N increases (>30), samp.dist is near perfectly normal
3. If the raw score distribution is normal, the sampling distribution will be too, regardless of N
4. Mean of samp.dist = Population mean
5. SD of samp.dist (the standard error) = SD of pop ÷ √N, so it is smaller than the SD of the population
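A simulation sketch of tenets 1, 2, 4, and 5, drawing sample means from a skewed population (parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
pop = rng.exponential(scale=2.0, size=100_000)   # heavily skewed raw scores

# Sampling distribution of the mean for N = 50
means = np.array([rng.choice(pop, 50).mean() for _ in range(5_000)])
print(means.mean(), pop.mean())                  # tenet 4: the means match
print(means.std(), pop.std() / np.sqrt(50))      # tenet 5: SE = pop SD / sqrt(N)
# means itself is roughly normal despite the skewed population (tenets 1-2)
```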
Tell me literally everything you know about the p-value (definition and relation to effect size).
It's the probability of obtaining the sample statistic when only chance is at play (i.e., the probability of getting a result at least as extreme as yours if the null hypothesis is true). It is NOT an indicator of effect size - rather, effect size (together with N) influences the p-value, not the other way around. It tells us nothing about the probability of obtaining the sample mean in a non-random world where there's a real hypothesized effect. A lower p-value means there is a low probability of getting the sample statistic in a random world.
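A quick simulated illustration that p depends on N, not just on the effect: the same underlying effect (about half an SD) yields very different p-values at different sample sizes:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Same effect size, very different Ns
for n in (10, 1000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.5, 1.0, n)
    print(n, stats.ttest_ind(a, b).pvalue)
# p shrinks as N grows even though the effect itself hasn't changed
```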
What are the 6 assumptions of multiple regression?
Linearity, homoscedasticity, normality of residuals, independence of errors, no multicollinearity, no measurement error
Why do we center continuous variables when coding?
It reduces multicollinearity (which inflates coefficients' standard errors), especially when models include interaction or polynomial terms, and it moves the intercept to the mean of X instead of X = 0 (which may lie outside the observed range), making results easier to interpret in the context of the data.
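A sketch on made-up data: centering X leaves the slope untouched but moves the intercept to a value inside the data:

```python
import numpy as np
from scipy import stats

x = np.array([35., 40., 45., 50., 55.])    # hypothetical ages
y = np.array([3., 4., 6., 7., 9.])

raw = stats.linregress(x, y)
ctr = stats.linregress(x - x.mean(), y)
print(np.isclose(raw.slope, ctr.slope))    # True: the slope is unchanged
print(raw.intercept, ctr.intercept)        # -7.7 (at age 0) vs 5.8 (at the mean age)
```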
Why do only positive values exist in a one-way ANOVA?
For one-way ANOVAs, we use the variance instead of the SD: the F statistic is a ratio of variances. Since a variance is the SD squared, it can never be negative, so only positive values exist in the F distribution and it is positively skewed.
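A tiny demonstration with hypothetical groups (F is between-groups variance over within-groups variance, hence never negative):

```python
from scipy import stats

g1, g2, g3 = [4, 5, 6], [6, 7, 8], [9, 10, 11]   # hypothetical groups
F, p = stats.f_oneway(g1, g2, g3)
print(F, p)   # F is a ratio of two variances, so it is always >= 0
```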