General
Chi-squared
ANOVA
LR
MLR & Dummy Var.
100
What does it mean when a variable has a bar on top? (y-bar or x-bar) What does it mean when a variable has a hat on top? (y-hat or ß-hats)
A bar on top means the variable is an average (y-bar = the average of the dependent variable's values; x-bar = the average of an independent variable's values). A hat on top means the variable is a predicted or sample (estimated) value (y-hat = the predicted value of the dependent variable; ß1-hat = the sample slope; ß0-hat = the sample y-intercept)
100
State the assumptions for Chi-squared
1. Random sample of… (IN CONTEXT!) 2. All expected frequencies are at least 5
100
State assumptions for ANOVA.
1. Random and independent samples of ... (IN CONTEXT!) 2. Probability distribution of the values of all variables is approximately normal (IN CONTEXT) 3. Equal population variances
100
How do you set up your linear regression equation?
From Minitab: y-hat = ß0-hat + ß1-hat*x1 + ß2-hat*x2 + ß3-hat*x3 + ... + ßn-hat*xn The ß-hats should be the numbers from the Minitab printout DO NOT FORGET THE HATS! (The error term belongs in the population model y = ß0 + ß1*x1 + ... + ε, not in the prediction equation, which gives the predicted value y-hat.)
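In class the ß-hats come off the Minitab printout, but the same numbers can be computed by hand with least squares. A minimal Python sketch for simple linear regression with made-up data:

```python
# Simple linear regression by least squares: y-hat = b0 + b1*x
# (hypothetical data; in class, b0 and b1 come from the Minitab printout)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)

b1 = sxy / sxx           # sample slope (ß1-hat)
b0 = ybar - b1 * xbar    # sample y-intercept (ß0-hat)
print(round(b0, 2), round(b1, 2))  # 0.09 1.99
```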
100
How many dummy or indicator variables should you have in your equation?
The number of dummy variables you need is always one less than the number of levels (categories) of the qualitative variable. For example, a qualitative variable with three levels needs two dummy variables; the omitted level serves as the baseline.
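A tiny Python sketch of the rule, using a hypothetical qualitative variable with three levels (so two dummies):

```python
# A qualitative variable with 3 levels needs 3 - 1 = 2 dummy variables;
# the first level acts as the baseline and gets no dummy of its own.
levels = ["small", "medium", "large"]   # hypothetical levels

def dummies(value):
    # one 0/1 indicator for each non-baseline level
    return [1 if value == lvl else 0 for lvl in levels[1:]]

print(dummies("medium"))  # [1, 0]
print(dummies("small"))   # [0, 0]  <- baseline: all dummies zero
```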
200
What is r? What is r-squared?
The correlation coefficient and the coefficient of determination, respectively.
200
What is the degrees of freedom formula for Chi-squared?
(# of rows – 1) * (# of columns – 1)
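A one-line Python check of the formula for a hypothetical 3x4 table:

```python
# Degrees of freedom for a chi-squared test on a hypothetical 3x4 table
rows, cols = 3, 4
df = (rows - 1) * (cols - 1)
print(df)  # 6
```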
200
Why would you use ANOVA?
When you have multiple populations and want to test for a difference in at least two of their means: H0: µ1 = µ2 = µ3 = ... = µn Ha: at least two population means differ
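The F statistic behind that test can be computed by hand. A Python sketch with three small made-up samples:

```python
# One-way ANOVA F statistic for three hypothetical samples
from statistics import mean

groups = [[5.1, 4.9, 5.3], [6.0, 6.2, 5.8], [5.0, 5.2, 4.8]]
k = len(groups)                          # number of populations
n = sum(len(g) for g in groups)          # total sample size
grand = mean(x for g in groups for x in g)

sst = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # between groups
sse = sum((x - mean(g)) ** 2 for g in groups for x in g)    # within groups
f_stat = (sst / (k - 1)) / (sse / (n - k))
print(round(f_stat, 2))  # 22.75
```

A large F (small p-value) is evidence that at least two population means differ.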
200
State assumptions for linear regression.
1. The probability distribution of the errors associated with all variables is approximately normal (IN CONTEXT) 2. The expected value of the errors is zero 3. The standard deviation of the errors is constant 4. The errors are random and independent
200
What is the significance of the residual plots?
Each plot attempts to verify a linear regression assumption. The Normal Probability Plot and Histogram attempt to verify the assumption that the probability distribution OF THE ERRORS is approximately normal: the points in the NPP should cluster around a line with no discernible "s" curve, and the histogram should resemble a bell curve. The Versus Fits plot attempts to verify the assumption of constant standard deviation of the errors: the vertical spread of the points should be roughly the same across all fitted values (no fan or funnel shape). The Versus Order plot attempts to verify the assumption that the ERRORS are random and independent: the points should be scattered with no discernible pattern. The assumption that the expected value of the errors equals zero is not verifiable through these residual plots.
300
What do you do when you have a p-value for a one tailed test, but now want to perform a two-tailed test? OR What do you do when you have a P-value for a two tailed test, but now want to perform a one-tailed test?
You double or halve the p-value, respectively. (Halving a two-tailed p-value is valid only when the sample statistic falls in the direction of the alternative hypothesis.) Do not alter the significance level.
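A minimal arithmetic sketch with hypothetical p-values:

```python
# Converting between one- and two-tailed p-values (hypothetical values)
p_one = 0.03
p_two = 2 * p_one    # one-tailed -> two-tailed: double
print(p_two)         # 0.06
p_back = 0.06 / 2    # two-tailed -> one-tailed: halve (direction must match Ha)
print(p_back)        # 0.03
```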
300
What is the formula for expected frequency?
((row total) * (column total)) / sample size
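A Python sketch of the formula applied to every cell of a made-up 2x2 table:

```python
# Expected frequencies for a hypothetical 2x2 contingency table
observed = [[20, 30],
            [25, 25]]
row_totals = [sum(r) for r in observed]          # [50, 50]
col_totals = [sum(c) for c in zip(*observed)]    # [45, 55]
n = sum(row_totals)                              # 100

expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
print(expected)  # [[22.5, 27.5], [22.5, 27.5]]
```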
300
What is the significance of residual plots?
Each plot attempts to verify an ANOVA assumption. The Normal Probability Plot and Histogram attempt to verify the assumption that the probability distribution of the values is approximately normal: the points in the NPP should cluster around a line with no discernible "s" curve, and the histogram should resemble a bell curve. The Versus Fits plot attempts to verify the assumption of equal population variances: the vertical spread of the points should be roughly the same for every group. The Versus Order plot attempts to verify the assumption that the samples are random and independent: the points should be scattered with no discernible pattern.
300
Interpret r-squared = 79%.
79% of the variation in the dependent variable is explained by the variation in the independent variable(s); the remaining 21% is unexplained by this linear model. This linear model seems somewhat useful for predicting the dependent variable
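For simple linear regression, r-squared is just the squared correlation. A Python sketch with made-up paired data:

```python
# r-squared = (Sxy)^2 / (Sxx * Syy) for hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)

r2 = sxy ** 2 / (sxx * syy)
print(round(r2, 2))  # 0.73
```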
300
Define Collinearity
It occurs when the independent variables are highly linearly correlated and can inflate t-test p-values in MLR.
400
What can never equal zero because the normal curve is infinite?
P-value.
400
Identify the type of probability when you want to find the probability of A given B?
Conditional probability. Cell value / row OR column total
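A Python sketch with a hypothetical two-way table of counts:

```python
# P(A given B) = cell value / total of B's row (hypothetical counts)
#                tea  coffee
# female          30      20
# male            10      40
cell = 30                 # count for female AND tea
female_total = 30 + 20    # row total for the given event B = female

p_tea_given_female = cell / female_total
print(p_tea_given_female)  # 0.6
```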
400
What is Tukey's analysis?
Tukey's analysis allows us to see between which population averages the differences (if any) lie. It takes populations 2 at a time, and generates simultaneous confidence intervals estimating the differences in their means. It gives us more specific information than ANOVA.
400
State the methods of evaluating the usefulness of a linear model.
1. s vs. y-bar (sample standard deviation of the errors should be low relative to the average of the dependent variable) 2. r-squared coefficient of determination (should be close to 100%) 3. Hypothesis test (small p-value) 4. General assessment.
400
Define Heteroscedasticity.
It is the term for a fan or funnel pattern in the Versus Fits plot (the vertical spread of the residuals changes across the fitted values). It is a violation of the assumption of constant standard deviation of the errors
500
How many dependent variables can be in a linear regression equation?
ONLY ONE always. In simple linear regression there is one dependent and one independent variable. In multiple linear regression there is one dependent variable and multiple independent variables.
500
When would you use a Chi-square test?
To test if classifications are dependent. H0: classifications are independent Ha: classifications are NOT independent
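The test statistic sums (observed - expected)^2 / expected over every cell. A hand-computed Python sketch for a hypothetical 2x2 table:

```python
# Chi-squared statistic for a hypothetical 2x2 table:
# sum over cells of (observed - expected)^2 / expected
observed = [[20, 30],
            [25, 25]]
row_t = [sum(r) for r in observed]
col_t = [sum(c) for c in zip(*observed)]
n = sum(row_t)

chi2 = sum((observed[i][j] - row_t[i] * col_t[j] / n) ** 2
           / (row_t[i] * col_t[j] / n)
           for i in range(len(observed)) for j in range(len(observed[0])))
print(round(chi2, 2))  # 1.01
```

A small statistic like this one (compared against the chi-squared critical value with the appropriate df) would not provide evidence of dependence.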
500
How do you draw conclusions from Tukey's analysis?
Look for zero in the CIs estimating the difference in population means for every pair. If zero is contained in an interval, there is not sufficient evidence of a difference between those two population means. If zero is not contained in the interval, there is evidence of a difference between those two population means.
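The zero-check can be sketched in Python with hypothetical Tukey intervals:

```python
# Checking whether zero falls in each hypothetical Tukey CI
intervals = {("mu1", "mu2"): (-1.4, 0.6),
             ("mu1", "mu3"): (0.8, 2.9),
             ("mu2", "mu3"): (0.3, 3.1)}

for pair, (lo, hi) in intervals.items():
    if lo <= 0 <= hi:
        print(pair, "-> no evidence of a difference (zero is in the CI)")
    else:
        print(pair, "-> evidence of a difference (zero is outside the CI)")
```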
500
Interpret ß0-hat and ß1-hat.
ß0-hat is the sample y-intercept: when all independent variables (IN CONTEXT) equal zero, the dependent variable equals ß0-hat, on average, in this linear model. If x1 = x2 = ... = xn = 0 is not included in this data set, then this interpretation cannot carry much weight. ß1-hat is the first sample slope: for every (specific increment and unit) increase in the first independent variable, the dependent variable increases or decreases by ß1-hat (specific increment and unit), on average, in this linear model, as long as the values of the other independent variables remain constant.
500
Define Autocorrelation.
It is a strong linear relation between the errors. It shows in the Versus Order plot, and it violates the assumption that the errors are random and independent.