OLS Jeopardy!!!!

The Most Excellent Estimator

Define that Estimator (aka the yellow boxes)

Goodness of Fit

Assuming makes an ass out of u and me

Applications

100

If the least squares estimator were a color, what color would it be and why?

BLUE--it is the most efficient estimator within the class of linear, unbiased estimators.

100

hat(alpha)

ybar-hat(beta)xbar

100

What are the 3 goodness of fit measures we discussed, and what does each tell us? What would each look like if our model was doing a good job?

(1) R^2: The proportion of the variance in the DV that is explained by the IV(s). High R^2-->more explained variance-->:) (2) Standard Error of the Regression: The estimated standard deviation of the data points around the regression line. Low standard error of the regression-->data clustered around regression line-->:) (3) F statistic: Tells us whether the estimated slope coefficients are jointly different from zero. Simply, it indicates whether our independent variables provide any information about the dependent variable beyond what the intercept alone provides. High F statistic-->low p-value-->reject null that the model is doing nothing-->:)

100

What assumption of OLS does omitted variable bias violate, and when should we NOT worry about omitted variables?

Omitted variable bias violates the assumption of strict exogeneity (E(epsilon|X)=0). There is no bias if the omitted variable is uncorrelated with the independent variables in the model, or if it has no effect on the dependent variable (its coefficient is zero).

100

You are testing the effects of gender on one's propensity to own a puppy. You specify the following model: Puppy Ownership=B_0+B_1(Female)+B_2(Male) What do you expect the results to look like and why? HINT: What assumption is being violated here?

Your statistical software package will not give you an answer (or will silently drop one of the variables), because the full rank assumption is violated. Male and female are mutually exclusive, exhaustive categories; when added together, they equal a column of 1s, which is perfectly collinear with the intercept. Because the X matrix does not have full column rank, X'X cannot be inverted, and the betas cannot be estimated.
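
A minimal sketch of the rank problem, assuming NumPy is available. The six gender values are made up for illustration, and the names `X_trap` and `X_ok` are hypothetical.

```python
import numpy as np

# Hypothetical sample: 1 = female, 0 = male (values invented for illustration).
female = np.array([1, 0, 1, 1, 0, 0])
male = 1 - female

# Design matrix with an intercept AND both dummies (the dummy variable trap).
X_trap = np.column_stack([np.ones(len(female)), female, male])

# Design matrix with one category dropped, so male becomes the baseline.
X_ok = np.column_stack([np.ones(len(female)), female])

# Female + Male reproduces the intercept column, so one column is redundant.
rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print("columns:", X_trap.shape[1], "rank:", rank_trap)
print("columns:", X_ok.shape[1], "rank:", rank_ok)
```

The first matrix has more columns than its rank, so X'X is singular; dropping either dummy (or the intercept) restores full column rank.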

200

What does linearity refer to?

The parameters.

200

hat(beta)

sum((x-xbar)(y-ybar))/sum((x-xbar)^2)
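
As a quick check on the slope and intercept formulas, here is a small sketch on invented data (the `x` and `y` values are assumptions, not from the course). It compares the hand-computed estimates with `np.polyfit`, which fits the same least squares line.

```python
import numpy as np

# Toy data, invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Deviations from the means.
x_dev = x - x.mean()
y_dev = y - y.mean()

# hat(beta) = sum((x-xbar)(y-ybar)) / sum((x-xbar)^2)
beta_hat = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)

# hat(alpha) = ybar - hat(beta) * xbar
alpha_hat = y.mean() - beta_hat * x.mean()

# Cross-check against NumPy's degree-1 polynomial fit (returns slope, intercept).
slope_check, intercept_check = np.polyfit(x, y, 1)

print(beta_hat, alpha_hat)
```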

200

You are given the following information: (1) The total sum of squares (2) The regression sum of squares (3) The sum of squares of residuals (4) The correlation between X and Y What are THREE ways you could calculate R^2?

(1) 1-SS_errors/SS_total (2) SS_regression/SS_total (3) square of the correlation
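
The three routes can be checked against each other on toy data. This is a sketch under stated assumptions: the data are invented, and the squared-correlation route only matches in a regression with a single X and an intercept.

```python
import numpy as np

# Toy data, invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Fit the simple regression and form fitted values.
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

ss_total = np.sum((y - y.mean()) ** 2)           # total sum of squares
ss_regression = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
ss_errors = np.sum((y - y_hat) ** 2)             # sum of squared residuals

r2_from_errors = 1 - ss_errors / ss_total
r2_from_regression = ss_regression / ss_total
r2_from_correlation = np.corrcoef(x, y)[0, 1] ** 2

print(r2_from_errors, r2_from_regression, r2_from_correlation)
```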

200

Because of the conditional iid assumption, we know that var(epsilon|X)=? (give equation, and also say what the matrix would look like). Give 2 examples of cases in which this assumption might be violated.

sigma^2*I_n. That is, the variance-covariance matrix has sigma^2 (the variances of the epsilons, conditional on the Xs) on the diagonal, and zeros everywhere else (since Cov(epsilon_i, epsilon_j|X)=0 for i not equal to j). This assumption might be violated if (1) there is spatial correlation or autocorrelation (disturbances correlated across space or time), so the off-diagonal elements are not zero, or (2) the elements on the leading diagonal vary across observations, which is a violation of homoskedasticity.

200

We have the following regression model where both X and D are dummy variables: E(y|X, D)=B_0+B_1X_i+D_i(B_2+B_3X_i) What would the graph of X and Y look like if we found that B_3=0 and B_2>1, for D=0 and D=1?

Parallel lines: both have slope B_1, and the D=1 line is shifted up by B_2.

300

Under what conditions is the least squares estimator unbiased?

E(epsilon|X)=0, i.e. strict exogeneity (given linearity in the parameters and full column rank).

300

Express Bhat in matrix form.

(X'X)^(-1)X'Y
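
A sketch of the matrix formula on invented data. It solves the normal equations (X'X)b = X'Y instead of forming the inverse explicitly, which is a numerical design choice rather than anything the formula requires, and cross-checks against `np.linalg.lstsq`.

```python
import numpy as np

# Toy data, invented for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Design matrix: a column of 1s for the intercept, then x.
X = np.column_stack([np.ones_like(x), x])

# Solve (X'X) b = X'y; equivalent to (X'X)^(-1) X'y when X has full column rank.
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least squares routine.
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print(b_hat)  # [intercept, slope]
```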

300

We have the following regression model where both X and D are dummy variables ("binary indicators"): E(y|X, D)=B_0+B_1X_i+D_i(B_2+B_3X_i) How would you test that E(Y|X)=E(Y|X,D)?

F test with two restrictions--B_2=0 and B_3=0

300

What happens if the conditional iid assumption is violated, and how does this affect our hypothesis test?

The coefficient estimates remain unbiased, but the usual standard errors are biased. If they are too big, the p-value will be too big, and we will understate the significance of the IV (failing to reject nulls we should reject). If they are too small, the p-value will also be too small, and we will overstate the significance of the IV. Simply put, the standard errors will be wrong, and our hypothesis test invalid.

300

We have the following regression model where both X and D are dummy variables: E(y|X, D)=B_0+B_1X_i+D_i(B_2+B_3X_i) What would the graph of X and Y look like if we found that B_2=0 and B_3>0, for D=0 and D=1?

Two lines with a common intercept (B_0) but different slopes: B_1 for D=0 and B_1+B_3 for D=1.
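
To see both graph questions at once, here is a small sketch that reads the intercept and slope of each line straight off the model E(y|X, D)=B_0+B_1X+D(B_2+B_3X). The coefficient values are invented, and `expected_y` and `line` are hypothetical helper names.

```python
def expected_y(x, d, b0, b1, b2, b3):
    """Conditional mean from the interaction model."""
    return b0 + b1 * x + d * (b2 + b3 * x)


def line(d, b0, b1, b2, b3):
    """Return (intercept, slope) of the X-Y line for a given value of D."""
    intercept = expected_y(0, d, b0, b1, b2, b3)
    slope = expected_y(1, d, b0, b1, b2, b3) - intercept
    return intercept, slope


# Case 1: B_3 = 0 and B_2 > 0 (made-up values) gives parallel lines.
parallel = [line(d, 1.0, 2.0, 1.5, 0.0) for d in (0, 1)]

# Case 2: B_2 = 0 and B_3 > 0 (made-up values) gives a common intercept.
common_intercept = [line(d, 1.0, 2.0, 0.0, 0.5) for d in (0, 1)]

print(parallel)
print(common_intercept)
```

In the first case only the intercepts differ; in the second only the slopes differ.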

400

There is a big trade off we encounter when specifying a model. What is it, and why?

The bias-variance trade-off. We can reduce bias by including more variables in the model specification. Doing so, however, increases the variance of our estimates (lowers efficiency), since we are asking the same data to tell us more. So we can get more accurate, less precise estimates, or more precise, less accurate estimates. One particularly problematic case is multicollinearity: if you include two highly correlated variables (say, party ID and vote on a very partisan piece of legislation), both coefficients are estimated imprecisely, and the sign on the variable with less explanatory power can flip, since most of the variation it would have explained in Y is already accounted for.

400

hat(sigma)^2

sum(hat(U_i)^2)/(n-2)

400

You have to send your paper to the APSR in 30 seconds (!!!!), and can't decide between two models. The first has 5 independent variables and an R^2 of 0.86. The second includes 3 of these 5 independent variables, and has an R^2 of 0.63. Which model do you include?

You don't have enough information here to decide, since the R^2 is not very informative (adding more variables to the model, no matter how unrelated they are to the DV, never decreases this statistic). Because the short model is nested in the long one, you would need an F-test to choose between them. If the F-statistic is big-->low p-value (<0.05)-->you know that the added variables in the long model are contributing significantly to your understanding of the variance in the DV. If the p-value on the F-statistic is insignificant, you would use the short model, as the added variables aren't explaining a whole lot extra.

400

Arguably, omitted variable bias is particularly problematic when X'_1X_2 is (positive/negative) and beta_2 is (positive/negative), so that the bias E(hat(beta_1))-beta_1 is positive, OR when X'_1X_2 is (positive/negative) and beta_2 is (positive/negative), so that the bias E(hat(beta_1))-beta_1 is again positive, because ...?

positive, positive-->positive bias; negative, negative-->positive bias. In both cases, we will overestimate the effect of X_1 on Y (assuming beta_1 is positive). Generally/arguably, we prefer conservative estimates, so overestimation is more problematic than underestimation.
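
The positive/positive case can be illustrated with a simulation. This is only a sketch: the data generating process, the seed, and the helper `ols` are all assumptions made for illustration. It relies on the in-sample identity that the short-regression slope equals the long-regression slope on x1 plus the long-regression coefficient on x2 times the slope from regressing x2 on x1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical DGP: x2 is positively related to x1, and beta_2 > 0.
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)


def ols(X, y):
    """Least squares coefficients for design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)
b_long = ols(np.column_stack([const, x1, x2]), y)  # includes x2
b_short = ols(np.column_stack([const, x1]), y)     # omits x2
delta = ols(np.column_stack([const, x1]), x2)      # x2 regressed on x1

# Short slope = long slope + (coefficient on x2) * (slope of x2 on x1).
implied_short_slope = b_long[1] + b_long[2] * delta[1]

print(b_long[1], b_short[1], implied_short_slope)
```

With both the x1-x2 relationship and beta_2 positive, the short regression's slope on x1 comes out larger than the long regression's, which is the overestimation described above.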

400

Something's fishy about the election results for one of the Palo Alto city council members, and you are asked to review the results of the election (the representative won 62% of the vote). You estimate a regression with previous election results as the IV and the latest result as the DV. Your model predicts that the representative in question should have received .52 of the vote; the standard error around this prediction is .04. Can you be 95% sure that the observed results were, in a sense, drawn from the same data generating process as the previous election results? (you can estimate a bit imprecisely)

0.52+/-2*0.04=(0.44, 0.6) No, we cannot be 95% sure: the observed 0.62 falls outside this interval. The election was probably rigged, or there was a significant shock that altered the electoral environment in Palo Alto from previous years.
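
The arithmetic, spelled out as a sketch. It uses the rough "plus or minus 2 standard errors" rule that the question allows, rather than an exact critical value.

```python
predicted = 0.52   # model's predicted vote share
std_error = 0.04   # standard error around the prediction
observed = 0.62    # vote share actually reported

# Approximate 95% interval: prediction +/- 2 standard errors.
lower = predicted - 2 * std_error
upper = predicted + 2 * std_error

# Is the observed result inside the interval?
inside = lower <= observed <= upper

# How many standard errors away from the prediction is the observed result?
z = (observed - predicted) / std_error

print(lower, upper, inside, z)
```

The observed share sits about 2.5 standard errors above the prediction, so it falls outside the interval.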

500

What is the Gauss Markov Theorem?

If the following hold: (1) Strict exogeneity: E(epsilon|X)=0 (2) Conditionally iid disturbances: E(epsilon*epsilon'|X)=sigma^2*I_n (3) X has full column rank, then the OLS estimator hat(beta)=(X'X)^(-1)X'Y is BLUE.

500

var(hat(beta))

sigma^2*sum(w_i^2)=sigma^2/(sum(x-xbar)^2)
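
A short numerical check that the two expressions agree, taking the weights to be w_i=(x_i-xbar)/sum((x-xbar)^2), the usual least squares weights (the card does not define w_i, so this is an assumption). The `x` values and the value of sigma^2 are also invented.

```python
import numpy as np

# Toy regressor and an assumed disturbance variance (both invented).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sigma2 = 0.8

x_dev = x - x.mean()
sxx = np.sum(x_dev ** 2)  # sum((x-xbar)^2)

# Least squares weights: hat(beta) = sum(w_i * y_i).
w = x_dev / sxx

var_from_weights = sigma2 * np.sum(w ** 2)
var_closed_form = sigma2 / sxx

print(var_from_weights, var_closed_form)
```

Since sum(w_i^2) = sum((x-xbar)^2)/(sum((x-xbar)^2))^2, the weights form collapses to sigma^2/sum((x-xbar)^2).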

500

What 5 pieces of information do you need to calculate an F-test?

(1) the residual sum of squares of the restricted model (2) the residual sum of squares of the unrestricted model (3) the number of restrictions being tested (q) (4) the number of observations (n) (5) the number of parameters in the unrestricted model (k) F=((RSS_restricted-RSS_unrestricted)/q)/(RSS_unrestricted/(n-k))
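
The five ingredients plug into the statistic as follows. The function name `f_statistic` and all the numbers are hypothetical, chosen only so the arithmetic is easy to follow.

```python
def f_statistic(rss_restricted, rss_unrestricted, q, n, k):
    """F statistic for q linear restrictions on a model with k parameters."""
    numerator = (rss_restricted - rss_unrestricted) / q
    denominator = rss_unrestricted / (n - k)
    return numerator / denominator


# Made-up inputs: 2 restrictions, 50 observations, 5 parameters.
f_stat = f_statistic(
    rss_restricted=120.0,
    rss_unrestricted=90.0,
    q=2,
    n=50,
    k=5,
)

print(f_stat)  # (30/2) / (90/45) = 7.5
```

The result would then be compared with an F distribution with q and n-k degrees of freedom.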

500

What 2 assumptions add up to the conditional iid assumption (that is, the DISTURBANCES are conditionally independent and identically distributed)? Hint: Two assumptions add up to the first of these assumptions.

(1) Conditionally Uncorrelated Errors (E(epsilon_i epsilon_j|X)=0 for all i not equal to j). This is implied by random sampling ((x_i, y_i), i=1, 2, ...,n are iid draws from a joint distribution) and weak exogeneity (E(epsilon_i|x_i)=0, i=1,...,n). (2) Identical Conditional Variances aka conditional homoskedasticity (var(epsilon_i|X)=E(epsilon_i^2|X)=sigma^2 for all i=1,...,n)

500

You are reading a paper that uses regression analysis to predict the likelihood that a given state will have a civil war using a variety of covariates. The following results are presented to you: Intercept: 1.03 (0.25) Mountainous Terrain: 5.67(2.3) Settler Mortality: 0.12(0.03) Ethnic Fractionalization: 8.6(5.32) Which variables are significant at the p<0.05 level?

The Intercept, Mountainous Terrain, and Settler Mortality.
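
The check behind this answer, as a sketch. It assumes the numbers in parentheses are standard errors and uses the large-sample two-sided critical value of 1.96, since the question gives no sample size.

```python
# Coefficient and (assumed) standard error for each term, from the question.
results = {
    "Intercept": (1.03, 0.25),
    "Mountainous Terrain": (5.67, 2.3),
    "Settler Mortality": (0.12, 0.03),
    "Ethnic Fractionalization": (8.6, 5.32),
}

# Assumption: large-sample normal approximation for a two-sided 5% test.
CRITICAL_VALUE = 1.96

# t-ratio = coefficient / standard error.
t_ratios = {name: coef / se for name, (coef, se) in results.items()}

significant = [name for name, t in t_ratios.items() if abs(t) > CRITICAL_VALUE]

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.2f}")
print(significant)
```

Ethnic Fractionalization has the largest coefficient but a t-ratio of only about 1.6, so it is the one term that falls short.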