OLS Jeopardy (Part II)

Dealing with Serial Correlation

As N gets big...

What's Wrong With My Regression?

Influence

Violations of Exogeneity

100

(1) You believe that your disturbances follow an AR(1) process where -1 < rho < 0 (epsilon_t = rho*epsilon_t-1 + u_t). What do you expect a graph of your disturbances over time to look like (describe or draw)? (2) Consider the same autoregressive process. Describe the dynamics of this process if rho > 1.

(1) A saw-tooth pattern in which the disturbances tend to switch from + to – from one period to the next. (2) Explosive growth/non-stationarity.
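A minimal simulation sketch of both cases, assuming standard normal shocks u_t; the function names, the seed, and the values rho = -0.8, 0.8, and 1.1 are illustrative choices, not part of the original question.

```python
import random

def simulate_ar1(rho, n, seed=0):
    """Simulate e_t = rho * e_{t-1} + u_t with standard normal shocks u_t."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        e.append(rho * e[-1] + rng.gauss(0.0, 1.0))
    return e

def sign_switch_share(series):
    """Share of adjacent pairs of values whose signs differ."""
    switches = sum(1 for a, b in zip(series, series[1:]) if a * b < 0)
    return switches / (len(series) - 1)

negative = simulate_ar1(rho=-0.8, n=2000)   # saw-tooth: frequent sign switches
positive = simulate_ar1(rho=0.8, n=2000)    # smooth runs: rare sign switches
explosive = simulate_ar1(rho=1.1, n=200)    # rho > 1: magnitude grows without bound

print(sign_switch_share(negative), sign_switch_share(positive))
print(max(abs(v) for v in explosive))
```

With negative rho the sign switches in most periods but not in every period, because the shock u_t can outweigh the pull of rho*epsilon_t-1.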

100

What does it mean for an estimator to be consistent?

As the sample size n tends to infinity, the estimator converges in probability to its value in the population: plim(hat(sigma)_n) = sigma.

100

A regression that has many statistically insignificant predictors but a relatively high R^2 is most likely to be suffering from...?

Multicollinearity

100

What is the difference between influence (influential observations), outliers, and leverage?

Outliers are observations that are not well fit by the model. Leverage is the distance of the predictor value from the mean predictor value. Influence is the combination of large residuals and large leverage. That is, a case is influential if it is an outlier and is far from the mean predictor value. Influential cases are those that, if removed from the regression, would substantially change the results.

100

Let the structural model of substantive interest be y = X_1b_1 + X_2b_2 + e, where X_1 are endogenous regressors and X_2 are exogenous regressors. Let Z be a matrix of exogenous regressors, with X_2 not equal to Z. Then the ‘‘first stage’’ of the two-stage least squares estimator consists of regressing each endogenous regressor x_j, where j indexes the columns of X_1, on...?

X_2 and Z

200

You estimate the following model: y_t=beta_0+phi*y_t-1+beta_2*x_t+u_t where |phi|<1. (1) What is the impact of y_t on y_t+1? (2) What is the impact of x_t on y_t+1? (3) What is the impact of y_t on y_t+1001?

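(1) phi; (2) phi*beta_2; (3) phi^1001, which is effectively zero because |phi|<1 (the analogous effect of x_t on y_t+1001 is phi^1001*beta_2, also effectively zero).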
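These dynamic multipliers can be checked by iterating the model with and without a one-unit pulse in x. This is a sketch under assumed values (phi = 0.5, beta_0 = 1, beta_2 = 2, shocks u_t set to zero); the helper name simulate is hypothetical.

```python
def simulate(phi, beta0, beta2, x, y0=0.0):
    """Iterate y_t = beta0 + phi*y_{t-1} + beta2*x_t with the shocks u_t set to zero."""
    y, prev = [], y0
    for xt in x:
        prev = beta0 + phi * prev + beta2 * xt
        y.append(prev)
    return y

phi, beta0, beta2 = 0.5, 1.0, 2.0   # illustrative values, not from the source
T = 10

baseline = simulate(phi, beta0, beta2, [0.0] * T)

# One-unit pulse in x in period 3 only.
pulse_x = [0.0] * T
pulse_x[3] = 1.0
pulsed = simulate(phi, beta0, beta2, pulse_x)

# Difference between the two paths = effect of x_3 on y_3, y_4, y_5, ...
multipliers = [p - b for p, b in zip(pulsed, baseline)]
print(multipliers[3], multipliers[4], multipliers[5])
```

The printed values correspond to beta_2, phi*beta_2, and phi^2*beta_2: the effect of x_t on y_t+h is phi^h*beta_2, which decays toward zero when |phi|<1.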

200

What two conditions are necessary to establish the consistency of an estimator?

(1) Asymptotic unbiasedness: the limit as n → infinity of the expectation of the estimator is the population parameter. (2) Vanishing asymptotic variance: the limit as n → infinity of the variance of the estimator is zero. (Strictly speaking, these two conditions together are sufficient for consistency.)

200

After estimating the least squares regression of y on X, a researcher finds that the correlation between the regression’s predicted values hat(y) = Hy and the residuals y –hat(y) is approximately zero. What can the researcher conclude?

Nothing. hat(y)'e = 0, by construction, for least squares fits (and, when the model includes an intercept, the residuals also sum to zero, so the sample correlation is zero as well). That is, irrespective of whether any assumptions about the disturbances do or do not hold, this correlation will always be zero, up to errors induced by floating point representation. Hence this correlation is not informative about lack of fit, appropriateness of assumptions, etc.
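A small sketch of this point, using a deliberately misspecified fit (a straight line through quadratic data). The data and the helper name ols_simple are made up for illustration.

```python
def ols_simple(x, y):
    """OLS of y on an intercept and one regressor; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [xi ** 2 for xi in x]          # truly quadratic, so the linear model is wrong

a, b = ols_simple(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Inner product of fitted values and residuals: zero up to floating point error.
inner = sum(f * e for f, e in zip(fitted, resid))
print(inner, sum(resid))
```

Even though the linear specification is clearly wrong here, the fitted values and residuals are orthogonal, which is why the zero correlation carries no diagnostic information.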

200

Imagine you were using R and had no idea how to plot data, or look at the raw data frame. You also had no knowledge of canned packages. Yet, you had a 350b problem set that required you to look for influential data points! What do you do?

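Compute the hat matrix, H = X(X'X)^-1 X'. Its diagonal elements h_ii are the leverages; observations with unusually large h_ii (a common rule of thumb is more than twice the average, k/n) are candidates for influential data points.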
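As a sketch of the idea without any matrix library: for a regression on an intercept and a single predictor, the diagonal of H has the closed form h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. The data below and the 2k/n cutoff are illustrative assumptions, and the original question refers to R rather than Python.

```python
def leverages(x):
    """Diagonal of the hat matrix for a regression on an intercept and one predictor x."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]    # the last value sits far from the others
h = leverages(x)

k = 2                                   # coefficients: intercept and slope
threshold = 2 * k / len(x)              # rule of thumb: twice the average leverage k/n
flagged = [i for i, hi in enumerate(h) if hi > threshold]

print([round(hi, 3) for hi in h])
print(flagged)
```

The leverages sum to k, the number of estimated coefficients, which is a quick sanity check on any hand-computed hat matrix. High leverage only marks a candidate: the point is influential only if it also pulls the fit.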

200

How would you assess the weakness of an instrument?

By looking at the partial R^2 (the partial correlation between the endogenous variable that you are instrumenting for and the instrument, controlling for the exogenous variables).

300

In the model y_t=beta_0+phi*y_t-1+beta_2*x_t+u_t, what is the total long run effect of a one unit change in x_t on y?

sum_r=0^infinity(phi^r*beta_2) = beta_2/(1-phi)
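The long-run effect can be derived in two equivalent ways, sketched here in LaTeX. The starred symbols for steady-state values are notation introduced for this illustration, and |phi| < 1 is assumed throughout.

```latex
% Route 1: add up the period-by-period effects of a permanent unit change in x.
\begin{align*}
\sum_{r=0}^{\infty} \phi^{r}\beta_2
  &= \beta_2 \sum_{r=0}^{\infty} \phi^{r}
   = \frac{\beta_2}{1-\phi}, \qquad |\phi| < 1 .
\end{align*}

% Route 2: solve for the steady state, where y_t = y_{t-1} = y^* and x_t = x^*.
\begin{align*}
y^* &= \beta_0 + \phi y^* + \beta_2 x^* \\
(1-\phi)\, y^* &= \beta_0 + \beta_2 x^* \\
y^* &= \frac{\beta_0 + \beta_2 x^*}{1-\phi}
\quad\Longrightarrow\quad
\frac{\partial y^*}{\partial x^*} = \frac{\beta_2}{1-\phi} .
\end{align*}
```

For example, with phi = 0.5 and beta_2 = 2 the long-run effect is 2/(1 - 0.5) = 4, twice the immediate effect.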

300

A researcher estimates a regression with n = 100 iid observations. The researcher comes to you with a methodological question: if she were to go out and collect 5 times as much data, how much smaller would the standard errors of her regression be? What is your answer?

With independent data, statistical precision increases with the square root of sample size. So, on average, the researcher will obtain sqrt(5) ≈ 2.24 times as much precision, or standard errors about 1/sqrt(5) ≈ 45% of the size of the standard errors obtained with n = 100.
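The arithmetic can be written as a one-line rule. This sketch assumes iid data and that "5 times as much data" means a total sample of n = 500; the function name se_ratio is hypothetical.

```python
import math

def se_ratio(n_old, n_new):
    """Approximate ratio of new to old standard errors for iid data: sqrt(n_old / n_new)."""
    return math.sqrt(n_old / n_new)

ratio = se_ratio(100, 500)       # new SEs as a fraction of the old SEs
precision_gain = 1 / ratio       # equals sqrt(5)

print(round(ratio, 3), round(precision_gain, 3))
```

The same function shows why precision gets expensive: halving the standard errors requires four times the data, since se_ratio(100, 400) is 0.5.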

300

In the presence of disturbances that are not ‘‘iid’’, the OLS estimator of b is generally...? (a) unbiased and consistent (b) the best linear unbiased estimator (c) biased but consistent (d) inconsistent

(a), unbiased and consistent. Note: OLS is no longer efficient, and the conventional standard errors are incorrect.

300

In a moment, you will be shown a plot that was generated after running a regression of the intensity of rebellion in Romania in 1907 on the strength of the middle peasantry. What are you looking at, and how do you interpret this? (see word document for plot).

Cook's distance is used to identify points with undue influence on fit. Cook's distance is roughly the average of the squares of the dfbetas. Contours of Cook's distance are included, by default at values of 0.5 and 1.0 (anything greater than 1 is really big). No data points are outside of these bounds, so it looks like there isn’t a lot to worry about in terms of influential data points.

300

What is a structural equation? What is a reduced form equation?

A structural equation is an equation derived from theory; its right-hand-side variables may be endogenous and/or exogenous. In a system of equations, the reduced form is the result of solving the system for the endogenous variables in terms of the exogenous variables.

400

What is the difference between an AR(1) series and an I(1) series?

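An AR(1) series (first-order autoregressive series), here applied to the disturbances, follows epsilon_t = rho*epsilon_t-1 + u_t, where u_t is white noise and rho taps the rate of decay (note that cor(epsilon_t, epsilon_t+/-r) = rho^r); it is stationary when |rho| < 1. An I(1) series is the special case of an AR(1) process with a unit root, i.e. y_t = rho*y_t-1 + u_t where rho = 1 and u_t is I(0); it is non-stationary and must be differenced once to become stationary.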

400

When will consistency not hold? (hint: 4 situations)

(1) omitted variables; (2) independent variables measured with error; (3) regression model includes lagged dependent variables AND disturbances are serially correlated; and (4) endogenous independent variables

400

You read a paper in which a naïve researcher estimates the effect of institutions on GDP per capita. (1) What can you say about the parameter estimates? (2) To remedy the situation, you suggest that the researcher: (a) Use factor analysis to create an index (b) Add a dummy for British colonization, as she clearly has some omitted variable bias. (c) Get more data (d) Instrument for GDP per capita with settler mortality (e) Instrument for institutions with settler mortality

(1) The estimates are biased and inconsistent. (2) b&e

400

What is a DFBETA (the intuition, not the math :) )? (extra points available for saying what positive values mean, what negative values mean, and what a good baseline is for when a DFBETA is large enough to warrant concern).

An n x k matrix in which each element is the change in one of the k coefficients, scaled by that coefficient’s standard error, caused by deleting the ith observation. Positive values indicate that case i pushes the estimate of beta up; negative values indicate that case i pulls the estimate of beta down. A DFBETA with an absolute value of 2 or more indicates that you should pay attention to that particular observation.
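A brute-force sketch of the slope column of that matrix for a one-predictor regression: refit with each observation deleted and scale the change in the slope. The scaling used here (deleted-case residual standard deviation over the square root of the full-sample Sxx) is one common convention and is an assumption, as are the data and the function names.

```python
import math

def fit(x, y):
    """OLS of y on an intercept and x; returns (intercept, slope, sxx, residual SD)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, sxx, math.sqrt(rss / (n - 2))

def dfbetas_slope(x, y):
    """Scaled change in the slope caused by deleting each observation in turn."""
    _, b_full, sxx_full, _ = fit(x, y)
    out = []
    for i in range(len(x)):
        x_del = x[:i] + x[i + 1:]
        y_del = y[:i] + y[i + 1:]
        _, b_del, _, sigma_del = fit(x_del, y_del)
        # Positive value: including case i pushes the slope estimate up.
        out.append((b_full - b_del) / (sigma_del / math.sqrt(sxx_full)))
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 12.0]   # last point pulled upward
d = dfbetas_slope(x, y)
print([round(v, 2) for v in d])
```

The last observation gets a large positive value because it sits at the edge of the x range and well above the line traced by the other points, so including it raises the slope.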

400

We would use a Durbin-Wu-Hausman test to test for... ? What is the null?

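Endogeneity of a regressor (not the validity of the instruments). The null is that the suspect regressor is exogenous, so that OLS is consistent. In the regression-based version, regress y on the exogenous X1, the suspect X2, and the predicted values of X2 from regressing X2 on the instruments; under the null, the coefficient on the predicted values is zero.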

500

What are the steps to the Cochrane-Orcutt EGLS procedure for AR(1) disturbances (hint: there are 5).

(1) Run OLS to get an estimate of the disturbances, hat(epsilon)_t; (2) given the AR(1) model hat(epsilon)_t = rho*hat(epsilon)_t-1 + u_t, t=2,…,T, calculate hat(rho) = (sum_t=2^T hat(epsilon)_t*hat(epsilon)_t-1)/(sum_t=2^T hat(epsilon)_t-1^2); (3) run the regression y_t - hat(rho)*y_t-1 = (x_t - hat(rho)*x_t-1)beta + u_t, t=2,…,T, or y* = X*beta + epsilon*, to obtain hat(beta); (4) use hat(beta) to generate new hat(epsilon) using the raw or untransformed variables: hat(epsilon) = y - X hat(beta); (5) repeat steps 2-4 until convergence in hat(rho).
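The five steps can be sketched for a single regressor plus an intercept using only the standard library. The simulated data, the true values (intercept 1, slope 2, rho 0.7), the tolerance, and the function names are all assumptions made for illustration; this is not a reference implementation.

```python
import random

def ols(x, y):
    """OLS of y on an intercept and a single regressor x; returns (intercept, slope)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope

def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    a, b = ols(x, y)                                    # step 1: plain OLS
    rho = 0.0
    for _ in range(max_iter):
        # Steps 1 and 4: residuals from the raw (untransformed) variables.
        e = [yi - a - b * xi for xi, yi in zip(x, y)]
        # Step 2: estimate rho from the lag-1 regression of the residuals.
        new_rho = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
                   / sum(e[t - 1] ** 2 for t in range(1, len(e))))
        # Step 3: quasi-difference the data and rerun OLS.
        y_star = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        x_star = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        a_star, b = ols(x_star, y_star)
        a = a_star / (1 - new_rho)      # transformed intercept equals a * (1 - rho)
        # Step 5: stop once rho has converged.
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho

# Simulated example with AR(1) disturbances.
rng = random.Random(1)
T, true_rho = 500, 0.7
x = [rng.gauss(0, 1) for _ in range(T)]
eps = [rng.gauss(0, 1)]
for _ in range(T - 1):
    eps.append(true_rho * eps[-1] + rng.gauss(0, 1))
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, eps)]

a_hat, b_hat, rho_hat = cochrane_orcutt(x, y)
print(round(a_hat, 3), round(b_hat, 3), round(rho_hat, 3))
```

Note that the quasi-differenced regression drops the first observation, and that the intercept in the transformed regression estimates beta_0*(1 - rho), so it is rescaled before the residuals are recomputed.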

500

There are 4 conditions under which consistency does not hold. Provide 1 potential remedy for each.

(1) omitted variables: add the omitted variable to the regression; (2) independent variables measured with error: find better measures; if one does not work, create an index using quick and easy or factor analysis; (3) regression model includes lagged dependent variables AND disturbances are serially correlated: include multiple lags (to test for this, use Durbin’s h test, whose null is no serial correlation); and (4) endogenous independent variables: use an instrumental variable (or an experiment…)

500

You are trying to estimate the effect of racial segregation on inequality using state-level data from 2000. You run a regression with racial segregation as your independent variable (IV) and inequality as your dependent variable (DV), including state fixed effects. What potential problems arise, and how would you correct for them?

First off, your brilliant statistical software program will not run your model: your design matrix is not full rank (so X'X is not invertible), because you have more independent variables (50 states + racial segregation) than observations (50 states). One way to correct for this is to get data over time, or look at smaller units of analysis, like counties or cities. In addition, it seems likely that you have an endogeneity problem, since inequality may generate conditions that facilitate racial segregation. You could use an instrument to correct for this; Ananat, an economist, looks at the question of the effect of racial segregation on inequality by using the length of railroads in each city as an instrument. In addition, you might have heteroskedasticity, as the effect might vary within each state or region. You could use the secret weapon (for instance, break the country into four regions and run separate regressions), EGLS, or robust standard errors.

500

What are studentized residuals (mathematical expression), and how can they be used to test for outliers?

t_i = hat(epsilon)_i / (hat(sigma)_(-i) * sqrt(1 - h_i)), where hat(sigma)_(-i) is the residual standard error estimated with observation i deleted and h_i is the leverage of observation i. t_i has a Student t distribution with n-k-1 degrees of freedom, where k is the number of coefficients estimated. It can be used to test the null that delta = 0 in the model E(Y_i) = x_i beta + delta*D_i, where D is a dummy that equals 1 for observation i and 0 otherwise. If you fail to reject the null, do not consider i to be an outlier.
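A sketch of the calculation for a one-predictor regression. It relies on the standard identity that the residual sum of squares with case i deleted equals RSS - e_i^2/(1 - h_i), so no refitting is needed; the data and the function name are illustrative assumptions.

```python
import math

def studentized_residuals(x, y):
    """Externally studentized residuals for OLS of y on an intercept and x."""
    n, k = len(x), 2                       # k = number of estimated coefficients
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    e = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]    # leverages
    rss = sum(ei ** 2 for ei in e)
    out = []
    for ei, hi in zip(e, h):
        # Residual variance with case i deleted, on n - k - 1 degrees of freedom.
        sigma2_del = (rss - ei ** 2 / (1 - hi)) / (n - k - 1)
        out.append(ei / math.sqrt(sigma2_del * (1 - hi)))
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 2.1, 2.9, 4.2, 9.0, 6.1, 6.9, 8.0]    # the fifth point is far off the line
t = studentized_residuals(x, y)
print([round(v, 2) for v in t])
```

Each t_i would be compared with a Student t distribution on n-k-1 degrees of freedom. Because every observation is tested in turn, a multiple-testing adjustment such as Bonferroni is commonly applied before declaring an outlier.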

500

Saumitra Jha shows that areas in India with sustained Hindu and Muslim interaction have less religious violence today. Why can he not run a regression with the percent of Hindus and Muslims as the independent variables and religious violence as the dependent variable? In the same paper, he instruments for the interaction of Hindus and Muslims with medieval trading ports. Why might this be a good instrument? Why might it be a bad instrument?

Hindu and Muslim citizens may choose where to live based on the degree of religious violence. In other words, religious violence may affect the degree of interaction between the two groups, so the Hindu-Muslim interaction is likely endogenous. Saumitra argues that medieval ports are good instruments, as they do not directly predict violence today, but do predict the coexistence of Hindus and Muslims (medieval traders lived together….). We might question whether this instrument is truly exogenous (as budding political scientists, I’m sure you can come up with a story of endogeneity). In addition, we might think that the location of port cities is a weak instrument (I could not find a partial R^2 in the paper).