Assumption Testing & Outliers (100)
Polynomials
sr2 and pr2
Power Analysis
Interpreting MLR (400 and 500 are for outliers)
100

How are leverage tests different from discrepancy tests?

Leverage tests for outliers on the predictors (unusual X values); discrepancy tests for outliers on the outcome (Y values far from what the model predicts, i.e., large residuals)

100

If a researcher conducted a power polynomial with a quadratic term and the beta was negative, what would be the shape of the curve?

Upside down U-shape

100

What will this code provide in the output:

```{r}
library(ppcor)

spcor.test(data$happiness, data$gratitude, data$social)
```

The sr value for the correlation between gratitude and happiness, semi-partially controlling for social (support)

100

Which power analysis will require more participants to detect an effect?

```{r}
library(pwr)

pwr.r.test(n = NULL, r = .10, sig.level = .05, power = .80)
pwr.r.test(n = NULL, r = .46, sig.level = .05, power = .80)
```

The first one with r = .10, since it is a smaller effect

100

What is the following code demonstrating: 

```{r}
library("ggpubr")

ggscatter(profs, x = "pubs", y = "salary",
          add = "reg.line", conf.int = TRUE,
          cor.method = "pearson",
          xlab = "Number of Publications", ylab = "Annual Salary")
```

Creating a scatterplot from the profs dataset with pubs on the x-axis and salary on the y-axis, adding a regression line of best fit with confidence-interval bands and using the Pearson correlation method. It also labels the x and y axes.

200

If a researcher examined a scale-location plot, what assumption test are they checking, and what are they looking for (assuming they do not want to violate this assumption)?

Homoscedasticity; they want the red line to be as horizontal as possible, with the residuals evenly spread around it

200

If someone mean centered an orthogonal power polynomial, will their results be different from someone who did not mean center their orthogonal power polynomial?

No; orthogonal polynomial contrasts are designed to be uncorrelated with each other regardless of the scale or centering of the original predictor variable. Therefore, even if you center or scale the predictor variable, the resulting orthogonal polynomial contrasts should be the same.
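This is easy to verify in base R with `poly()`; the predictor values below are arbitrary and just for illustration:

```{r}
x <- 1:20
P_raw      <- poly(x, degree = 2)            # orthogonal contrasts from raw x
P_centered <- poly(x - mean(x), degree = 2)  # same contrasts from centered x

cor(P_raw[, 1], P_raw[, 2])    # effectively zero: the terms are uncorrelated
max(abs(P_raw - P_centered))   # effectively zero: centering changed nothing
```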

200

A statistician wants to know how much unique variance in IQ is explained by each of their IVs (enjoyment of reading, age). Which correlation should they run?

sr2

200

What effect will increasing power from .80 to .95 have on the number of participants necessary to detect an effect?

You will need more participants when power is higher

200

How would you interpret the following regression equation: ŷi(happiness) = 3.11 + .25x1(social support) + .33x2(gratitude)

Social support positively predicts (or is related to) happiness and gratitude also positively predicts (or is related to) happiness. The intercept is 3.11, suggesting that when social support and gratitude are zero, happiness is 3.11.

For each one unit increase in social support, happiness increases by .25.

For each one unit increase in gratitude, happiness increases by .33.

300

Gene runs a VIF test and finds the following:

gender      friends      pets
1.763010  11.820156  4.181128

Are there any variables that violate the assumption of no multicollinearity and, if so, which one(s)?

Friends; its VIF of 11.82 is well above the common cutoff of 10
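As a reminder of what VIF captures, here is a minimal base-R sketch that computes it by hand; the data are simulated and the variable names are just borrowed from the question:

```{r}
# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
# on all of the other predictors
set.seed(2)
d <- data.frame(gender = rnorm(100), pets = rnorm(100))
d$friends <- d$pets + rnorm(100, sd = 0.25)   # make friends collinear with pets

r2_friends  <- summary(lm(friends ~ gender + pets, data = d))$r.squared
vif_friends <- 1 / (1 - r2_friends)
vif_friends   # large: friends is nearly redundant with pets
```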

300

What are the advantages of mean centering power polynomials?

They reduce multicollinearity between lower and higher order terms and allow easier interpretation of the intercept and linear terms because they can be interpreted as the value/slope of Y when X is at its mean
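The multicollinearity point is easy to see in base R with a simulated predictor (the numbers are arbitrary):

```{r}
set.seed(3)
x  <- rnorm(200, mean = 50, sd = 5)   # raw predictor, far from zero
xc <- x - mean(x)                     # mean-centered version

cor(x,  x^2)    # very high: raw linear and quadratic terms are collinear
cor(xc, xc^2)   # near zero after centering
```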

300

A statistician is trying to determine the association between gratitude and happiness, while completely controlling for the effect of social support on both gratitude and happiness. Which correlation should be conducted?

pr2

300

What is the following line of code doing, and why might this be necessary?

pwr.r.test(n = 50, r = .21, sig.level = .05, power = NULL)

It's a post hoc power analysis that finds how much power the study had. You might do this to check whether your study was adequately powered (or underpowered) when you cannot increase the sample size.

300

JJ ran a multiple regression predicting dollars made per hour with three independent variables: age, years of education, and scores on a job skills measure.

 Their model produced the following estimates:

yhat = 1.75 + 1.7(age) + 1.2(years of education) + 1.11(job skills)

What would be the predicted number of dollars made per hour for a 31-year-old with 26 years of education who scored a 6.1 on the job skills measure?

$92.421
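The arithmetic, written out in R by plugging the given values into JJ's equation:

```{r}
age <- 31; educ <- 26; skills <- 6.1
yhat <- 1.75 + 1.7 * age + 1.2 * educ + 1.11 * skills
yhat   # 92.421
```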

400

A researcher's regression model violates homoscedasticity and normality of residuals. Which plots did they use to examine these assumptions? What should they do as a result of violating these assumptions?

They looked at a scale-location plot (fitted vs. residuals works too, it's just less precise) and a QQ plot or histogram. They should consider running a polynomial or performing a non-linear transformation of the DV, such as a square root or log.

400

How are orthogonal power polynomials similar to, and different from, mean-centered non-orthogonal polynomials?

Orthogonal power polynomials are similar to mean-centered non-orthogonal polynomials in that both reduce multicollinearity (orthogonal polynomials eliminate it entirely). They will yield different betas, because mean centering is not the same as rescaling the terms so that they are completely uncorrelated with one another.

400

Teddy finds the following values:

sr = .22

pr = .31

His IVs are gratitude and social support. These values refer to the effects of gratitude on happiness (separate values were obtained for social support). Why did Teddy square these values? What does this mean for the effect of gratitude on happiness (i.e., interpret the numbers)?

He squared them so he could interpret them as variance explained. sr2 = .22² = .048, so gratitude uniquely explains about 4.8% of the variance in happiness. pr2 = .31² = .096, so gratitude explains about 9.6% of the variance in happiness once social support has been partialed out/controlled for.
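The squaring itself, in R:

```{r}
sr <- .22
pr <- .31
sr^2   # 0.0484 -> about 4.8% of the variance, uniquely
pr^2   # 0.0961 -> about 9.6% of the variance, with social support partialed out
```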

400

Degrees of freedom of the regression is 8 and degrees of freedom of the residual is 204. How many predictors are in the model? How many participants were in the study?

8 predictors (excluding the intercept) and 213 participants
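The bookkeeping, using the formulas df(regression) = k and df(residual) = n − k − 1:

```{r}
df_reg <- 8     # = k, the number of predictors
df_res <- 204   # = n - k - 1
k <- df_reg
n <- df_res + k + 1
n   # 213
```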

400

When running outlier diagnostics, a researcher finds a participant to have a hat value of .99. If they have not run any further tests, what should they do? What will determine if this participant is an outlier or not?

They should also run tests of discrepancy and influence; a high leverage (hat) value alone only shows the case is unusual on the IV(s), not that it is a problematic outlier. If the participant is also flagged by tests of discrepancy and influence, they should likely be removed from the model, with an explanation of why.

500

What are the differences between the assumptions of homoscedasticity, independence of residuals, and normality of residuals? Which plots and significance tests would enable you to examine these assumptions and how do you know if you have violated them?

Homoscedasticity: the (standardized and square-rooted) residuals have an even spread across the fitted values (plot: scale-location; test: Breusch-Pagan). Independence of residuals: the residuals are not associated with participant ID/are independent of one another (plot: ID on the x-axis, residuals on the y-axis; test: Durbin-Watson). Normality of residuals: the residuals are normally distributed (plot: QQ plot or histogram; test: Shapiro-Wilk on the (standardized) residuals). A p-value at or below .05 on any of these tests means the assumption is violated.
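A minimal base-R sketch of these checks (the data are simulated; the Breusch-Pagan and Durbin-Watson tests live in the `lmtest` package, so they appear here as comments):

```{r}
set.seed(4)
dat <- data.frame(x = rnorm(100))
dat$y <- 2 + 0.5 * dat$x + rnorm(100)
fit <- lm(y ~ x, data = dat)

plot(fit, which = 3)             # scale-location plot (homoscedasticity)
plot(fit, which = 2)             # QQ plot (normality of residuals)
shapiro.test(rstandard(fit))     # Shapiro-Wilk on standardized residuals
# lmtest::bptest(fit)            # Breusch-Pagan test (homoscedasticity)
# lmtest::dwtest(fit)            # Durbin-Watson test (independence)
```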

500

Terry ran a quadratic power polynomial that is non-orthogonal and not mean centered. What can be said about the beta estimates and p-values? Assume the relationship can be best understood as a U-shaped curve. What should Terry do instead and why?

Probably both the linear and quadratic betas are significant predictors and highly multicollinear, making interpreting them alone extremely difficult/impossible. Terry should either mean center or run an orthogonal polynomial to reduce/eliminate multicollinearity and interpret lower order betas as the value/slope of Y when X is at its mean.

500

sr2 and standardized beta equations are almost identical. What is the difference between them? How would you be able to interpret these values?

The sr equation's denominator is square rooted; otherwise, they are the same. This allows us to interpret sr2 as variance explained, which we cannot do with standardized beta. We interpret SB as how many SD units y changes for every 1 SD unit change in x. You can interpret both as controlling for the other variables in the model.
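In the two-predictor case, writing both formulas out makes the difference concrete (the correlations below are made up for illustration):

```{r}
# r_y1: DV with predictor 1; r_y2: DV with predictor 2; r_12: the two predictors
r_y1 <- .50; r_y2 <- .40; r_12 <- .30

beta1 <- (r_y1 - r_y2 * r_12) / (1 - r_12^2)       # standardized beta
sr1   <- (r_y1 - r_y2 * r_12) / sqrt(1 - r_12^2)   # semi-partial correlation
c(beta = beta1, sr = sr1)
```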

500

After running a power analysis, someone finds the following result:

u = 7

v = 41.69

f2 = .34

sig.level = .05

power = .80

How many participants are needed to power this study? How would a) increasing power to .95 and b) increasing f2 change the value of v?

50 participants are needed (v rounded up to 42, plus u = 7, plus 1). a) You would need more participants, so v would increase; b) you would need fewer participants, so v would decrease
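The participant count in R, rounding v up since you cannot have a fraction of a participant:

```{r}
u <- 7        # numerator df: number of predictors
v <- 41.69    # denominator df from the power analysis
n <- ceiling(v) + u + 1
n   # 50
```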

500

Harriet finds a participant with a DFFITS value of 8.98. How can she interpret this? What is this a measure of, and what does that mean?

Global measures of influence (like DFFITS) provide information about how a case affects the overall characteristics of the model (both x and y). The value means that if this participant were removed from the model, their predicted value of y would change by 8.98 standardized (standard-error) units, an extremely large shift.