JEPIDY

Wacholder

Richardson

Poole

GLMs

Grab Bag

100

Estimation using the techniques that Wacholder describes requires that the study base be which of the following: primary, secondary, either, or neither? Why?

Primary because the cohort giving rise to the cases and controls has to be fully enumerated

100

What is the advantage of using Richardson’s stratification weights over adjusting for D1?

We will be able to obtain an unbiased estimate of the association between X & D2

100

What is the key to selecting a control group?

Comparability between controls & the study's population at risk

100

What two components do generalized linear models (GLMs) depend on?

The link and the variance

100

p/(1-p)

What are the odds?

200

What does Wacholder's model assume about the "missing data"? Be specific.

That the data is missing at random (MAR) conditional on disease status

200

Two parts: What do Richardson’s stratification weights represent? We have learned similar weighting schemes in the past; by what name would we recognize this concept?

The reciprocal of the selection probability for the stratum; Inverse probability weighting
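A minimal sketch of the weighting idea, assuming a hypothetical enumerated cohort of 10,000 in which every D1 case and 5% of D1 non-cases were sampled. The stratum names, counts, and probabilities are invented for illustration and do not come from Richardson.

```python
# Hypothetical design: all D1 cases sampled, 5% of D1 non-cases sampled.
selection_prob = {"D1 case": 1.0, "D1 non-case": 0.05}
sampled = {"D1 case": 100, "D1 non-case": 495}

# Weight = reciprocal of the stratum's selection probability.
weights = {stratum: 1.0 / prob for stratum, prob in selection_prob.items()}

# Each sampled subject stands in for `weight` members of the cohort,
# so the weighted sample should rebuild the enumerated population size.
represented = {stratum: sampled[stratum] * weights[stratum] for stratum in sampled}
total_represented = sum(represented.values())
```

Under these assumed numbers each sampled non-case carries a weight of about 20, and the weighted sample adds back up to the full cohort. This is the same logic as inverse probability weighting.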

200

Why shouldn't we condition on an intermediate?

Doing so will obscure any causal effect of exposure on outcome that occurs through the intermediate

200

What are the canonical links for the linear, binomial, and Poisson models?

Identity, logit, and log respectively
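A small sketch of the three canonical links and their inverses, using only the Python standard library. The dictionary name and the `round_trip` helper are hypothetical, chosen for illustration.

```python
import math

# Each entry maps a model family to (link, inverse link).
CANONICAL_LINKS = {
    # identity: eta = mu
    "linear": (lambda mu: mu, lambda eta: eta),
    # logit: eta = log(p / (1 - p))
    "binomial": (lambda p: math.log(p / (1 - p)),
                 lambda eta: 1 / (1 + math.exp(-eta))),
    # log: eta = log(mu)
    "poisson": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
}

def round_trip(family, mean):
    """Apply the link, then its inverse; this should return the original mean."""
    link, inverse = CANONICAL_LINKS[family]
    return inverse(link(mean))
```

The link maps the mean onto the scale of the linear predictor, and the inverse link maps it back, which is why the round trip returns the starting value.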

200

In a case-control study, the ratio of sample size to population size (f1 = n1/N1 and f0 = n0/N0); if the controls are sampled independently of exposure, the same proportion is taken from the exposed and unexposed person-time pools, so f1 = f0

What is the sampling fraction?

300

DAILY DOUBLE! Two parts: Wacholder uses the identity link with this data distribution to get this measure of association.

What is the binomial distribution and the risk difference?

300

Name a scenario in which adjusting for D1 would not bias the estimate of the association between X & D2.

D1 is not associated with D2 OR D1 is a confounder of the association between X & D2

300

Idea that a case-control study should compare sick people to healthy people

What is the trohoc fallacy?

300

Do you always have to use the canonical link that corresponds to your data's distribution? If so, explain why. If not, give an example and the measure of association that would be estimated under your example.

No. Examples: Binomial data+log link -> RR; Binomial data as a probability+identity link -> RD; Binomial data as an odds+identity link -> OD
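A minimal sketch of why these link choices give these measures, assuming a single binary exposure. In that case the model is saturated, so the fitted risks equal the observed risks and each slope can be computed directly without fitting anything. The function name and the risks passed to it are hypothetical.

```python
import math

def measures(risk_exposed, risk_unexposed):
    """Slopes a one-predictor binomial model would recover under each link."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return {
        # log link: the slope is log(RR), so exp(slope) is the risk ratio
        "log_RR": math.log(risk_exposed / risk_unexposed),
        "RR": risk_exposed / risk_unexposed,
        # identity link on the probability: the slope is the risk difference
        "RD": risk_exposed - risk_unexposed,
        # identity link on the odds: the slope is the odds difference
        "OD": odds_exposed - odds_unexposed,
    }

# Hypothetical risks: 20% in the exposed, 10% in the unexposed.
example = measures(0.2, 0.1)
```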

300

In Wacholder’s risk difference model, where he fits the probability itself (vs. a transformation of the probability) with a specific link, what measure does the intercept (i.e., the alpha) represent? Assume that it is a simple model with a binary outcome and a single binary predictor variable (x=1 if exposed, x=0 if unexposed): Pr(D=1 | X=x) = alpha + beta*x

Absolute risk of the unexposed
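A short worked derivation, writing the slope as \beta and the exposure indicator as x (notation assumed here for illustration). Substituting the two possible values of x into the model gives the intercept and slope their interpretations:

```latex
\begin{align*}
\Pr(D=1 \mid x=0) &= \alpha + \beta \cdot 0 = \alpha
  && \text{(risk in the unexposed)} \\
\Pr(D=1 \mid x=1) &= \alpha + \beta \cdot 1 = \alpha + \beta
  && \text{(risk in the exposed)} \\
\Pr(D=1 \mid x=1) - \Pr(D=1 \mid x=0) &= \beta
  && \text{(risk difference)}
\end{align*}
```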

400

How does the pseudo-likelihood method described by Wacholder distribute the cases and controls with missing covariate information into the exposure cells?

Proportional to the empirical distributions of observed cases and controls, respectively
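A sketch of the allocation step only, as described in the response above, and not of Wacholder's full estimator. The function name and all counts are hypothetical. Cases and controls are handled separately, each using its own observed exposure distribution.

```python
def allocate_missing(observed_counts, n_missing):
    """Spread n_missing subjects over cells in proportion to observed counts."""
    total = sum(observed_counts.values())
    return {
        cell: count + n_missing * count / total
        for cell, count in observed_counts.items()
    }

# Hypothetical observed exposure cells among subjects with complete data.
cases_observed = {"exposed": 30, "unexposed": 70}
controls_observed = {"exposed": 20, "unexposed": 80}

# Hypothetical numbers of cases and controls with missing covariate data.
cases_filled = allocate_missing(cases_observed, 10)
controls_filled = allocate_missing(controls_observed, 50)
```

With these assumed counts, the 10 incomplete cases split 3 and 7 across the exposed and unexposed cells, and the 50 incomplete controls split 10 and 40.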

400

What are the two main limitations of Richardson’s technique for analyzing an outcome within a case-control study designed to assess a different outcome?

1) Lack of power; 2) Dependence on having an enumerated population

400

Name two ways that the trohoc fallacy manifests itself.

1) Concern that controls must be comparable to cases; 2) Concern that controls must be perfectly healthy & free of any intermediate outcomes

400

What is the disadvantage of not using the canonical link with binomial data?

Predicted outcomes not constrained to range from 0-1

400

DAILY DOUBLE! True or false: If we use the canonical link and variance function, then fitting the GLM is maximum likelihood estimation (MLE) and the data distribution is a known member of the exponential family

True

500

What is a sufficient statistic?

The sufficient statistic contains all of the information in the data about the parameter of interest. In statistics, a statistic is sufficient with respect to a statistical model and its associated unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter".

500

When will the results of the adjusted logistic regression approach and the stratum-weighted approach diverge?

When D1 is a confounder or an intermediate of the X-D2 relationship.

500

According to Poole, what is the most likely effect of removing controls that are positive for an intermediate, and why should we be wary of it?

Removing controls that are positive for an intermediate will likely cause a bias away from the null. This is dangerous because it could also be (falsely) interpreted as removal of a bias toward the null

500

This method of estimating regression coefficients does not fully specify the distribution of the observed data and instead derives estimators based on the first two moments (i.e., mean and variance) of the distributions; it is a more flexible approach to estimation than maximum likelihood estimation.

What is quasi-likelihood?

500

Would you expect the odds-difference model estimates to be the same, slightly larger, or slightly smaller than the risk difference model estimates? Why?

Slightly larger because an odds always exceeds the corresponding probability
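The reasoning can be made explicit with a short derivation. The symbols p_1 and p_0 (risks in the exposed and unexposed) are notation assumed here for illustration. For any risk p with 0 \le p < 1, the gap between the odds and the probability is:

```latex
\[
\frac{p}{1-p} - p = \frac{p^{2}}{1-p} \ge 0
\]
\[
\mathrm{OD} - \mathrm{RD}
  = \left(\frac{p_1}{1-p_1} - \frac{p_0}{1-p_0}\right) - (p_1 - p_0)
  = \frac{p_1^{2}}{1-p_1} - \frac{p_0^{2}}{1-p_0}
\]
```

Because p^2/(1-p) increases with p, the odds difference exceeds the risk difference whenever p_1 > p_0, and the gap is small when both risks are small.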