Wacholder
Richardson
Poole
GLMs
Grab Bag
100
Estimation using the techniques that Wacholder describes requires that the study base be which of the following: primary, secondary, either, or neither? Why?
Primary because the cohort giving rise to the cases and controls has to be fully enumerated
100
What is the advantage of using Richardson’s stratification weights over adjusting for D1?
We will be able to obtain an unbiased estimate of the association between X & D2
100
What is the key to selecting a control group?
Comparability between controls & the study's population at risk
100
What two components do generalized linear models (GLMs) depend on?
The link and the variance
100
p/(1-p)
What are the odds?
200
What does Wacholder's model assume about the "missing data"? Be specific.
That the data is missing at random (MAR) conditional on disease status
200
Two parts: What do Richardson’s stratification weights represent? We have learned similar weighting schemes in the past; by what name would we recognize this concept?
The reciprocal of the selection probability for the stratum; Inverse probability weighting
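A minimal sketch of the weighting idea, assuming two hypothetical strata with made-up selection probabilities and counts (none of these numbers come from Richardson). Each sampled subject is weighted by the reciprocal of the selection probability for their stratum, so the weighted sample stands in for the source population.

```python
# Hypothetical selection probabilities by D1 status (illustrative values only).
selection_prob = {"D1_case": 1.0, "D1_control": 0.05}
# Hypothetical numbers of subjects actually sampled from each stratum.
sampled_counts = {"D1_case": 200, "D1_control": 400}


def ipw_weights(probs):
    """Return the inverse probability weight (1 / selection probability) per stratum."""
    return {stratum: 1.0 / p for stratum, p in probs.items()}


weights = ipw_weights(selection_prob)

# Weighted counts approximate the size of each stratum in the source population.
pseudo_population = {s: sampled_counts[s] * weights[s] for s in sampled_counts}
```

In this sketch each sampled D1 control represents 20 people in the source population, while each D1 case represents only themselves.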
200
Why shouldn't we condition on an intermediate?
Doing so will obscure any causal effect of exposure on outcome that occurs through the intermediate
200
What are the canonical links for the linear, binomial, and Poisson models?
Identity, logit, and log respectively
200
In a case-control study, the ratio of sample size to population size (f1 = n1/N1 and f0 = n0/N0); if the controls are sampled independently of exposure, the same proportion is taken from the exposed and unexposed person-time pools
What is the sampling fraction?
300
DAILY DOUBLE! Two parts: Wacholder uses the identity link with this data distribution to get this measure of association.
What is the binomial distribution and the risk difference?
300
Name a scenario in which adjusting for D1 would not bias the estimate of the association between X & D2.
D1 is not associated with D2 OR D1 is a confounder of the association between X & D2
300
Idea that a case-control study should compare sick people to healthy people
What is the trohoc fallacy?
300
Do you always have to use the canonical link that corresponds to your data's distribution? If so, explain why. If not, give an example and the measure of association that would be estimated under your example.
No. Examples: Binomial data+log link -> RR; Binomial data as a probability+identity link -> RD; Binomial data as an odds+identity link -> OD
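As a rough illustration of the answer above, the sketch below computes each measure in closed form from a hypothetical 2x2 table. It assumes a model with a single binary exposure, where the fitted exposure coefficient under each link reduces to the corresponding contrast of the observed proportions (or of their logs or odds). The function name and the counts are made up for illustration.

```python
def measures_from_2x2(a, b, c, d):
    """Measures of association from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    """
    p1 = a / (a + b)          # risk in the exposed
    p0 = c / (c + d)          # risk in the unexposed
    odds1 = p1 / (1 - p1)
    odds0 = p0 / (1 - p0)
    return {
        "RD": p1 - p0,        # binomial + identity link on the probability
        "RR": p1 / p0,        # binomial + log link (exp of the coefficient)
        "OR": odds1 / odds0,  # binomial + logit (canonical) link (exp of the coefficient)
        "OD": odds1 - odds0,  # identity link on the odds
    }


# Hypothetical counts: 30/100 exposed and 10/100 unexposed develop the outcome.
m = measures_from_2x2(30, 70, 10, 90)
```

The same data yield four different summaries; the link determines the scale on which the exposure effect is assumed to be additive.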
300
In Wacholder’s risk difference model, where he fits the probability itself (vs. a transformation of the probability) with a specific link, what measure of association does the intercept (i.e., the alpha) represent? Assume that it is a simple model with a binary outcome and single binary predictor variable (x=1 if exposed, x=0 if unexposed): Pr(D=1 | E=e) = alpha + beta1*x1
Absolute risk of the unexposed
400
How does the pseudo-likelihood method described by Wacholder distribute the cases and controls with missing covariate information into the exposure cells?
Proportional to the empirical distributions of observed cases and controls, respectively
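The proportional allocation in this answer can be sketched as follows. This only illustrates the allocation step, not Wacholder's full estimator; the helper name and the counts are hypothetical.

```python
def allocate_missing(observed, n_missing):
    """Spread subjects with missing exposure across cells in proportion
    to the empirical distribution of the subjects with observed exposure."""
    total = sum(observed.values())
    return {cell: n + n_missing * n / total for cell, n in observed.items()}


# Hypothetical cases: 60 exposed, 40 unexposed, 20 with exposure missing.
cases_observed = {"exposed": 60, "unexposed": 40}
cases_filled = allocate_missing(cases_observed, 20)

# Controls are handled separately, using their own observed distribution.
controls_observed = {"exposed": 30, "unexposed": 70}
controls_filled = allocate_missing(controls_observed, 10)
```

Allocating cases and controls separately matches the assumption that the data are missing at random conditional on disease status. The filled cells may hold fractional counts.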
400
What are the two main limitations of Richardson’s technique for analyzing an outcome within a case-control study designed to assess a different outcome?
1) Lack of power; 2) Dependence on having an enumerated population
400
Name two ways that the trohoc fallacy manifests itself.
1) Concern that controls must be comparable to cases; 2) Concern that controls must be perfectly healthy & free of any intermediate outcomes
400
What is the disadvantage of not using the canonical link with binomial data?
Predicted outcomes are not constrained to range from 0 to 1
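A small sketch of this point, using made-up coefficients and a hypothetical continuous covariate x. With the identity link the linear predictor is itself the predicted probability, so nothing stops it from leaving the unit interval; the logit link maps any linear predictor back into (0, 1).

```python
import math


def identity_prediction(alpha, beta, x):
    """Predicted 'probability' under the identity link: the linear predictor itself."""
    return alpha + beta * x


def logit_prediction(alpha, beta, x):
    """Predicted probability under the logit link (inverse logit of the linear predictor)."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


# Hypothetical coefficients, chosen only to show the boundary problem.
p_identity = identity_prediction(0.10, 0.25, 4)   # exceeds 1
p_logit = logit_prediction(-2.0, 1.0, 4)          # stays strictly between 0 and 1
```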
400
DAILY DOUBLE! True or false: If we use the canonical link and the corresponding variance function, the GLM estimates are maximum likelihood estimates and the distribution is a known member of the exponential family
True
500
What is a sufficient statistic?
The sufficient statistic contains all of the information in the data about the parameter of interest. In statistics, a statistic is sufficient with respect to a statistical model and its associated unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter".
500
When will the results of the adjusted logistic regression approach and the stratum-weighted approach diverge?
When D1 is a confounder or an intermediate of the X-D2 relationship.
500
According to Poole, what is the most likely effect of removing controls that are positive for an intermediate, and why should we be wary of it?
Removing controls that are positive for an intermediate will likely cause a bias away from the null. This is dangerous because it could also be (falsely) interpreted as removal of a bias toward the null
500
This method of estimating regression coefficients does not fully specify the distribution of the observed data and instead derives estimators based on the first two moments (i.e., mean and variance) of the distributions; it is a more flexible approach to estimation than maximum likelihood estimation.
What is quasi-likelihood?
500
Would you expect the odds-difference model estimates to be the same, slightly larger, or slightly smaller than the risk difference model estimates? Why?
Slightly larger because an odds always exceeds the corresponding probability
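A quick numeric check of this answer, assuming hypothetical risks of 3% and 1%. The gap between the odds and the probability, p/(1-p) - p = p^2/(1-p), grows with p, so the difference in odds is larger than the difference in risks whenever the exposed risk is higher, and only slightly larger when the outcome is rare.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


# Hypothetical risks in the exposed and unexposed.
risk1, risk0 = 0.03, 0.01

risk_difference = risk1 - risk0
odds_difference = odds(risk1) - odds(risk0)

# Excess of the odds difference over the risk difference.
excess = odds_difference - risk_difference
```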