Epidemiology Study Designs

USMLE Vignette/Scenario Identification

When the study DOESN'T work

Bias and Confounding

Validity & Reliability

Statistics by Study Design

100

Investigators compare patients diagnosed with lung cancer to matched controls without cancer and assess prior cigarette smoking exposure. A strong association is found, fundamentally changing medical understanding of cancer risk. The study design is...

What is a case–control study?

Doll R, Hill AB. Smoking and Carcinoma of the Lung. British Medical Journal. 1950;2:739. doi:10.1136/bmj.2.4682.739

Richard Doll: The man who stopped smoking: https://www.youtube.com/watch?v=VBWGM630zG0

100

This study design cannot establish temporal or causal relationships.

What is a cross-sectional study design?

100

Patients with cancer are more likely than controls to recall past toxic exposures.

What is recall bias?

Recall bias is a type of systematic error that occurs when participants with an outcome (cases) remember or report past exposures differently than participants without the outcome (controls).

Key points (exam-ready):

Most common in case–control studies
Cases often over-report exposures; controls may under-report
Leads to misclassification of exposure
Can exaggerate or distort the association between exposure and disease

100

A diagnostic test for tuberculosis yields highly variable results when repeated in the same patient under identical conditions.

Which of the following properties is most compromised?

A. Accuracy
B. Validity
C. Reliability
D. Sensitivity
E. Specificity

C. Reliability

Reliability is the consistency or reproducibility of a measurement—the extent to which the same result is obtained when a measurement is repeated under identical conditions.

Key points (USMLE-ready)

Answers the question: “Does this measure give the same result every time?”
A test can be reliable but not valid
A test cannot be valid if it is not reliable
Reflects precision, not accuracy

100

This statistic is most commonly used to measure association in a case–control study.

What is an odds ratio?

Odds ratio (OR) is a measure of association that compares the odds of exposure among cases to the odds of exposure among controls (or, equivalently, the odds of an outcome in the exposed vs unexposed).

When it’s used (USMLE-ready)

Primary measure in case–control studies
Approximates relative risk when the outcome is rare
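2×2 table (exposure × disease): a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease
OR = (a × d) / (b × c)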
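A minimal sketch of the cross-product calculation, assuming the usual 2×2 layout (a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease). The counts are hypothetical and chosen only for illustration.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio from a 2x2 table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    return (a * d) / (b * c)


# Hypothetical counts: 40 exposed cases, 20 exposed controls,
# 10 unexposed cases, 30 unexposed controls.
print(odds_ratio(40, 20, 10, 30))  # (40 * 30) / (20 * 10) = 6.0
```

An OR of 6 here would be read as: the odds of exposure among cases are six times the odds among controls.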

200

A study enrolls newborns exposed in utero to valproic acid (an antiepileptic medication also used to treat bipolar disorder and for migraine prophylaxis). Fetal exposure can occur throughout pregnancy, particularly during the first trimester. Newborns are followed for 10 years to assess neurodevelopmental outcomes.

10 bonus points if you can identify the statistic commonly associated with this study design.

What is a prospective cohort study?

OR Spina bifida 12.7 (95% CI: 7.7 to 20.7)

OR Cleft palate 5.2 (95% CI: 2.8 to 9.9)

Jentink, J., Loane, M. A., Dolk, H., Barisic, I., Garne, E., Morris, J. K., & de Jong-van den Berg, L. T. (2010). Valproic acid monotherapy in pregnancy and major congenital malformations. New England Journal of Medicine, 362(23), 2185-2193.

200

This design is inefficient for studying rare diseases.

What is a cohort study?

200

In a study assessing alcohol consumption, participants systematically underreport the number of drinks consumed per week because heavy drinking is socially stigmatized.

What is social desirability bias?

Social desirability bias is a type of information (response) bias in which participants systematically overreport socially acceptable behaviors and underreport stigmatized or undesirable behaviors to present themselves in a favorable light.

Key points (exam-ready):

Common in self-reported surveys/interviews
Affects behaviors like alcohol/drug use, sexual practices, diet, exercise, adherence
Leads to misclassification of exposure (often differential)
Can bias associations toward or away from the null, depending on context

200

A newly developed depression screening questionnaire strongly correlates with a structured psychiatric interview, which is considered the gold standard.

This finding best supports which of the following?

A. Face validity
B. Content validity
C. Construct validity
D. Criterion validity
E. Reliability

D. Criterion Validity

Criterion validity is the extent to which a measurement or test correlates with an external gold standard (criterion) that is already accepted as accurate.

Key points (USMLE-ready)

Assesses how well a test performs compared to a gold standard
A subtype of validity (accuracy), not reliability
Commonly evaluated using correlation coefficients
Requires the existence of a true or accepted reference standard

200

This statistic is directly calculated in a cohort study to compare disease occurrence between exposed and unexposed groups.

What is relative risk?

Relative risk is a measure of association that compares the risk (incidence) of an outcome in the exposed group to the risk in the unexposed group.

Interpretation

RR = 1 → no association
RR > 1 → exposure increases risk
RR < 1 → exposure is protective
RR = [a / (a + b)] / [c / (c + d)]
Outcome in < 10% of the population (rare outcome): OR and RR are close
Outcome in > 10% of the population (common outcome): OR exaggerates the RR (lies farther from 1)
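A small sketch comparing RR and OR for a rare and a common outcome, assuming the 2×2 layout a = exposed with outcome, b = exposed without outcome, c = unexposed with outcome, d = unexposed without outcome. Both tables are hypothetical and built so that the true RR is 2 in each.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio from the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical tables (a, b, c, d), 100 exposed and 100 unexposed in each.
rare = (2, 98, 1, 99)      # risks of 2% vs 1%
common = (60, 40, 30, 70)  # risks of 60% vs 30%

for label, table in (("rare", rare), ("common", common)):
    rr = round(relative_risk(*table), 2)
    or_ = round(odds_ratio(*table), 2)
    print(label, "RR =", rr, "OR =", or_)
# rare:   RR = 2.0, OR = 2.02
# common: RR = 2.0, OR = 3.5
```

With the rare outcome the OR is almost identical to the RR; with the common outcome the OR overstates it.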

300

Investigators analyze data from the National Health and Nutrition Examination Survey to examine the relationship between body mass index (BMI) and hypertension in U.S. adults. Height, weight, and blood pressure measurements are collected during a single examination visit, along with questionnaires on diet and physical activity. No follow-up data are obtained.

Which of the following is the most appropriate interpretation of findings from this study?

A. It can be used to calculate the incidence of hypertension
B. It can establish a causal relationship between BMI and hypertension
C. It can estimate the prevalence of hypertension in the population
D. It can determine the temporal sequence of exposure and disease
E. It is best suited for studying rare diseases

C. It can estimate the prevalence of hypertension in the population.

Note: Cross-sectional studies measure prevalence, not incidence, and cannot establish temporality or causality.

Kuczmarski, R. J., Flegal, K. M., Campbell, S. M., & Johnson, C. L. (1994). Increasing prevalence of overweight among US adults: the National Health and Nutrition Examination Surveys, 1960 to 1991. Jama, 272(3), 205-211.

300

This study design cannot calculate incidence or relative risk directly.

What is a case-control study?

300

Hospitalized patients are used as controls in an occupational exposure study.

What is selection bias?

Selection bias occurs when systematic differences exist between those who are selected for a study and those who are not, or between cases and controls, resulting in a study population that is not representative of the target population.

Key points:

Arises from how participants are selected or retained, not how data are measured
Can distort the association between exposure and outcome
Often leads to non-comparable groups
Can occur at enrollment or through loss to follow-up

Common settings:

Case–control studies (control selection is critical)
Cohort studies with differential loss to follow-up

300

A randomized controlled trial minimizes confounding and bias through randomization but is conducted in a highly selected population that differs from most real-world patients.

Which type of validity is most limited?

A. Internal validity
B. External validity
C. Construct validity
D. Face validity
E. Criterion validity

B. External validity

External validity is the extent to which the results of a study can be generalized beyond the study population to other people, settings, and time periods.

Key points (USMLE-ready)

Reflects generalizability
Answers the question: “Do these results apply to real-world populations?”
Can be limited by strict inclusion/exclusion criteria, artificial settings, or highly selected samples
Often reduced in tightly controlled randomized controlled trials

300

This measure is most appropriate for describing disease burden in a cross-sectional study.

What is prevalence?

Prevalence is the proportion of a population that has a disease or condition at a specific point in time (or over a specified period).

Prevalence = (number of existing cases / total population at the specified point or period in time) × a multiplier

Why prevalence matters (USMLE Step 1–ready)

Measures disease burden, not risk
Helps guide healthcare planning and resource allocation
Strongly influenced by disease duration
Chronic diseases → high prevalence
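A minimal sketch of the prevalence formula. The function name, the default multiplier of 100 (percent), and the counts are assumptions made for illustration.

```python
def prevalence(existing_cases, population, multiplier=100):
    """Existing cases divided by total population, scaled by a multiplier.

    multiplier=100 gives a percentage; 100_000 gives cases per 100,000.
    """
    return existing_cases * multiplier / population


# Hypothetical survey: 150 existing cases among 5,000 people examined.
print(prevalence(150, 5000))                      # 3.0 (percent)
print(prevalence(150, 5000, multiplier=100_000))  # 3000.0 per 100,000
```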

400

A physician documents a cluster of five immunocompromised patients with a rare fungal pneumonia following the introduction of a new hospital ventilation system. This study represents...

What is a case series?

CDC. Pneumocystis pneumonia — Los Angeles. Morbidity and Mortality Weekly Report (MMWR). 1981 Jun 5;30(21):250–252

400

This study is most vulnerable to confounding by indication.

What is an observational treatment study?

Confounding by indication occurs when the reason a treatment is prescribed (the indication) is itself associated with the outcome, creating a false or distorted association between the treatment and the outcome.

400

Patients receiving treatment have worse outcomes because they were sicker at baseline.

What is confounding by indication?

Confounding by indication occurs when the reason a treatment is given (the clinical indication) is itself associated with the outcome, creating a spurious association between the treatment and the outcome.

Key points:

Patients receiving treatment are often sicker or at higher baseline risk
Makes treatments appear harmful when they are actually markers of disease severity
A form of confounding, not information or selection bias
Particularly problematic when treatment is not randomly assigned

Common settings:

Observational treatment studies
Retrospective cohort studies evaluating therapies
Case–control studies examining treatment effects
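A hedged numeric sketch of confounding by indication. All counts are hypothetical: sicker patients are more likely to be treated, so the crude comparison makes the treatment look harmful even though it appears protective within each severity stratum.

```python
def risk_ratio(treated_events, treated_n, untreated_events, untreated_n):
    """Risk of the outcome in treated patients divided by risk in untreated."""
    return (treated_events / treated_n) / (untreated_events / untreated_n)


# Hypothetical counts per severity stratum:
# (treated deaths, treated total, untreated deaths, untreated total)
strata = {
    "severe": (40, 80, 12, 20),  # most severe patients receive treatment
    "mild": (2, 20, 12, 80),     # most mild patients do not
}

# Crude table: add the strata together, ignoring baseline severity.
crude = tuple(sum(s[i] for s in strata.values()) for i in range(4))

print("crude RR:", round(risk_ratio(*crude), 2))  # 1.75, looks harmful
for name, counts in strata.items():
    print(name, "RR:", round(risk_ratio(*counts), 2))  # 0.83 and 0.67
```

Stratifying by baseline severity is one way to reveal the distortion; randomization avoids it by design.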

400

Two independent clinicians reviewing the same radiographs reach the same diagnosis in nearly all cases.

This finding best demonstrates which of the following?

A. Internal validity
B. External validity
C. Inter-rater reliability
D. Criterion validity
E. Content validity

C. Inter-rater reliability

Inter-rater reliability is the degree to which two or more independent observers (raters) produce the same results when assessing the same subjects using the same measurement method.

Key points (USMLE-ready)

Measures agreement between different observers
Reflects consistency, not accuracy
High inter-rater reliability reduces observer variability
Commonly assessed using kappa statistics (for categorical data) or intraclass correlation coefficients (for continuous data)
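A minimal sketch of Cohen's kappa for two raters and categorical readings, written with the standard library only. The radiograph readings are hypothetical; kappa corrects the observed agreement for the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(rater1)
    observed = sum(x == y for x, y in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement: product of each rater's marginal proportions, summed
    # over all categories used by either rater.
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical readings of 10 radiographs (P = positive, N = negative).
rater_a = ["P", "P", "P", "P", "N", "N", "N", "N", "N", "N"]
rater_b = ["P", "P", "P", "N", "N", "N", "N", "N", "N", "P"]

print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.58
```

Here the raters agree on 8 of 10 films (80%), but because 52% agreement would be expected by chance, kappa is only about 0.58.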

400

These statistical tests are commonly used to compare a continuous outcome between two independent groups in clinical or epidemiologic studies. Name the parametric test, which compares means, and its non-parametric counterpart.

What is the independent-samples t-test (parametric test)?

What is the Mann–Whitney U test, also called the Wilcoxon rank-sum test (non-parametric)?
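A rough sketch of the two test statistics, using only the standard library. It assumes the equal-variance (pooled) form of the t-test and computes the U statistic by direct pair counting. P-values are omitted, and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(x, y):
    """Independent-samples t statistic with pooled (equal) variance."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))


def mann_whitney_u(x, y):
    """U statistic for x: pairs where x exceeds y, with ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Hypothetical measurements in two independent groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

print(round(student_t(group_a, group_b), 2))  # 4.0
print(mann_whitney_u(group_a, group_b))       # 24.5 out of a possible 25
```

The t statistic works on the means and variances, while U depends only on how the values rank against each other, which is why it suits skewed or ordinal data.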

500

Several patients develop postoperative Staphylococcus aureus (nosocomial, i.e., hospital-acquired) infections within days of surgery performed by the same surgeon. Investigation reveals the surgeon was perspiring heavily, with sweat contaminating the operative field. Infected patients are compared with uninfected surgical patients operated on during the same period.

What is a case-control study?

Girou, E., Stephan, F., Novara, A. N. A., Safar, M., & Fagon, J. Y. (1998). Risk factors and outcome of nosocomial infections: results of a matched case-control study of ICU patients. American journal of respiratory and critical care medicine, 157(4), 1151-1158.

500

This study design should not be used to make individual-level causal inferences because exposure and outcome are measured at the group level.

What is an ecological study?

500

Group-level exposure data are incorrectly applied to individuals.

What is the ecological fallacy?

Ecological fallacy occurs when associations observed at the group (population) level are incorrectly assumed to apply to individuals within those groups.

Key points

Arises from ecological studies
Uses group-level data, not individual-level exposure or outcome data
Cannot determine individual risk or causality
Can lead to incorrect conclusions about individual behavior or disease risk
A limitation of study design, not a type of bias in measurement

500

A blood pressure cuff gives consistent but incorrect readings due to faulty calibration. This is an indication of...

What is high reliability with low validity?

Reliability is the consistency or reproducibility of a measurement—the extent to which the same result is obtained when the measurement is repeated under identical conditions.

Validity is the accuracy or truthfulness of a measurement—the extent to which a measure actually assesses what it is intended to measure.

Reliability = consistency; Validity = accuracy.

You cannot have validity without reliability, but you can have reliability without validity.
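The miscalibrated cuff can be sketched numerically. The readings and the true pressure are hypothetical: a small spread across repeated readings stands for high reliability, and a large offset from the true value stands for low validity.

```python
from statistics import mean, stdev

# Hypothetical data: true systolic pressure and five repeated cuff readings.
true_systolic = 120
readings = [130, 131, 129, 130, 130]

bias = mean(readings) - true_systolic  # systematic error (validity problem)
spread = stdev(readings)               # variability across repeats (reliability)

print("bias:", bias)                # 10 mmHg too high on average
print("spread:", round(spread, 2))  # 0.71 mmHg, very consistent
```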

500

This regression model is commonly used in case–control studies to adjust for confounders and estimate an adjusted measure of association.

What is logistic regression (adjusted odds ratios)?