USMLE Vignette/Scenario Identification
When the study DOESN'T work
Bias and Confounding
Validity & Reliability
Statistics by Study Design
100

Investigators compare patients diagnosed with lung cancer to matched controls without cancer and assess prior cigarette smoking exposure. A strong association is found, fundamentally changing medical understanding of cancer risk. The study design is...

What is a case–control study?


Doll R, Hill AB. Smoking and Carcinoma of the Lung. British Medical Journal. 1950;2:739. doi:10.1136/bmj.2.4682.739

Richard Doll: The man who stopped smoking: https://www.youtube.com/watch?v=VBWGM630zG0 


100

This study design cannot establish temporal or causal relationships.

What is a cross-sectional study design?

100

Patients with cancer are more likely than controls to recall past toxic exposures. 

What is recall bias?


Recall bias is a type of systematic error that occurs when participants with an outcome (cases) remember or report past exposures differently than participants without the outcome (controls).

Key points (exam-ready):

  • Most common in case–control studies

  • Cases often over-report exposures; controls may under-report

  • Leads to misclassification of exposure

  • Can exaggerate or distort the association between exposure and disease
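Differential recall can create an association where none exists. The sketch below uses hypothetical counts, not data from any real study. Cases and controls have identical true exposure, but cases over-report and controls under-report. It assumes exposure is binary and that misreporting is the only source of error.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)


N_CASES = N_CONTROLS = 200

# Hypothetical truth: 60 of 200 cases and 60 of 200 controls were exposed,
# so there is no real association.
true_or = odds_ratio(60, 60, N_CASES - 60, N_CONTROLS - 60)

# Hypothetical recall: cases report all 60 true exposures plus 14 false ones
# (10% of the 140 unexposed cases); controls recall only 48 of 60 (80%).
reported_cases, reported_controls = 60 + 14, 48
observed_or = odds_ratio(reported_cases, reported_controls,
                         N_CASES - reported_cases, N_CONTROLS - reported_controls)

print(f"True OR = {true_or:.2f}; observed OR = {observed_or:.2f}")
# True OR = 1.00; observed OR = 1.86
```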

100

A diagnostic test for tuberculosis yields highly variable results when repeated in the same patient under identical conditions.

Which of the following properties is most compromised?

A. Accuracy
B. Validity
C. Reliability
D. Sensitivity
E. Specificity

C. Reliability


Reliability is the consistency or reproducibility of a measurement—the extent to which the same result is obtained when a measurement is repeated under identical conditions.

Key points (USMLE-ready)

  • Answers the question: “Does this measure give the same result every time?”

  • A test can be reliable but not valid

  • A test cannot be valid if it is not reliable

  • Reflects precision, not accuracy

100

This statistic is most commonly used to measure association in a case–control study.

What is an odds ratio?

Odds ratio (OR) is a measure of association that compares the odds of exposure among cases to the odds of exposure among controls (or, equivalently, the odds of an outcome in the exposed vs unexposed).

When it’s used (USMLE-ready)

  • Primary measure in case–control studies

  • Approximates relative risk when the outcome is rare

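  • 2x2 table: rows = exposure (+/−), columns = disease (+/−); a = exposed with disease, b = exposed without disease, c = unexposed with disease, d = unexposed without disease

  • OR = (a × d) / (b × c)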
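Below is a minimal sketch of the OR calculation above. It adds an approximate 95% confidence interval using Woolf's log method, which is one common choice. The counts are hypothetical, and the function assumes no zero cells.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% CI (Woolf/log method).

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    Assumes all four cells are non-zero.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical case-control counts
or_, lo, hi = odds_ratio_ci(a=40, b=20, c=60, d=80)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
# OR = 2.67 (95% CI 1.42 to 5.02)
```

The interval is built on the log scale because ln(OR) is much closer to normally distributed than the OR itself.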
200

A study enrolls newborns exposed in utero to valproic acid (i.e., an antiepileptic medication also used to treat bipolar disorder and for migraine prophylaxis). Fetal exposure can occur at any point in pregnancy but is of greatest concern during the first trimester. Newborns are followed for 10 years to assess neurodevelopmental outcomes.

10 bonus points if you can name the measure of association typically calculated in this study design.

What is a prospective cohort study?


Spina bifida: OR 12.7 (95% CI: 7.7 to 20.7)

Cleft palate: OR 5.2 (95% CI: 2.8 to 9.9)


Jentink, J., Loane, M. A., Dolk, H., Barisic, I., Garne, E., Morris, J. K., & de Jong-van den Berg, L. T. (2010). Valproic acid monotherapy in pregnancy and major congenital malformations. New England Journal of Medicine, 362(23), 2185-2193.
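Ratio measures like these are usually reported with confidence intervals that are symmetric on the log scale, so a published interval can be sanity-checked. The sketch below assumes the spina bifida interval was computed that way. This is typical, but it is not confirmed here. Under that assumption, the geometric mean of the limits should sit near the point estimate, and the width of the interval gives the standard error of ln(OR).

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of ln(ratio) recovered from a published 95% CI.

    Assumes the CI was computed symmetrically on the log scale.
    """
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Spina bifida result quoted above: OR 12.7 (95% CI 7.7 to 20.7)
se = se_from_ci(7.7, 20.7)
z_stat = math.log(12.7) / se                 # Wald z for H0: OR = 1 (ln OR = 0)
geometric_midpoint = math.sqrt(7.7 * 20.7)   # should be close to 12.7

print(f"SE(ln OR) = {se:.3f}, z = {z_stat:.1f}, CI midpoint = {geometric_midpoint:.1f}")
# SE(ln OR) = 0.252, z = 10.1, CI midpoint = 12.6
```

The midpoint differs slightly from 12.7 only because the published limits are rounded. A z of about 10 means the interval lies far from OR = 1.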

200

This design is inefficient for studying rare diseases.

What is a cohort study?

200

In a study assessing alcohol consumption, participants systematically underreport the number of drinks consumed per week because heavy drinking is socially stigmatized.

What is social desirability bias?

Social desirability bias is a type of information (response) bias in which participants systematically overreport socially acceptable behaviors and underreport stigmatized or undesirable behaviors to present themselves in a favorable light.

Key points (exam-ready):

  • Common in self-reported surveys/interviews

  • Affects behaviors like alcohol/drug use, sexual practices, diet, exercise, adherence

  • Leads to misclassification of exposure (often differential)

  • Can bias associations toward or away from the null, depending on context
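Hypothetical case–control counts make the "toward or away from the null" point concrete. In this sketch, under-reporting is the only error, so no one falsely claims heavy drinking. When both groups under-report equally, the OR shrinks toward 1. When only controls under-report, the OR is inflated.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def reported_or(exposed_cases, exposed_controls, n_cases, n_controls,
                sens_cases, sens_controls):
    """OR after under-reporting: each group reports only a fraction
    (sens_*) of its true exposures; nobody falsely reports exposure."""
    a = round(exposed_cases * sens_cases)
    b = round(exposed_controls * sens_controls)
    return odds_ratio(a, b, n_cases - a, n_controls - b)


# Hypothetical: 100 cases (50 heavy drinkers) and 100 controls (25 heavy drinkers)
true_or = reported_or(50, 25, 100, 100, sens_cases=1.0, sens_controls=1.0)
# Both groups hide 40% of heavy drinking (non-differential)
nondiff_or = reported_or(50, 25, 100, 100, sens_cases=0.6, sens_controls=0.6)
# Only controls hide 40% of heavy drinking (differential)
diff_or = reported_or(50, 25, 100, 100, sens_cases=1.0, sens_controls=0.6)

print(f"true OR = {true_or:.2f}")                 # true OR = 3.00
print(f"non-differential OR = {nondiff_or:.2f}")  # 2.43 (toward the null)
print(f"differential OR = {diff_or:.2f}")         # 5.67 (away from the null)
```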


200

A newly developed depression screening questionnaire strongly correlates with a structured psychiatric interview, which is considered the gold standard.

This finding best supports which of the following?

A. Face validity
B. Content validity
C. Construct validity
D. Criterion validity
E. Reliability

D. Criterion validity


Criterion validity is the extent to which a measurement or test correlates with an external gold standard (criterion) that is already accepted as accurate.

Key points (USMLE-ready)

  • Assesses how well a test performs compared to a gold standard

  • A subtype of validity (accuracy), not reliability

  • Commonly evaluated using correlation coefficients

  • Requires the existence of a true or accepted reference standard
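Below is a minimal sketch of one such correlation coefficient, Pearson's r, computed from scratch on hypothetical paired scores. For a dichotomized screening result, sensitivity and specificity against the gold standard are the usual alternative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


questionnaire = [4, 8, 12, 15, 20, 22]  # hypothetical screening scores
interview = [5, 9, 11, 16, 18, 24]      # hypothetical gold-standard ratings
print(f"r = {pearson_r(questionnaire, interview):.2f}")  # r = 0.98
```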

200

This statistic is directly calculated in a cohort study to compare disease occurrence between exposed and unexposed groups.

What is relative risk?

Relative risk is a measure of association that compares the risk (incidence) of an outcome in the exposed group to the risk in the unexposed group.

Interpretation

  • RR = 1 → no association

  • RR > 1 → exposure increases risk

  • RR < 1 → exposure is protective

  • RR = [a/(a+b)] / [c/(c+d)]

  • Rare outcome (incidence < 10%): OR and RR are close

  • Common outcome (incidence ≥ 10%): OR exaggerates RR (lies farther from 1)
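The rare-outcome rule can be checked numerically. In this sketch with hypothetical cohort counts, the true RR is 2 in both scenarios, and only the baseline risk differs.

```python
def rr_and_or(a, b, c, d):
    """Relative risk and odds ratio from a cohort-style 2x2 table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    return rr, odds_ratio


# Hypothetical cohorts: 1,000 exposed and 1,000 unexposed, true RR = 2 in both
rare = rr_and_or(20, 980, 10, 990)      # risks 2% vs 1%
common = rr_and_or(400, 600, 200, 800)  # risks 40% vs 20%
print(f"rare outcome:   RR = {rare[0]:.2f}, OR = {rare[1]:.2f}")      # RR = 2.00, OR = 2.02
print(f"common outcome: RR = {common[0]:.2f}, OR = {common[1]:.2f}")  # RR = 2.00, OR = 2.67
```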

300

Investigators analyze data from the National Health and Nutrition Examination Survey to examine the relationship between body mass index (BMI) and hypertension in U.S. adults. Height, weight, and blood pressure measurements are collected during a single examination visit, along with questionnaires on diet and physical activity. No follow-up data are obtained.

Which of the following is the most appropriate interpretation of findings from this study?

A. It can be used to calculate the incidence of hypertension
B. It can establish a causal relationship between BMI and hypertension
C. It can estimate the prevalence of hypertension in the population
D. It can determine the temporal sequence of exposure and disease
E. It is best suited for studying rare diseases

C. It can estimate the prevalence of hypertension in the population.

Note: Cross-sectional studies measure prevalence, not incidence, and cannot establish temporality or causality.


Kuczmarski, R. J., Flegal, K. M., Campbell, S. M., & Johnson, C. L. (1994). Increasing prevalence of overweight among US adults: the National Health and Nutrition Examination Surveys, 1960 to 1991. JAMA, 272(3), 205-211.

300

This study design cannot calculate incidence or relative risk directly.

What is a case–control study?

300

Hospitalized patients are used as controls in an occupational exposure study.

What is selection bias?


Selection bias occurs when systematic differences exist between those who are selected for a study and those who are not, or between cases and controls, resulting in a study population that is not representative of the target population.

Key points:

  • Arises from how participants are selected or retained, not how data are measured

  • Can distort the association between exposure and outcome

  • Often leads to non-comparable groups

  • Can occur at enrollment or through loss to follow-up

Commonly seen in:

  • Case–control studies (control selection is critical)

  • Cohort studies with differential loss to follow-up
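Hospital controls can bias a case–control OR, a form of selection bias sometimes called Berkson's bias. This minimal sketch uses hypothetical numbers. It assumes the occupational exposure also raises the chance of being hospitalized for unrelated reasons.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


# Hypothetical source population
cases_exposed, cases_unexposed = 50, 50
noncases_exposed, noncases_unexposed = 2_000, 8_000  # 20% of non-cases exposed

# Population controls: a random sample of non-cases keeps the same exposure mix,
# so using all non-cases gives the same OR.
population_or = odds_ratio(cases_exposed, noncases_exposed,
                           cases_unexposed, noncases_unexposed)

# Hospital controls: hypothetically, 10% of exposed but only 5% of unexposed
# non-cases are in hospital for other reasons, so exposure is over-represented.
hospital_exposed = noncases_exposed * 10 // 100     # 200
hospital_unexposed = noncases_unexposed * 5 // 100  # 400
hospital_or = odds_ratio(cases_exposed, hospital_exposed,
                         cases_unexposed, hospital_unexposed)

print(f"OR with population controls = {population_or:.1f}")  # 4.0
print(f"OR with hospital controls   = {hospital_or:.1f}")    # 2.0
```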
300

A randomized controlled trial minimizes confounding and bias through randomization but is conducted in a highly selected population that differs from most real-world patients.

Which type of validity is most limited?

A. Internal validity
B. External validity
C. Construct validity
D. Face validity
E. Criterion validity

B. External validity


External validity is the extent to which the results of a study can be generalized beyond the study population to other people, settings, and time periods.

Key points (USMLE-ready)

  • Reflects generalizability

  • Answers the question: “Do these results apply to real-world populations?”

  • Can be limited by strict inclusion/exclusion criteria, artificial settings, or highly selected samples

  • Often reduced in tightly controlled randomized controlled trials


300

This measure is most appropriate for describing disease burden in a cross-sectional study.

What is prevalence?


Prevalence is the proportion of a population that has a disease or condition at a specific point in time (or over a specified period).

Prevalence = (number of existing cases / total population at a specific point or period) × a multiplier (e.g., per 1,000 or per 100,000)


Why prevalence matters (USMLE Step 1–ready)

  • Measures disease burden, not risk

  • Helps guide healthcare planning and resource allocation

  • Strongly influenced by disease duration

  • Chronic diseases → high prevalence
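Below is a minimal sketch of the prevalence formula and of why duration matters, using hypothetical numbers. The duration link relies on the standard approximation prevalence ≈ incidence × average duration. That approximation holds only for relatively rare conditions in a stable population.

```python
def prevalence(existing_cases, population, multiplier=100_000):
    """Point prevalence: existing cases per `multiplier` people at one point in time."""
    return existing_cases * multiplier / population


def steady_state_prevalence(incidence, mean_duration_years):
    """Rough approximation P ≈ I × D (low prevalence, stable population).

    `incidence` is new cases per 100,000 person-years; the result is per 100,000.
    """
    return incidence * mean_duration_years


# Hypothetical survey: 1,250 people with the condition among 50,000 examined
print(prevalence(1_250, 50_000))       # 2500.0 (per 100,000)
print(prevalence(1_250, 50_000, 100))  # 2.5 (per 100, i.e., 2.5%)

# Same hypothetical incidence (200 per 100,000 per year), different durations
print(steady_state_prevalence(200, 10))   # chronic, ~10 years: 2000
print(steady_state_prevalence(200, 0.5))  # acute, ~6 months: 100.0
```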

400

A physician documents a cluster of five immunocompromised patients with a rare fungal pneumonia following the introduction of a new hospital ventilation system. This study represents...

What is a case series?


CDC. Pneumocystis pneumonia — Los Angeles. Morbidity and Mortality Weekly Report (MMWR). 1981 Jun 5;30(21):250–252

400

This study is most vulnerable to confounding by indication.

What is an observational treatment study?


Confounding by indication occurs when the reason a treatment is prescribed (the indication) is itself associated with the outcome, creating a false or distorted association between the treatment and the outcome.

400

Patients receiving treatment have worse outcomes because they were sicker at baseline.

What is confounding by indication?


Confounding by indication occurs when the reason a treatment is given (the clinical indication) is itself associated with the outcome, creating a spurious association between the treatment and the outcome.

Key points:

  • Patients receiving treatment are often sicker or at higher baseline risk

  • Can make a treatment appear harmful when receiving it is actually a marker of greater disease severity

  • A form of confounding, not information or selection bias

  • Particularly problematic when treatment is not randomly assigned

Commonly seen in:

  • Observational treatment studies

  • Retrospective cohort studies evaluating therapies

  • Case–control studies examining treatment effects
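Stratifying on baseline severity shows the mechanism. In this sketch the counts are hypothetical, and severity is assumed to have been measured. Within each severity stratum, treatment has no effect, yet the crude comparison makes it look harmful. The Mantel-Haenszel estimator is one standard way to pool stratum-specific risk ratios.

```python
# Hypothetical counts per severity stratum:
# (treated_deaths, treated_n, untreated_deaths, untreated_n)
strata = {
    "severe": (60, 200, 15, 50),  # sicker patients are usually treated
    "mild": (1, 50, 4, 200),      # milder patients usually are not
}


def risk_ratio(a, n1, c, n0):
    """Risk of death in treated divided by risk in untreated."""
    return (a / n1) / (c / n0)


for name, cells in strata.items():
    print(f"{name}: RR = {risk_ratio(*cells):.2f}")        # 1.00 in each stratum

crude_cells = [sum(col) for col in zip(*strata.values())]  # collapse the strata
crude_rr = risk_ratio(*crude_cells)
print(f"crude: RR = {crude_rr:.2f}")                       # crude: RR = 3.21

# Mantel-Haenszel risk ratio pooled across strata
num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
mh_rr = num / den
print(f"Mantel-Haenszel RR = {mh_rr:.2f}")                 # 1.00
```

Adjustment like this works only for confounders that were measured. Randomization is what protects against unmeasured differences in severity.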

400

Two independent clinicians reviewing the same radiographs reach the same diagnosis in nearly all cases.

This finding best demonstrates which of the following?

A. Internal validity
B. External validity
C. Inter-rater reliability
D. Criterion validity
E. Content validity

C. Inter-rater reliability


Inter-rater reliability is the degree to which two or more independent observers (raters) produce the same results when assessing the same subjects using the same measurement method.

Key points (USMLE-ready)

  • Measures agreement between different observers

  • Reflects consistency, not accuracy

  • High inter-rater reliability reduces observer variability

  • Commonly assessed using kappa statistics (for categorical data) or intraclass correlation coefficients (for continuous data)
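Below is a minimal sketch of Cohen's kappa for two raters making a yes/no call, using hypothetical counts. Kappa corrects raw agreement for the agreement expected by chance.

```python
def cohens_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two raters (A and B) making a binary call."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    a_pos = (both_pos + a_pos_b_neg) / n  # rater A's positive rate
    b_pos = (both_pos + a_neg_b_pos) / n  # rater B's positive rate
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical: 100 radiographs, raters agree on 40 positives and 50 negatives
kappa = cohens_kappa(40, 5, 5, 50)
print(f"kappa = {kappa:.2f}")  # kappa = 0.80
```

Here the raters agree on 90% of films, but about half of that agreement would be expected by chance alone. That is why kappa (0.80) is lower than raw agreement.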


400

This statistical test is commonly used to compare means between two independent groups in clinical or epidemiologic studies. Name both the parametric test and its nonparametric alternative.

What is the independent-samples t-test (parametric test)?

What is the Mann–Whitney U test (Wilcoxon rank-sum test; nonparametric)?
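Below is a minimal sketch of both test statistics, computed from scratch on hypothetical data. It returns the statistics only. P-values require the t distribution or a library; SciPy provides `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu`. Strictly, the Mann–Whitney test compares ranks (distributions) rather than means, which is why it suits skewed data or small samples.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2  # (t, degrees of freedom)


def mann_whitney_u(x, y):
    """U statistic for x: number of (x, y) pairs where x is larger, ties count 1/2."""
    return sum(1.0 if xi > yi else 0.5 if xi == yi else 0.0 for xi in x for yi in y)


group_a = [5.1, 4.8, 6.0, 5.5, 5.9]  # hypothetical measurements
group_b = [6.2, 6.8, 5.9, 7.1, 6.5]

t, df = pooled_t(group_a, group_b)
u_a = mann_whitney_u(group_a, group_b)
u_b = len(group_a) * len(group_b) - u_a
print(f"t = {t:.2f} on {df} df; U = {min(u_a, u_b)}")  # t = -3.33 on 8 df; U = 1.5
```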

500

Several patients develop postoperative Staphylococcus aureus infections (nosocomial, i.e., hospital-acquired) within days of surgery performed by the same surgeon. Investigation reveals that the surgeon was perspiring heavily, with sweat contaminating the operative field. Infected patients are compared with uninfected surgical patients operated on during the same period.

What is a case–control study?



Girou, E., Stephan, F., Novara, A., Safar, M., & Fagon, J. Y. (1998). Risk factors and outcome of nosocomial infections: results of a matched case-control study of ICU patients. American Journal of Respiratory and Critical Care Medicine, 157(4), 1151-1158.

500

This study design should not be used to make individual-level causal inferences because exposure and outcome are measured at the group level.

What is an ecological study?

500

Group-level exposure data are incorrectly applied to individuals.

What is ecological fallacy?


Ecological fallacy occurs when associations observed at the group (population) level are incorrectly assumed to apply to individuals within those groups.

Key points

  • Arises from ecologic studies

  • Uses group-level data, not individual-level exposure or outcome data

  • Cannot determine individual risk or causality

  • Can lead to incorrect conclusions about individual behavior or disease risk

  • A limitation of study design, not a type of bias in measurement
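Below is a minimal sketch with hypothetical counts. Looking only at region-level data, the more-exposed region has the higher disease rate. Yet within each region, exposed and unexposed individuals have identical risk, so the group-level pattern says nothing about individual risk.

```python
# Hypothetical counts per region:
# (exposed_n, exposed_cases, unexposed_n, unexposed_cases)
regions = {
    "A": (200, 20, 800, 80),
    "B": (800, 240, 200, 60),
}

within_rr = {}
for name, (n1, a, n0, c) in regions.items():
    exposure_prev = n1 / (n1 + n0)         # group-level exposure (ecological x)
    disease_rate = (a + c) / (n1 + n0)     # group-level outcome (ecological y)
    within_rr[name] = (a / n1) / (c / n0)  # requires individual-level data
    print(f"Region {name}: {exposure_prev:.0%} exposed, disease rate {disease_rate:.0%}, "
          f"RR within region = {within_rr[name]:.2f}")
# Region A: 20% exposed, disease rate 10%, RR within region = 1.00
# Region B: 80% exposed, disease rate 30%, RR within region = 1.00
```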

500

A blood pressure cuff gives consistent but incorrect readings due to faulty calibration. This is an indication of...

What is high reliability with low validity?


Reliability is the consistency or reproducibility of a measurement—the extent to which the same result is obtained when the measurement is repeated under identical conditions.

Validity is the accuracy or truthfulness of a measurement—the extent to which a measure actually assesses what it is intended to measure.

Reliability = consistency; Validity = accuracy.

You cannot have validity without reliability, but you can have reliability without validity.
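The cuff example can also be expressed numerically. Systematic error (bias) is a validity problem, and random scatter (SD) is a reliability problem. The readings below are hypothetical.

```python
from statistics import mean, stdev

TRUE_SBP = 120  # hypothetical true systolic pressure (mmHg)

readings = {
    "miscalibrated cuff": [135, 136, 135, 134, 135],  # hypothetical repeat readings
    "calibrated cuff": [119, 121, 120, 120, 120],
}

summary = {}
for cuff, values in readings.items():
    bias = mean(values) - TRUE_SBP  # systematic error: a validity problem
    spread = stdev(values)          # random error: a reliability problem
    summary[cuff] = (bias, spread)
    print(f"{cuff}: bias = {bias:+.1f} mmHg, SD = {spread:.2f} mmHg")
# miscalibrated cuff: bias = +15.0 mmHg, SD = 0.71 mmHg
# calibrated cuff: bias = +0.0 mmHg, SD = 0.71 mmHg
```

Both cuffs are equally reliable (same SD), but only the calibrated one is valid.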

500

This regression model is commonly used in case–control studies to adjust for confounders and estimate an adjusted measure of association.

What is logistic regression (adjusted odds ratios)?
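This from-scratch sketch shows what the regression does. It uses hypothetical case–control counts in which an unnamed confounder is linked to both exposure and case status, and it fits the model by Newton-Raphson maximum likelihood. In practice one would use a statistics package, for example statsmodels' `Logit`. With a single binary exposure, exp(coefficient) equals the crude ad/bc odds ratio. Adding the confounder yields the adjusted OR.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_logistic(X, y, iterations=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X: rows of predictor values (an intercept column is added here).
    y: 0/1 outcomes (1 = case). Returns [intercept, b1, b2, ...] on the log-odds scale.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k                      # score vector X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for xi, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for m in range(k):
                    info[j][m] += p * (1 - p) * xi[j] * xi[m]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical case-control counts keyed by (case, exposed, confounder present)
counts = {
    (1, 1, 1): 60, (1, 0, 1): 40, (0, 1, 1): 30, (0, 0, 1): 40,
    (1, 1, 0): 10, (1, 0, 0): 40, (0, 1, 0): 20, (0, 0, 0): 160,
}
X, y = [], []
for (case, exposed, confounder), n in counts.items():
    X += [(exposed, confounder)] * n
    y += [case] * n

crude = fit_logistic([(e,) for e, _ in X], y)
adjusted = fit_logistic(X, y)
print(f"crude OR = {math.exp(crude[1]):.2f}")        # crude OR = 3.50
print(f"adjusted OR = {math.exp(adjusted[1]):.2f}")  # adjusted OR = 2.00
```

In these made-up data the stratum-specific ORs are both 2.0, so the adjusted OR recovers that common value. The crude OR of 3.5 is inflated by the confounder.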