Data
Describes
Predicts
Evaluates
Tradeoffs!
100

This task typically takes ~80% of an analytics budget and timeline.

What is data tidying? (also OK: cleaning; ETL, wrangling, etc.)

100

The two basic building blocks of market analysis in health care

What are incidence and prevalence?

100

A prediction without a subsequent ___________ is meaningless.

What is an intervention? (also OK: action)

100

Dose-response, strength of association, biological plausibility, and consistency of findings

What are the Bradford Hill criteria of causality?

100

Often in health care there are no good options, only tough _______________

What are tradeoffs?

200

The “three legs” of the data asset appraisal triangle

What are use case, data asset, and commercialization?

200

The extent to which a measure adequately captures the intended concept.

What is (measurement) validity?

200

Sensitivity/specificity are properties of a screening test, while positive/negative predictive values are also influenced by _______________.

What is prevalence?
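The dependence on prevalence follows from Bayes' rule. The sketch below is illustrative only: the function name `predictive_values` and the test characteristics (90% sensitivity, 95% specificity) are assumptions, not figures from the source.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    # Expected fraction of the population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test, three different populations.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Sensitivity and specificity are held fixed across the three rows; only the predictive values move as prevalence changes.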

200

The statistical tendency of extreme values to become less extreme over time

What is regression to the mean?
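A small simulation can make the effect concrete. This is a sketch under stated assumptions: each person has a stable underlying level plus independent measurement noise in each year, and all numbers (means, standard deviations, the top-10% cutoff) are invented for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: stable true level plus fresh noise each year.
true_level = [random.gauss(100, 10) for _ in range(10_000)]
year1 = [t + random.gauss(0, 10) for t in true_level]
year2 = [t + random.gauss(0, 10) for t in true_level]

# Select the top 10% by year-1 value; no intervention is applied.
cutoff = sorted(year1)[int(0.9 * len(year1))]
selected = [i for i, value in enumerate(year1) if value >= cutoff]

mean_year1 = statistics.mean(year1[i] for i in selected)
mean_year2 = statistics.mean(year2[i] for i in selected)
print(f"selected group: year 1 = {mean_year1:.1f}, year 2 = {mean_year2:.1f}")
```

The selected group's average falls in year 2 even though nothing was done to them, which is why an uncontrolled before/after comparison on a group chosen for extreme values can overstate a program's effect.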

200

False positives versus false ________________

What are negatives?

300

ETL (data cleaning); row differentiation; column differentiation

What are the (three) sources of data differentiation?

300

All else equal, when a treatment is developed that prolongs the life of people suffering from a disease, over time prevalence will ______________.

What is increase?
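The reasoning rests on the standard steady-state approximation that prevalence is roughly incidence times average disease duration. The sketch below assumes a stable population and a rare condition; the function name and the numbers are hypothetical.

```python
def steady_state_prevalence(incidence_rate, mean_duration_years):
    """Approximate prevalence as incidence rate x mean duration of disease.

    Assumes a stable population and low prevalence.
    """
    return incidence_rate * mean_duration_years


# Hypothetical disease: 2 new cases per 1,000 person-years.
# A new treatment doubles average survival with the disease from 5 to 10 years.
before = steady_state_prevalence(incidence_rate=0.002, mean_duration_years=5)
after = steady_state_prevalence(incidence_rate=0.002, mean_duration_years=10)
print(f"prevalence before: {before:.3f}, after: {after:.3f}")
```

Incidence is unchanged in both calls; prevalence rises only because people live longer with the disease.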

300

The proportion of true negatives who correctly test negative

What is specificity?
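As a worked example, specificity and sensitivity can be read off the four cells of a two-by-two table. The counts below are invented for illustration, and `sensitivity_specificity` is a hypothetical helper name.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Compute (sensitivity, specificity) from 2x2 table counts."""
    sensitivity = tp / (tp + fn)  # share of people with the disease who test positive
    specificity = tn / (tn + fp)  # share of people without the disease who test negative
    return sensitivity, specificity


# Hypothetical screening study: 100 people with the disease, 900 without.
sens, spec = sensitivity_specificity(tp=80, fp=30, fn=20, tn=870)
print(f"sensitivity={sens:.3f}  specificity={spec:.3f}")
```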

300

The best protection against regression-to-the-mean

What is the use of a CONTROL GROUP? (also OK: benchmark)

300

_________ versus scientific rigor

What is speed?

400

The underlying statistical/data science conceptual framework for data quality

What is the Total Survey Error Framework? (Also OK: What is the Total Error Framework?)

400

Biggest long-term threat to herd immunity in the U.S.

What is vaccine hesitancy?

400

The “three T’s” of predictive model diligence

What are timing diagram, training data, and two x two?

400

The three C’s of treatment study quality

What are comparative control, chance, and context?

400

Control versus __________

What is context?

500

The guiding question “Why were these data generated?”

What is the BEST SOURCE OF INSIGHT (also OK: BIGGEST CLUE) about a data asset’s core differentiator?

500

The three drivers of R0

What are: Contact rate; probability of transmission per contact (infectiousness); duration of infectiousness?
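In the simplest textbook model, the basic reproduction number R0 is the product of these three drivers. The sketch below assumes homogeneous mixing; the parameter values are made up, and the herd immunity threshold of 1 - 1/R0 is the standard result for that simple model rather than something stated in the source.

```python
def basic_reproduction_number(contact_rate, transmission_probability, infectious_days):
    """R0 = contacts per day x transmission probability per contact x days infectious."""
    return contact_rate * transmission_probability * infectious_days


# Hypothetical pathogen: 10 contacts/day, 5% transmission per contact, 5 infectious days.
r0 = basic_reproduction_number(
    contact_rate=10, transmission_probability=0.05, infectious_days=5
)

# Share of the population that must be immune for each case to infect fewer than one other.
herd_immunity_threshold = 1 - 1 / r0
print(f"R0={r0:.2f}  herd immunity threshold={herd_immunity_threshold:.0%}")
```

Lowering any one driver (fewer contacts, lower infectiousness, shorter infectious period) lowers R0 proportionally.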

500

Health care contexts that demand minimizing false positives

What are resource-constrained settings?

500

Three ways to minimize the role of chance

What are: Large sample sizes; statistical significance; pre-registration of primary endpoint?

500

Data innovation versus _____________

What is data privacy?
