This task typically takes ~80% of an analytics budget and timeline.
What is data tidying? (also OK: cleaning, ETL, wrangling, etc.)
The two basic building blocks of market analysis in health care
What are incidence and prevalence?
A prediction without a subsequent ___________ is meaningless.
What is an intervention? (also OK: action)
Dose-response, strength of association, biological plausibility, and consistency of findings
What are the Bradford Hill criteria of causality?
Often in health care there are no good options, only tough _______________
What are tradeoffs?
The “three legs” of the data asset appraisal triangle
What are use case, data asset, and commercialization?
The extent to which a measure adequately captures the intended concept.
What is (measurement) validity?
Sensitivity/specificity are properties of a screening test, while positive/negative predictive values are also influenced by _______________.
What is prevalence?
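A minimal sketch of why prevalence matters: the function below applies Bayes' rule to a test with fixed sensitivity and specificity. The function name and the 90%/95% test characteristics are hypothetical, chosen only for illustration.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    # Expected share of the population falling in each cell of the 2x2 table
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Same hypothetical test (90% sensitive, 95% specific) in two populations
for prev in (0.01, 0.20):
    ppv, npv = predictive_values(0.90, 0.95, prev)
    print(f"prevalence={prev:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

With these assumed inputs, PPV rises from about 15% at 1% prevalence to about 82% at 20% prevalence, even though sensitivity and specificity never change.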
The statistical tendency of extreme measurements to be followed by less extreme ones when measured again
What is regression to the mean?
False positives versus false ________________
What are negatives?
ETL (data cleaning); row differentiation; column differentiation
What are the (three) sources of data differentiation?
All else equal, when a treatment that prolongs the lives of people with a disease is developed, prevalence will ______________ over time.
What is increase?
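One way to see why: for a stable population and a relatively rare condition, point prevalence is approximately incidence multiplied by mean disease duration. The sketch below uses hypothetical numbers; a treatment that extends survival lengthens duration without changing incidence, so prevalence rises.

```python
def steady_state_prevalence(incidence_per_year: float,
                            mean_duration_years: float) -> float:
    """Approximate point prevalence as incidence x mean duration.

    Assumes a stable population at steady state and a rare condition;
    the approximation P ~= I * D breaks down when prevalence is high.
    """
    return incidence_per_year * mean_duration_years


# Hypothetical condition: 2 new cases per 1,000 people per year
incidence = 2 / 1000
before = steady_state_prevalence(incidence, mean_duration_years=5)   # survival before treatment
after = steady_state_prevalence(incidence, mean_duration_years=10)   # treatment doubles survival
print(f"prevalence before: {before:.1%}, after: {after:.1%}")
```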
The proportion of true negatives who correctly test negative
What is specificity?
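Sensitivity and specificity both come straight from a two-by-two table of test result versus true disease status. A small sketch with hypothetical counts (the class and field names are illustrative, not from any particular library):

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from a 2x2 table comparing test results against true disease status."""
    tp: int  # diseased, test positive
    fp: int  # not diseased, test positive (false positives)
    fn: int  # diseased, test negative (false negatives)
    tn: int  # not diseased, test negative

    @property
    def sensitivity(self) -> float:
        # Proportion of truly diseased people who correctly test positive
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of truly non-diseased people who correctly test negative
        return self.tn / (self.tn + self.fp)


table = TwoByTwo(tp=90, fp=45, fn=10, tn=855)
print(f"sensitivity={table.sensitivity:.2f}, specificity={table.specificity:.2f}")
```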
The best protection against regression-to-the-mean
What is the use of a CONTROL GROUP? (also OK: benchmark)
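A short simulation makes regression to the mean concrete. Under a hypothetical setup (each person has a stable true value, and every measurement adds independent noise), people selected for extreme baseline readings look "better" at follow-up even though nothing was done to them. That spurious improvement is exactly what an untreated control group or benchmark lets you subtract out.

```python
import random
import statistics


def regression_to_mean_demo(n: int = 10_000, seed: int = 42) -> tuple[float, float]:
    """Simulate two noisy measurements of a stable trait with NO intervention.

    Hypothetical setup: true values ~ N(100, 10); each measurement adds
    independent noise ~ N(0, 10). The top 5% at baseline are remeasured.
    """
    rng = random.Random(seed)
    true_values = [rng.gauss(100, 10) for _ in range(n)]
    baseline = [t + rng.gauss(0, 10) for t in true_values]
    followup = [t + rng.gauss(0, 10) for t in true_values]

    # Select the people with the most extreme (highest) baseline readings
    cutoff = sorted(baseline)[int(0.95 * n)]
    selected = [i for i in range(n) if baseline[i] >= cutoff]

    mean_baseline = statistics.mean(baseline[i] for i in selected)
    mean_followup = statistics.mean(followup[i] for i in selected)
    return mean_baseline, mean_followup


base, follow = regression_to_mean_demo()
print(f"selected group baseline mean: {base:.1f}, follow-up mean: {follow:.1f}")
```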
_________ versus scientific rigor
What is speed?
The underlying statistical/data science conceptual framework for data quality
What is the Total Survey Error Framework? (also OK: What is the Total Error Framework?)
Biggest long-term threat to herd immunity in the U.S.
What is vaccine hesitancy?
The “three T’s” of predictive model diligence
What are timing diagram, training data, and two-by-two (table)?
The three C’s of treatment study quality
What are comparative control, chance, and context?
Control versus __________
What is context?
The guiding question “Why were these data generated?”
What is the BEST SOURCE OF INSIGHT (also OK: BIGGEST CLUE) into a data asset’s core differentiator?
The three drivers of R₀
What are: Contact rate; probability of transmission per contact (infectiousness); duration of infectiousness?
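Under a simple homogeneous-mixing approximation, R₀ is the product of these three drivers, so cutting any one of them (fewer contacts, masks, shorter infectious period) lowers R₀ proportionally. The function name and input values below are hypothetical.

```python
def basic_reproduction_number(contacts_per_day: float,
                              transmission_prob_per_contact: float,
                              infectious_days: float) -> float:
    """R0 as contact rate x transmission probability per contact x duration of infectiousness."""
    return contacts_per_day * transmission_prob_per_contact * infectious_days


# Hypothetical pathogen: 10 contacts/day, 5% transmission per contact, 7 infectious days
r0 = basic_reproduction_number(10, 0.05, 7)
print(f"R0 = {r0:.1f}")
```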
Health care contexts that demand minimizing false positives
What are resource-constrained settings?
Three ways to minimize the role of chance
What are: Large sample sizes; statistical significance; pre-registration of primary endpoint?
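Why large samples help: the standard error of an estimate shrinks with the square root of the sample size, so random variation plays a smaller role. A sketch for a sample proportion (the function name and the p = 0.5 example are illustrative assumptions):

```python
import math


def standard_error_of_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Quadrupling the sample size halves the standard error (and the approximate 95% CI width)
for n in (100, 400, 1600):
    se = standard_error_of_proportion(0.5, n)
    print(f"n={n:5d}  SE={se:.4f}  approx 95% CI half-width={1.96 * se:.4f}")
```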
Data innovation versus _____________
What is data privacy?