What did the squares of toilet paper on the first day of class represent?
Bonus point: Who is using lots of these to potty train her daughters?
Individual pieces of data
Bonus: Chelsea W
What does HIPAA stand for?
The Health Insurance Portability and Accountability Act
You are writing a description of a Serious Adverse Event (SAE) that happened in your study. Is the value that you have entered quantitative or qualitative data?
Qualitative
What is the term we use when two different people enter the same data to prevent errors?
Double data entry
In what kind of study can you not standardize the way in which the data were collected?
Retrospective cohort study and ecological study
We query our database and find that someone's flight time to Baltimore was 3 times longer than the class mean! What do we call this? (Bonus point: which student is this?)
A data outlier
Bonus: Chelsea M
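A data query like the one in this question can be sketched in a few lines of Python. This is only a sketch: the function name, the flight times, and the 3x cutoff (borrowed from the question) are assumptions for illustration, not a standard outlier rule.

```python
from statistics import mean

def flag_outliers(values, factor=3.0):
    """Return values that are at least `factor` times the mean of all values."""
    threshold = factor * mean(values)
    return [v for v in values if v >= threshold]

# Made-up flight times in hours, for illustration only.
flight_hours = [1.0, 1.5, 2.0, 1.0, 1.5, 2.0, 1.0, 14.0]
print(flag_outliers(flight_hours))  # [14.0]
```

A flagged value is a prompt to go back and check the record, not proof that the value is wrong.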
Why do we specify a format for dates (e.g., DDMMYYYY)?
To collect data in a standardized manner and prevent misinterpretation
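As a rough illustration of why the format matters, the sketch below accepts only an 8-digit DDMMYYYY string. The function name is hypothetical; the parsing relies on Python's standard datetime module.

```python
from datetime import datetime

def parse_ddmmyyyy(text):
    """Parse a date entered as DDMMYYYY; raise ValueError if it does not fit."""
    # Insist on exactly 8 digits so "342021" is not silently accepted.
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected 8 digits (DDMMYYYY), got {text!r}")
    return datetime.strptime(text, "%d%m%Y").date()

# Under DDMMYYYY this is 3 April 2021; read as MMDDYYYY it would be 4 March.
print(parse_ddmmyyyy("03042021"))  # 2021-04-03
```

Impossible dates such as "31022021" raise a ValueError instead of being stored.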
What is a strategy you might use to ensure participants are only asked questions that apply to them?
A skip logic pattern
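One way to picture skip logic is as a list of questions, each with an optional condition on an earlier answer. The question names and the rule layout below are invented for illustration; real data capture systems have their own ways of configuring this.

```python
# Hypothetical form: each question is (name, condition), where condition is
# None (always ask) or (earlier_question, required_answer).
QUESTIONS = [
    ("plays_sports", None),
    ("which_sports", ("plays_sports", "yes")),
    ("hours_per_week", ("plays_sports", "yes")),
]

def applicable_questions(answers):
    """Return the names of the questions that apply given the answers so far."""
    result = []
    for name, condition in QUESTIONS:
        if condition is None or answers.get(condition[0]) == condition[1]:
            result.append(name)
    return result

print(applicable_questions({"plays_sports": "no"}))  # ['plays_sports']
```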
You are about to leave for vacation and someone hands you a data query report with 5000 queries for "missing time of data entry"! What do you recommend?
Ask whether the time of data entry is important for study validity; if it is not, ignore the queries.
What type of study has the best quality data regarding the exposure?
A randomized clinical trial
What data management activity uncovered the duplicate records for location (airport) between Sherry and Jennifer?
Data quality check
What is an important guideline for creating variable names?
Unique, short, and descriptive
Grab a piece of paper and draw a table. Hold it up when you're ready to describe the rows, columns and headers!
Rows are horizontal (data records), columns are vertical (variables), headers are the first row (variable names).
Maria Knoll is enrolled in your study. Her study ID should be K4586 but it was entered as K4568. What tool might you use to prevent this type of mistake?
A barcode scanner
You are doing an ecological study and someone gives you the databases. What type of documentation do you ask for to go with them?
Data dictionary, protocol, standard operating procedures (SOPs)
One of your colleagues will be adding a ‘record’ to her family database very soon. What format is the variable used to collect data on when the baby will be born?
(Bonus: Who will be a Grandmother soon?)
Date
Bonus: Sherry
Name a reason to de-identify data.
Protect participant confidentiality/privacy.
Which common software is not recommended for large scale data management (large data set, many variables and data records)?
MS Excel (Sorry, Bill Gates)
Your data query found several people with age >200. What could you have done to prevent this kind of error?
Range restrictions in the front end
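A front-end range restriction might look something like the following sketch. The function name and the 0 to 120 bounds are assumptions for illustration; a real study would take its limits from the protocol.

```python
def validate_age(raw, minimum=0, maximum=120):
    """Return the age as an int, or raise ValueError if it is out of range."""
    age = int(raw)  # also raises ValueError for non-numeric input
    if not minimum <= age <= maximum:
        raise ValueError(f"age {age} is outside {minimum}-{maximum}")
    return age

print(validate_age("34"))  # 34
```

An entry such as "234" is rejected at the moment of data entry, while the source document is still at hand.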
What type of study is like running a marathon and therefore might be likely to have missing data due to long follow up times and resulting participant loss to follow up?
Bonus point: Who was looking for a long running route?
A prospective cohort study
Bonus: Rebecca
You are doing a head injury study and want to know what sports the participant plays. What kind of data response field would you advise to collect this data on the case report form?
Categorical
Name a type of error that double-data-entry catches and one that it cannot.
Catches typos; does not catch incorrect measurements, mistakes on the paper form
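The comparison step of double data entry can be sketched as below. The field names and the weight value are made up; the two study IDs are the ones from the Maria Knoll question.

```python
def compare_entries(first, second):
    """Return {field: (first_value, second_value)} for fields that disagree."""
    fields = sorted(set(first) | set(second))
    return {
        field: (first.get(field), second.get(field))
        for field in fields
        if first.get(field) != second.get(field)
    }

# Two people key in the same paper form; one transposes two digits.
entry_a = {"study_id": "K4586", "weight_kg": "70"}
entry_b = {"study_id": "K4568", "weight_kg": "70"}
print(compare_entries(entry_a, entry_b))  # {'study_id': ('K4586', 'K4568')}
```

If the paper form itself is wrong, both people type the same wrong value and the comparison finds nothing.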
Give an example of the type of data quality checks you might do "on the back end" (done by an analyst)
Data queries (ranges, logic checks), missing value reports, etc.
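As one example of a back-end check, a missing value report could be sketched like this. The records, variable names, and the choice to treat None and empty strings as missing are all assumptions for illustration.

```python
def missing_value_report(records, variables):
    """Count, per variable, the records where the value is absent, None, or ''."""
    return {
        variable: sum(1 for record in records if record.get(variable) in (None, ""))
        for variable in variables
    }

# Made-up records for illustration only.
records = [
    {"study_id": "A001", "age": 34, "visit_date": "03042021"},
    {"study_id": "A002", "age": None, "visit_date": ""},
    {"study_id": "A003", "visit_date": "05042021"},
]
print(missing_value_report(records, ["study_id", "age", "visit_date"]))
# {'study_id': 0, 'age': 2, 'visit_date': 1}
```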
List two challenges with collecting free text field data.
Time to record/write
Time and typos at data entry
Need to code responses before they can be analyzed
What type of study may provide lower quality data about participant risk factors or exposures?
Case-control study or retrospective study
What are key elements to include on a Study Activity Timeline?
Activities (enrollment, blood draw, clinical assessment, etc.) and when they happen
What is one way you can protect privacy using paper CRFs and another way that protects data on electronic records?
Paper: use ID#, store in locked location, separate from analysis database
Electronic: Secure login, limit access, network security
Improve our CRF for this field:
13. Weight ______
Define both accuracy and efficiency, and give an example of a strategy that might increase both (other than barcode reader).
Efficiency - the relative speed and ease with which something can be accomplished.
Accuracy - a measure of correctness.
Direct data capture on a tablet instead of double data entry increases both.
Why might collecting a lot of data make data quality worse?
The extra work of collecting and cleaning less important data may lower the quality of the data for the primary objective.