CAIH Data Management 2018
Data Grab Bag
Database and CRF Design
Oops, my bad...
Study Design
100

What did the squares of toilet paper on the first day of class represent?

Bonus point: Who is using lots of these to potty train her daughters?

Individual pieces of data

Bonus: Chelsea W

100

What does HIPAA stand for?

The Health Insurance Portability and Accountability Act

100

You are writing a description of a Serious Adverse Event (SAE) that happened in your study. Is the value that you have entered quantitative or qualitative data?

Qualitative

100

What is the term we use when two different people enter the same data to prevent errors?

Double data entry
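
A minimal sketch of how the two entry passes might be compared afterward. The record IDs, field names, and values are hypothetical, and `compare_entries` is an illustrative helper, not part of any particular data management system.

```python
def compare_entries(first_pass, second_pass):
    """Return (record_id, field, first_value, second_value) for every mismatch."""
    discrepancies = []
    # Look at every record that appears in either pass.
    for record_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(record_id, {})
        b = second_pass.get(record_id, {})
        # Look at every field that appears in either version of the record.
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((record_id, field, a.get(field), b.get(field)))
    return discrepancies


# Hypothetical data typed in by two different people from the same paper forms.
entry_1 = {"P001": {"age": 34, "weight_kg": 70.5}, "P002": {"age": 51, "weight_kg": 82.0}}
entry_2 = {"P001": {"age": 34, "weight_kg": 75.0}, "P002": {"age": 51, "weight_kg": 82.0}}

print(compare_entries(entry_1, entry_2))
```

Each mismatch would then be resolved by going back to the source form. If the form itself is wrong, both passes agree and nothing is flagged.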

100

In what kind of study can you not standardize the way in which the data were collected?

Retrospective cohort study and ecological study

200

We query our database and find that someone's flight time to Baltimore was 3 times longer than the class mean! What do we call this? (Bonus point: which student is this?)

A data outlier 

Bonus: Chelsea M
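
One way such a query might look, as a sketch only. It flags values more than three times the class median; the median is swapped in here for the mean because a single extreme value pulls the mean toward itself. The IDs and flight times are hypothetical.

```python
from statistics import median


def flag_outliers(values_by_id, multiple=3.0):
    """Return the IDs whose value exceeds `multiple` times the median."""
    reference = median(values_by_id.values())
    return sorted(
        record_id
        for record_id, value in values_by_id.items()
        if value > multiple * reference
    )


# Hypothetical flight times to Baltimore, in hours.
flight_hours = {"S01": 2.0, "S02": 1.5, "S03": 2.5, "S04": 2.0, "S05": 14.0}

print(flag_outliers(flight_hours))
```

A flagged value is not automatically an error. It is a prompt to check the source before deciding whether to correct or keep it.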

200

Why do we specify a format for dates (e.g., DDMMYYYY)?

To collect data in a standardized manner and prevent misinterpretation
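
A small sketch of enforcing the DDMMYYYY format at entry, assuming dates arrive as 8-digit strings. `parse_ddmmyyyy` is an illustrative helper name.

```python
from datetime import datetime


def parse_ddmmyyyy(text):
    """Return a date for a valid 8-digit DDMMYYYY string, otherwise None."""
    # Require exactly eight ASCII digits so shortened entries are rejected.
    if len(text) != 8 or not (text.isascii() and text.isdigit()):
        return None
    try:
        return datetime.strptime(text, "%d%m%Y").date()
    except ValueError:
        # Impossible dates, such as 31 February, end up here.
        return None


print(parse_ddmmyyyy("05032018"))  # read as 5 March 2018
print(parse_ddmmyyyy("31022018"))  # not a real date
```

Without a stated format, the same string "05032018" could just as easily be read as May 3, which is the misinterpretation the answer refers to.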

200

What is a strategy you might use to ensure participants are only asked questions that apply to them?

A skip logic pattern
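
A minimal sketch of how skip logic might be represented, assuming each question optionally names an earlier question and the answer that makes it applicable. The question names and the `show_if` convention are hypothetical.

```python
# Each question may depend on an earlier answer: show_if = (question_name, required_answer).
QUESTIONS = [
    {"name": "age", "show_if": None},
    {"name": "plays_sports", "show_if": None},
    {"name": "sports_played", "show_if": ("plays_sports", "yes")},
    {"name": "wears_helmet", "show_if": ("plays_sports", "yes")},
]


def applicable_questions(answers):
    """Return the names of the questions this participant should be asked."""
    names = []
    for question in QUESTIONS:
        condition = question["show_if"]
        if condition is None or answers.get(condition[0]) == condition[1]:
            names.append(question["name"])
    return names


print(applicable_questions({"plays_sports": "no"}))
print(applicable_questions({"plays_sports": "yes"}))
```

A participant who answers "no" skips the two follow-up questions, so no irrelevant or contradictory answers are collected.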

200

You are about to leave for vacation and someone hands you a data query report with 5000 queries for "missing time of data entry"! What do you recommend?

Ask whether the field is important for study validity; if it is not, ignore the queries.

200

What type of study has the best quality data regarding the exposure?

A randomized clinical trial

300

What data management activity uncovered the duplicate records for location (airport) between Sherry and Jennifer?

Data quality check

300

What is an important guideline for creating variable names?

Unique, short, and descriptive

300

Grab a piece of paper and draw a table. Hold it up when you're ready to describe the rows, columns and headers!

Rows are horizontal (data records), columns are vertical (variables), headers are the first row (variable names).

300

Maria Knoll is enrolled in your study. Her study ID should be K4586 but it was entered as K4568. What tool might you use to prevent this type of mistake?

A barcode scanner

300

You are doing an ecological study and someone gives you the databases. What type of documentation do you ask for to go with them?

Data dictionary, protocol, standard operating procedures (SOPs)

400

One of your colleagues will be adding a ‘record’ to her family database very soon. What format is the variable used to collect data on when the baby will be born? 

(Bonus: Who will be a Grandmother soon?)

Date

Bonus: Sherry

400

Name a reason to de-identify data.

Protect participant confidentiality/privacy.

400

Which common software is not recommended for large-scale data management (large data set, many variables and data records)?

MS Excel (Sorry, Bill Gates)

400

Your data query found several people with age >200. What could you have done to prevent this kind of error?

Range restrictions in the front end
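
A sketch of what a front-end range restriction might look like, assuming age is typed in as text. The limits of 0 and 120 are illustrative assumptions, not values from the course.

```python
def check_age(raw_value, minimum=0, maximum=120):
    """Return (age, None) if the entry is acceptable, else (None, error message)."""
    try:
        age = int(raw_value)
    except (TypeError, ValueError):
        return None, "Age must be a whole number."
    if not minimum <= age <= maximum:
        return None, f"Age must be between {minimum} and {maximum}."
    return age, None


print(check_age("34"))
print(check_age("250"))
```

Because the entry is rejected at the moment it is typed, the person entering the data can check the source form immediately, instead of an analyst finding the problem in a query months later.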

400

What type of study is like running a marathon and is therefore likely to have missing data because of long follow-up times and the resulting loss to follow-up?

Bonus point: Who was looking for a long running route?

A prospective cohort study

Bonus: Rebecca

500

You are doing a head injury study and want to know what sports the participant plays. What kind of response field would you recommend for collecting these data on the case report form?

Categorical

500

Name a type of error that double data entry catches and one that it cannot.

Catches typos at data entry; does not catch incorrect measurements or mistakes on the paper form

500

Give an example of the type of data quality checks you might do "on the back end" (done by an analyst)

Data queries (ranges, logic checks), missing value reports, etc.
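
As an illustration, a missing value report might be produced along these lines. The records, field names, and the `study_id` key are hypothetical.

```python
def missing_value_report(records, required_fields):
    """Map each required field to the study IDs where it is blank or absent."""
    report = {field: [] for field in required_fields}
    for record in records:
        for field in required_fields:
            value = record.get(field)
            if value is None or value == "":
                report[field].append(record["study_id"])
    return report


# Hypothetical records pulled from the study database.
records = [
    {"study_id": "K0001", "age": 34, "enroll_date": "05032018"},
    {"study_id": "K0002", "age": None, "enroll_date": "06032018"},
    {"study_id": "K0003", "age": 51, "enroll_date": ""},
]

print(missing_value_report(records, ["age", "enroll_date"]))
```

The resulting lists of IDs can be sent back to the site as data queries.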

500

List two challenges with collecting free text field data.

Time to record/write

Time and typos at data entry

Need to code responses before they can be analyzed

500

What type of study may provide lower quality data about participant risk factors or exposures?

Case-control study or retrospective study

600

What are key elements to include on a Study Activity Timeline?

Activities (enrollment, blood draw, clinical assessment, etc.) and when they happen

600

What is one way you can protect privacy using paper CRFs and another way that protects data on electronic records?  

Paper: use ID#, store in locked location, separate from analysis database

Electronic: Secure login, limit access, network security


600

Improve our CRF for this field:

13. Weight ______

Show number of digits and add units

600

Define both accuracy and efficiency, and give an example of a strategy that might increase both (other than barcode reader).

Efficiency - the relative speed and ease with which something can be accomplished

Accuracy - a measure of correctness. 

Direct data capture onto a tablet, instead of paper forms with double data entry, increases both.

600

Why might collecting a lot of data make data quality worse?

The extra work of collecting and cleaning less important data may lower the quality of the primary objective data