Data Analysis Jeopardy

Test Types and Administration

Reliability & Measurement Error

Validity & Hypotheses & Inference

Descriptive Stats & Distributions

Normal Curve, z/T Scores & Probability

100

This type of test comes with a manual, fixed directions, and uniform scoring so every examinee gets the same procedure.

What is a standardized test?

100

The term for consistency of scores across time, raters, or forms.

What is reliability?

100

Experts judge whether items cover the domain thoroughly to support this type of validity.

What is content validity?

100

In a left-skewed distribution, place these three in order from smallest to largest: mean, median, mode.

What are mean < median < mode?

100

Roughly this percent of scores fall within ±1 SD, ±2 SD, and ±3 SD of the mean

What are about 68%, 95%, and 99.7%?

200

A spelling quiz with clear right/wrong items is this test format.

What is an objective test?

200

Two therapists scoring the same session independently checks this kind of reliability.

What is inter-rater reliability?

200

A test that appears to measure what it claims, helping clients accept it, has this non-technical form of validity.

What is face validity?

200

The most appropriate center for a heavily skewed distribution with outliers.

What is the median?

200

Convert a raw score of 92 when M = 80 and SD = 6 into a z score.

What is z = 2.00 (92−80)/6=12/6=2(92 − 80)/6 = 12/6 = 2(92−80)/6=12/6=2?

300

A driver’s road test where you must meet a preset passing standard is this interpretation method.

What is criterion-referenced testing?

300

Giving the same instrument twice and correlating the two totals checks this.

What is test-retest reliability?

300

Correlating a clinic score with a later real-world outcome (e.g., home performance) evaluates this evidence.

What is criterion predictive validity?

300

This plot for continuous data uses adjacent bars to display a frequency distribution.

What is a histogram?

300

A T score is defined as 50 + 10z. What T corresponds to z = −0.5?

What is T = 45 (50 + 10×−0.5)?

400

A percentile rank from a reading test comparing students to a national sample is using this interpretation.

What is norm-referenced testing?

400

Cronbach’s alpha estimates this kind of reliability based on item inter-correlations.

What is internal consistency reliability?

400

Correlating a new measure with a gold standard at the same time supports this.

hat is criterion concurrent validity?

400

The middle 50% of scores appears as the “box” in this graphic.

What is a box-and-whisker plot (the IQR)?

400

With M = 70 and SD = 8, the scores that bracket about 68% of the population are these.

What are 62 and 78 (70 ± 8)?

500

Giving extra time to a client without altering what the test measures is called this.

What is an accommodation?

500

Two equivalent versions of a test given to the same group on the same day examine this.

What is parallel-forms reliability?

500

Evidence that a test truly reflects the intended construct (via theory, structure, group differences) pertains to this.

What is construct validity?

500

A much longer right whisker than left indicates this shape.

What is positive (right) skew?

500

Approximately what proportion of scores lie above z = +1?

What is about 15.87%?

600

Changing test content so the construct is no longer the same (e.g., removing items that define the skill) is called this, and it breaks norms.

What is a modification?

600

In Classical Test Theory, the formula that explains any observed score.

What is Observed = True + Error?

600

This kind of validity lets findings be applied to new settings or groups.

What is external validity (generalizability)?

600

“More peaked than normal” is this kurtosis term.

What is leptokurtic?

600

The area between z = +1 and z = +2 under the normal curve is about this percent.

What is about 13.59%?

700

The professional who is ethically responsible for ensuring an assessment meets standards for special populations in OT practice.

Who is the OTR?

700

The typical minimum reliability coefficient acceptable for group decisions.

What is .70?

700

The default claim of “no difference” or “no relationship” is known as this hypothesis.

What is the null hypothesis?

700

Compute the range of 18, 21, 25, 29, 40.

What is 22 (40 − 18)?

700

If T = 60, the equivalent z is this.

What is z = 1.0 (since 60 = 50 + 10z)?

800

The 2014 Standards for Educational and Psychological Testing emphasize this core principle for all examinees.

What is fairness/equity in testing?

800

This statistic estimates the average imprecision around one person’s observed score.

What is the Standard Error of Measurement (SEM)?

800

“Group A will score higher than Group B” is this kind of hypothesis and generally uses this tailing.

What is a directional hypothesis; what is a one-tailed test?

800

Order 3.0, 4.5, 2.0, 9.0, 6.0 and give the median.

What is 4.5 (2.0, 3.0, 4.5, 6.0, 9.0)?

800

Two data sets have SDs of 6 and 12, but different means (30 and 100). Which shows more relative variability? (Use CV = SD/Mean.)

What is the first (CV = 6/30 = .20) vs the second (12/100 = .12); the first is larger.

900

The main purpose of a test manual is to preserve this across administrations.

What is standardization (uniform administration and scoring)?

900

If SEM = 3 and a client’s observed score is 41, a simple 68% band is between these two numbers.

What are 38 and 44 (41 ± 3)?

900

A study that simply asks, “Are these groups different?” with no direction specified uses this tailing.

What is a two-tailed test?

900

The best spread statistic to report with the mean for bell-shaped data.

What is the standard deviation?

900

These tests (t, ANOVA, Pearson r) belong to this family that assumes normality and scale data.

What are parametric statistics?

1000

A classroom essay graded with a rubric by different teachers is less this than a multiple-choice test unless rater training is strong.

What is less objective?

1000

The most serious clinical consequence of poor reliability is that results might reflect this instead of true ability.

What is measurement error (or something other than true ability)?

1000

Inference uses a sample to make statements about this.

What is the population?

1000

This measure of center is most sensitive to extreme values.

What is the mean?

1000

If z = −1.0, the raw score is one SD in this direction from the mean.

What is one standard deviation below the mean?