This type of test comes with a manual, fixed directions, and uniform scoring so every examinee gets the same procedure.
What is a standardized test?
The term for consistency of scores across time, raters, or forms.
What is reliability?
Experts judge whether items cover the domain thoroughly to support this type of validity.
What is content validity?
In a left-skewed distribution, place these three in order from smallest to largest: mean, median, mode.
What are mean < median < mode?
Roughly these percentages of scores fall within ±1 SD, ±2 SD, and ±3 SD of the mean, respectively.
What are about 68%, 95%, and 99.7%?
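The three percentages can be checked numerically. The sketch below assumes a perfectly normal distribution and uses the identity P(|Z| < k) = erf(k/√2); the helper name `within_k_sd` is made up for illustration.

```python
import math

def within_k_sd(k: float) -> float:
    """Proportion of a normal distribution lying within ±k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"±{k} SD: {within_k_sd(k) * 100:.1f}%")
# ±1 SD: 68.3%
# ±2 SD: 95.4%
# ±3 SD: 99.7%
```

The card's 68%, 95%, and 99.7% are the usual rounded versions of these values.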
A spelling quiz with clear right/wrong items is this test format.
What is an objective test?
Two therapists scoring the same session independently checks this kind of reliability.
What is inter-rater reliability?
A test that appears to measure what it claims, helping clients accept it, has this non-technical form of validity.
What is face validity?
The most appropriate center for a heavily skewed distribution with outliers.
What is the median?
Convert a raw score of 92 when M = 80 and SD = 6 into a z score.
What is z = 2.00 ((92 − 80)/6 = 12/6 = 2)?
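The same conversion as a minimal sketch, assuming the usual definition z = (raw − M)/SD; the function name `z_score` is illustrative only.

```python
def z_score(raw: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the mean, in standard-deviation units."""
    return (raw - mean) / sd

print(z_score(92, 80, 6))  # 2.0
```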
A driver’s road test where you must meet a preset passing standard is this interpretation method.
What is criterion-referenced testing?
Giving the same instrument twice and correlating the two totals checks this.
What is test-retest reliability?
Correlating a clinic score with a later real-world outcome (e.g., home performance) evaluates this evidence.
What is criterion predictive validity?
This plot for continuous data uses adjacent bars to display a frequency distribution.
What is a histogram?
A T score is defined as 50 + 10z. What T corresponds to z = −0.5?
What is T = 45 (50 + 10 × (−0.5) = 45)?
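A short sketch of the T-score conversion, using the definition T = 50 + 10z given in the clue. The inverse is included for going from T back to z; both function names are hypothetical.

```python
def t_from_z(z: float) -> float:
    """Convert a z score to a T score (mean 50, SD 10)."""
    return 50 + 10 * z

def z_from_t(t: float) -> float:
    """Invert T = 50 + 10z to recover the z score."""
    return (t - 50) / 10

print(t_from_z(-0.5))  # 45.0
print(z_from_t(60))    # 1.0
```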
A percentile rank from a reading test comparing students to a national sample is using this interpretation.
What is norm-referenced testing?
Cronbach’s alpha estimates this kind of reliability based on item inter-correlations.
What is internal consistency reliability?
Correlating a new measure with a gold standard at the same time supports this.
What is criterion concurrent validity?
The middle 50% of scores appears as the “box” in this graphic.
What is a box-and-whisker plot (the IQR)?
With M = 70 and SD = 8, the scores that bracket about 68% of the population are these.
What are 62 and 78 (70 ± 8)?
Giving extra time to a client without altering what the test measures is called this.
What is an accommodation?
Two equivalent versions of a test given to the same group on the same day examine this.
What is parallel-forms reliability?
Evidence that a test truly reflects the intended construct (via theory, structure, group differences) pertains to this.
What is construct validity?
A much longer right whisker than left indicates this shape.
What is positive (right) skew?
Approximately what proportion of scores lie above z = +1?
What is about 15.87%?
Changing test content so the construct is no longer the same (e.g., removing items that define the skill) is called this, and it breaks norms.
What is a modification?
In Classical Test Theory, the formula that explains any observed score.
What is Observed = True + Error?
This kind of validity lets findings be applied to new settings or groups.
What is external validity (generalizability)?
“More peaked than normal” is this kurtosis term.
What is leptokurtic?
The area between z = +1 and z = +2 under the normal curve is about this percent.
What is about 13.59%?
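Areas under the normal curve, such as the tail above z = +1 and the band between z = +1 and z = +2, can be reproduced with Python's standard library. This is only a sketch assuming a standard normal distribution (mean 0, SD 1); the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Upper tail: proportion of scores above z = +1
above_1 = 1 - std_normal.cdf(1)

# Band: proportion of scores between z = +1 and z = +2
between_1_and_2 = std_normal.cdf(2) - std_normal.cdf(1)

print(f"{above_1:.4f}")          # 0.1587
print(f"{between_1_and_2:.4f}")  # 0.1359
```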
The professional who is ethically responsible for ensuring an assessment meets standards for special populations in OT practice.
Who is the OTR?
The typical minimum reliability coefficient acceptable for group decisions.
What is .70?
The default claim of “no difference” or “no relationship” is known as this hypothesis.
What is the null hypothesis?
Compute the range of 18, 21, 25, 29, 40.
What is 22 (40 − 18)?
If T = 60, the equivalent z is this.
What is z = 1.0 (since 60 = 50 + 10z)?
The 2014 Standards for Educational and Psychological Testing emphasize this core principle for all examinees.
What is fairness/equity in testing?
This statistic estimates the average imprecision around one person’s observed score.
What is the Standard Error of Measurement (SEM)?
“Group A will score higher than Group B” is this kind of hypothesis and generally uses this tailing.
What is a directional hypothesis; what is a one-tailed test?
Order 3.0, 4.5, 2.0, 9.0, 6.0 and give the median.
What is 4.5 (2.0, 3.0, 4.5, 6.0, 9.0)?
Two data sets have SDs of 6 and 12 and means of 30 and 100, respectively. Which shows more relative variability? (Use CV = SD/Mean.)
What is the first data set (CV = 6/30 = .20, versus 12/100 = .12 for the second)?
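The comparison can be written out as a small sketch using the CV = SD/Mean formula from the clue; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd: float, mean: float) -> float:
    """Relative variability: the SD expressed as a fraction of the mean."""
    return sd / mean

cv_first = coefficient_of_variation(6, 30)
cv_second = coefficient_of_variation(12, 100)

print(f"{cv_first:.2f} vs {cv_second:.2f}")  # 0.20 vs 0.12
print("first" if cv_first > cv_second else "second")  # first
```

The second data set has the larger SD in raw units, but relative to its mean it varies less.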
The main purpose of a test manual is to preserve this across administrations.
What is standardization (uniform administration and scoring)?
If SEM = 3 and a client’s observed score is 41, a simple 68% band is between these two numbers.
What are 38 and 44 (41 ± 3)?
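A minimal sketch of the band calculation, assuming measurement error is roughly normal so that ±1 SEM covers about 68% and ±1.96 SEM about 95%. The function name `score_band` and its `z` multiplier parameter are hypothetical.

```python
def score_band(observed: float, sem: float, z: float = 1.0) -> tuple[float, float]:
    """Return (lower, upper) bounds of observed ± z × SEM."""
    margin = z * sem
    return observed - margin, observed + margin

print(score_band(41, 3))  # (38.0, 44.0)
```

Passing `z=1.96` widens the band to the approximate 95% level under the same normality assumption.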
A study that simply asks, “Are these groups different?” with no direction specified uses this tailing.
What is a two-tailed test?
The best spread statistic to report with the mean for bell-shaped data.
What is the standard deviation?
These tests (t, ANOVA, Pearson r) belong to this family that assumes normality and scale data.
What are parametric statistics?
A classroom essay graded with a rubric by different teachers is less this than a multiple-choice test unless rater training is strong.
What is objective?
The most serious clinical consequence of poor reliability is that results might reflect this instead of true ability.
What is measurement error (or something other than true ability)?
Inference uses a sample to make statements about this.
What is the population?
This measure of center is most sensitive to extreme values.
What is the mean?
If z = −1.0, the raw score is one SD in this direction from the mean.
What is below the mean?