Final Exam Review Jeopardy Template

Basics of Psych Testing

Statistics

Reliability

Validity

Practical Considerations

100

What are the behavior, psychological construct, and inference involved in the final exam?

Behavior - respond to multiple-choice questions

Psychological construct - knowledge of psych tests and measurement

Inference - how well (or poorly) an individual understands psych tests and measurement

100

What are the levels of measurement?

nominal, ordinal, interval, ratio

100

Reliability is about...

the consistency or precision of test scores

100

Validity is about...

the quality of inferences and decisions based on evidence

100

What are cut scores? Why are they problematic?

In a class setting, does rounding grades up solve the problem?

Cut scores are typically fairly arbitrary numbers that are used to separate groups of individuals (e.g., pass/fail, A/B/C/D/F). They are problematic because measurement is not perfectly reliable. People may be placed in the "wrong" group (i.e., their true score may be in a different group than their observed score). Rounding grades doesn't fix the problem. By rounding up, you increase the chances of placing students in the wrong bucket.

200

What are the 3 main characteristics/criteria of a good test?

1. the test representatively samples relevant behaviors

2. standardized testing conditions (e.g., same amount of time, same amount of resources available)

3. rules for scoring

200

What does a correlation tell you?

the direction and magnitude of the relationship between two variables (e.g., If the correlation between conscientiousness and job performance is .3, there is a small, positive relationship between conscientiousness and job performance.)
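
As a rough illustration of reading direction and magnitude off a correlation, the sketch below computes Pearson's r by hand. The two score lists are hypothetical (made up for this example, not real data), and `pearson_r` is a helper name assumed here.

```python
import math

# Hypothetical scores for five employees (illustrative only, not real data)
conscientiousness = [1, 2, 3, 4, 5]
job_performance = [2, 1, 4, 3, 5]

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

r = pearson_r(conscientiousness, job_performance)
print(f"r = {r:.2f}")  # r = 0.80
```

The sign of r gives the direction (positive here) and its absolute value gives the magnitude.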

200

Name at least four types of reliability.

test-retest reliability, inter-rater reliability, intra-rater reliability, parallel forms reliability, internal consistency reliability, alternate forms reliability, split-half reliability

200

What 3 questions are relevant to content validity?

Is the test content representative? Does it leave out anything important? Does it measure anything irrelevant?

200

Sometimes there are group differences in scores (e.g., men score higher than women on STEM exams on average). What are at least 3 possible explanations?

- test bias (it's the test's fault!)

- systemic societal problems (it's society's fault!)

- stereotypes and stigmas (it's society's fault!)

- genetics (it's genetics' fault! - no evidence)

300

Someone may score poorly on this exam. Based on their score, they may think they are not very intelligent. That would be a(n)...

individual inference, individual decision, institutional inference, institutional decision, rational inference, irrational inference, rational decision, irrational decision

individual inference and an irrational inference

300

If test scores fit a normal curve, what percentage of scores fall within one standard deviation of the mean? 2 standard deviations? 3 standard deviations?

~68%

~95%

~99.7%
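
These percentages can be checked against the normal distribution itself. The sketch below assumes a perfectly normal distribution and uses the standard library's error function; `share_within` is a helper name made up for this example.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k) * 100:.1f}%")
```

Real test scores only approximate a normal curve, so observed percentages will differ somewhat from these values.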

300

What is the reliability formula specified by Classical Test Theory? Explain each component of the formula.

Observed Score = True Score + Error

X = T + E

Observed score = score a person makes on a test

True score = the score an individual would receive if they took a test an infinite number of times and computed their average score (assuming no studying in between, testing fatigue, practice effects, etc.)

Error = random error (e.g., lucky/unlucky guessing)
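
A small sketch of X = T + E for one person, using hypothetical observed scores (made up for this example). With only a handful of administrations, the average is an estimate of the true score, not the true score itself.

```python
from statistics import mean

# Hypothetical observed scores (X) for one person across repeated administrations
observed = [78, 82, 80, 79, 81]

# The average of the observed scores estimates the true score (T)
true_score_estimate = mean(observed)

# Error on each administration: E = X - T
errors = [x - true_score_estimate for x in observed]

print(true_score_estimate)
print(errors)
```

The errors sum to zero around the estimated true score, which matches the idea that random error has no systematic direction.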

300

What are convergent evidence of validity and discriminant evidence of validity?

Types of validity evidence based on relations with constructs (i.e., construct validity)

Convergent evidence of validity - test scores are strongly, positively associated with scores on tests measuring similar constructs

Discriminant evidence of validity - test scores are unrelated to scores on tests measuring dissimilar constructs

300

What is testing bias? What causes it? And what does it lead to?

Testing bias occurs when a group (or groups) of individuals are less likely to perform well on a test for reasons that have nothing to do with the construct being measured. Bias occurs when a test requires knowledge, skills, or abilities that are irrelevant to the construct being measured (e.g., requiring high-level vocabulary on a math test; including culturally-specific knowledge on an intelligence test). Bias results in differential prediction (i.e., the same score predicts outcomes differently for certain groups).

400

The quantitative (math) section of the SAT is a(n)...

Test of maximal performance, Behavior observation test, Self-report test, Standardized test, Nonstandardized test, Objective test, Projective test, Achievement test, Aptitude test, Intelligence test, Interest test, Personality test

Test of maximal performance, standardized test, objective test (except the writing portion)

Maybe: achievement, aptitude, or intelligence (debatable!)

At one point, the SAT stood for the Scholastic Aptitude Test. Now SAT does not stand for anything because it is not necessarily a perfect measure of aptitude. Because prep courses demonstrate success, it seems to be at least partially an achievement test.

400

Four people take a test. The population scores are 6, 6, 14, and 14. Calculate the standard deviation. Assume this is population data.

standard deviation = 4
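
The arithmetic behind this answer, sketched step by step and cross-checked with the standard library's population formula (which divides by N, not N - 1):

```python
from statistics import pstdev

scores = [6, 6, 14, 14]

# Step 1: mean of the scores
mean_score = sum(scores) / len(scores)  # 10.0

# Step 2: average squared deviation from the mean (population variance)
variance = sum((x - mean_score) ** 2 for x in scores) / len(scores)  # 16.0

# Step 3: square root of the variance
sd = variance ** 0.5

print(sd)              # 4.0
print(pstdev(scores))  # same result from the standard library
```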

400

Six people take a test. Their scores are 14, 8, 8, 10, 10, and 10. The test developer determines that Cronbach's alpha of the test is 0. One of the 6 test-takers scored an 8. What is the 95% confidence interval around his observed score?

The 95% confidence interval is 4 to 12 (population SD = 2; SEM = SD × √(1 − alpha) = 2; 8 ± 2 SEM, rounding 1.96 to 2).
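
A sketch of how this interval can be reached. It assumes the scores are treated as population data, that the standard error of measurement is SD × √(1 − reliability), and that 1.96 is rounded to 2. The variable names are chosen for this example.

```python
from statistics import pstdev

scores = [14, 8, 8, 10, 10, 10]
alpha = 0.0    # Cronbach's alpha given in the question
observed = 8   # the test-taker's observed score
z = 2          # 1.96 rounded to 2 (assumption)

sd = pstdev(scores)              # population SD of the test scores
sem = sd * (1 - alpha) ** 0.5    # standard error of measurement

lower = observed - z * sem
upper = observed + z * sem
print(lower, upper)
```

Because alpha is 0, the SEM equals the full SD, so the interval is as wide as it can be for this test.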

400

What is a criterion?

An outcome we expect to be associated with test scores.

For example, your performance on this Jeopardy game may predict your final exam score. Your final exam score would be the criterion.

400

What are some advantages to a career in assessment psychology?

- high demand

- high pay

- variety of fields

- variety of workplaces

500

Name at least 4 criteria of a good test/survey item.

1) The item is purposeful and straightforward.

2) The item is unambiguous and uses correct syntax (e.g., avoid jargon, complete sentences, comfortable reading level, no typos).

3) The item is appropriate for the rating scale.

4) The item does not require additional categorical alternatives.

5) The item asks one and only one question (not double-barreled).

6) The item does not require reverse-coding (debated!)

500

Suppose I took two driving tests: a written test and a driving test. Which test did I do better on? (Assume this is population data.)

Written test: Score = 20 Mean = 10 SD = 5

Driving test: Score = 70 Mean = 64 SD = 2

Written test: z-score = 2

Driving test: z-score = 3

The driving test: its z-score is higher (3 vs. 2).
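
The comparison works because z-scores put both tests on the same scale. A minimal sketch, with `z_score` as a helper name assumed for this example:

```python
def z_score(score, mean, sd):
    """Number of standard deviations a score sits above (or below) the mean."""
    return (score - mean) / sd

written = z_score(20, 10, 5)
driving = z_score(70, 64, 2)

print(written)  # 2.0
print(driving)  # 3.0
```

The raw scores (20 vs. 70) cannot be compared directly because the two tests have different means and standard deviations.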

500

When conducting item analysis, testing professionals may examine Cronbach's alpha with item removed. What is that? When should it be examined? What might suggest you should remove an item?

What is that? Cronbach's alpha with item removed tells you the internal consistency reliability of a scale if an item is removed.

When should it be examined? When a scale is homogeneous (i.e., the items measure one underlying construct).

What might suggest you should remove an item? You may want to remove an item if Cronbach's alpha with item removed is high (well above .70). If Cronbach's alpha with an item removed is high, the scale would still have adequate internal consistency reliability without the item. Sometimes removing an item would increase Cronbach's alpha of the scale, which is a red flag for considering removal.
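
A sketch of how alpha with item removed can be computed, using the usual formula alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The response data are hypothetical (item 4 was deliberately made unrelated to the other items), and the function names are assumptions for this example, not a statistics package's API.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order in each list."""
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]  # total score per respondent
    sum_item_variances = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / pvariance(totals))

def alpha_if_item_removed(items):
    """Alpha of the remaining scale after dropping each item in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical responses from five people to a four-item scale
items = [
    [1, 2, 3, 4, 5],  # item 1
    [2, 2, 3, 4, 5],  # item 2
    [1, 3, 3, 4, 4],  # item 3
    [4, 1, 5, 2, 3],  # item 4 (unrelated to the others)
]

full_alpha = cronbach_alpha(items)
removed = alpha_if_item_removed(items)

print(f"alpha (all items): {full_alpha:.2f}")
for number, value in enumerate(removed, start=1):
    print(f"alpha without item {number}: {value:.2f}")
```

In this made-up data, alpha rises when item 4 is dropped and falls when any other item is dropped, which is the red-flag pattern described above.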

500

Suppose you know the answers to all of the questions in this Jeopardy game, and you conclude you are a genius. Thoroughly evaluate relevant evidence, and explain the quality of the inference.

Evidence based on test content: The content does not representatively capture content relevant to being a genius (it only captures content relevant to psych tests and measurement). It leaves out a lot of things that are important (e.g., verbal reasoning, spatial intelligence, problem-solving). It measures things that are irrelevant (specific knowledge about tests and measurement).

Evidence based on relations with criteria: There is no evidence to suggest that performance on this Jeopardy game is associated with genius outcomes (e.g., winning a Nobel Peace Prize, being deemed an expert in your field).

Evidence based on relations with constructs: There is no evidence to suggest that your performance on this Jeopardy game is associated with your performance on an IQ test or another measure of genius-ness.

There is no evidence to suggest you are a genius. (You might be, but your score on this Jeopardy game is irrelevant.)

500

According to the American Psychological Association (APA), what are the 11 ethical principles relevant to assessments?

Bases for Assessments

Use of Assessments

Informed Consent in Assessments

Release of Test Data

Test Construction

Interpreting Assessment Results

Assessment by Unqualified Persons

Obsolete Tests and Outdated Test Results

Test Scoring and Interpretation Services

Explaining Assessment Results

Maintaining Test Security