What are the behavior, psychological construct, and inference involved in the final exam?
Behavior - respond to multiple-choice questions
Psychological construct - knowledge of psych tests and measurement
Inference - how well (or poorly) an individual understands psych tests and measurement
What are the levels of measurement?
nominal, ordinal, interval, ratio
Reliability is about...
the consistency or precision of test scores
Validity is about...
the quality of inferences and decisions based on evidence
What are cut scores? Why are they problematic?
In a class setting, does rounding grades up solve the problem?
Cut scores are typically fairly arbitrary numbers used to separate individuals into groups (e.g., pass/fail, A/B/C/D/F). They are problematic because measurement is not perfectly reliable, so people may be placed in the "wrong" group (i.e., their true score may fall in a different group than their observed score). Rounding grades up doesn't fix the problem; it just moves the cut score, and students near the new cut can still be misclassified. By rounding up, you increase the chances of placing students in the wrong bucket.
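A minimal sketch of why unreliability makes cut scores risky, assuming measurement error is normally distributed with a known standard error of measurement (SEM). The function name and the example numbers (true score 71, cut score 70, SEM 3) are hypothetical. It returns the probability that a person's observed score lands on the opposite side of the cut from their true score.

```python
from statistics import NormalDist


def misclassification_probability(true_score: float, cut: float, sem: float) -> float:
    """Probability that the observed score falls on the other side of the cut
    from the true score, assuming observed = true + normal error (SD = sem)."""
    observed = NormalDist(mu=true_score, sigma=sem)
    p_below_cut = observed.cdf(cut)
    # True score at or above the cut: misclassified if the observed score is below it.
    # True score below the cut: misclassified if the observed score is at or above it.
    return p_below_cut if true_score >= cut else 1 - p_below_cut


# Hypothetical student whose true score is just above a cut of 70
print(round(misclassification_probability(71, 70, 3), 2))  # 0.37
# Hypothetical student whose true score is far above the cut
print(misclassification_probability(85, 70, 3) < 0.001)  # True
```

Students whose true scores sit near the cut have a substantial chance of landing in the wrong group, while students far from it almost never do. That holds wherever the cut is placed, rounded or not.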
What are the 3 main characteristics/criteria of a good test?
1. the test representatively samples relevant behaviors
2. standardized testing conditions (e.g., same amount of time, same amount of resources available)
3. rules for scoring
What does a correlation tell you?
the direction and magnitude of the relationship between two variables (e.g., if the correlation between conscientiousness and job performance is .3, there is a small-to-moderate, positive relationship between conscientiousness and job performance)
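A minimal sketch of the Pearson correlation coefficient computed from scratch: the sign of r gives the direction and its absolute value gives the magnitude. The five pairs of scores are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation: sum of cross-products of deviations divided by
    the square root of the product of the sums of squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
    return cross / sqrt(sum(dx * dx for dx in dev_x) * sum(dy * dy for dy in dev_y))


# Hypothetical scores for five people on two variables
conscientiousness = [1, 2, 3, 4, 5]
performance = [2, 1, 4, 3, 5]
print(round(pearson_r(conscientiousness, performance), 2))  # 0.8
```

Python 3.10 and later also ship `statistics.correlation`, which computes the same Pearson coefficient; the hand-written version just makes each step visible.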
Name at least four types of reliability.
test-retest reliability, inter-rater reliability, intra-rater reliability, parallel forms reliability, internal consistency reliability, alternate forms reliability, split-half reliability
What 3 questions are relevant to content validity?
Is the test content representative? Does it leave out anything important? Does it measure anything irrelevant?
Sometimes there are group differences in scores (e.g., men score higher than women on STEM exams on average). What are at least 3 possible explanations?
- test bias (it's the test's fault!)
- systemic societal problems (it's society's fault!)
- stereotypes and stigmas (it's society's fault!)
- genetics (it's genetics' fault! - no evidence)
Someone may score poorly on this exam. Based on their score, they may think they are not very intelligent. That would be a(n)...
individual inference, individual decision, institutional inference, institutional decision, rational inference, irrational inference, rational decision, irrational decision
individual inference and an irrational inference
If test scores fit a normal curve, what percentage of scores fall within one standard deviation of the mean? 2 standard deviations? 3 standard deviations?
~68%
~95%
~99.7%
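These percentages come from the standard normal distribution: the proportion of scores within k standard deviations of the mean is erf(k / √2). A short sketch using only the standard library:

```python
from math import erf, sqrt


def percent_within(k: float) -> float:
    """Percentage of a normal distribution lying within k SDs of the mean."""
    return 100 * erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(percent_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```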
What is the reliability formula specified by Classical Test Theory? Explain each component of the formula.
Observed Score = True Score + Error
X = T + E
Observed score = score a person makes on a test
True score = the average score an individual would receive if they took the test an infinite number of times (assuming no studying in between, no testing fatigue, no practice effects, etc.)
Error = random measurement error (e.g., lucky or unlucky guessing)
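Under Classical Test Theory, reliability is conventionally defined as the proportion of observed-score variance that is true-score variance, var(T) / var(X). The simulation below is a minimal sketch, assuming true scores and errors are independent and normally distributed; the mean of 50 and the SDs of 10 and 5 are hypothetical. With real data T is never observed, so this ratio can only be estimated (e.g., via test-retest or Cronbach's alpha); a simulation lets us compute it directly.

```python
import random
from statistics import pvariance

random.seed(1)  # fixed seed so the sketch is reproducible

N = 50_000
TRUE_SD, ERROR_SD = 10.0, 5.0  # hypothetical values

true_scores = [random.gauss(50, TRUE_SD) for _ in range(N)]
errors = [random.gauss(0, ERROR_SD) for _ in range(N)]
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

# Reliability = share of observed-score variance that is true-score variance
reliability = pvariance(true_scores) / pvariance(observed)
print(round(reliability, 2))  # expected to be near 10**2 / (10**2 + 5**2) = 0.8
```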
What are convergent evidence of validity and discriminant evidence of validity?
Types of validity evidence based on relations with constructs (i.e., construct validity)
Convergent evidence of validity - test scores are strongly, positively associated with scores on tests measuring similar constructs
Discriminant evidence of validity - test scores are unrelated to scores on tests measuring dissimilar constructs
What is testing bias? What causes it? And what does it lead to?
Testing bias occurs when a group (or groups) of individuals is less likely to perform well on a test for reasons that have nothing to do with the construct being measured. It is caused by a test requiring knowledge, skills, or abilities that are irrelevant to the construct being measured (e.g., requiring high-level vocabulary on a math test; including culturally specific knowledge on an intelligence test). Bias leads to differential prediction (i.e., the same score predicts outcomes differently for different groups).
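One common way to check for differential prediction is to regress the criterion on test scores separately for each group and compare the fitted lines. The sketch below uses hypothetical data in which both groups share a slope but group B's line sits higher, so the same test score predicts a better outcome for group B. The group labels and numbers are illustrative only.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical test scores and criterion outcomes for two groups
scores = [1, 2, 3, 4]
group_a_outcomes = [3, 5, 7, 9]
group_b_outcomes = [5, 7, 9, 11]

for name, outcomes in (("A", group_a_outcomes), ("B", group_b_outcomes)):
    slope, intercept = fit_line(scores, outcomes)
    predicted = intercept + slope * 3  # predicted criterion for a test score of 3
    print(name, slope, intercept, predicted)
# A 2.0 1.0 7.0
# B 2.0 3.0 9.0
```

Here a score of 3 predicts 7 for group A but 9 for group B. Using one common regression line would underpredict group B and overpredict group A.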
The quantitative (math) section of the SAT is a(n)...
Test of maximal performance, Behavior observation test, Self-report test, Standardized test, Nonstandardized test, Objective test, Projective test, Achievement test, Aptitude test, Intelligence test, Interest test, Personality test
Test of maximal performance, standardized test, objective test (the SAT's essay, which is not part of the quantitative section, would be the exception)
Maybe: achievement, aptitude, or intelligence (debatable!)
At one point, SAT stood for Scholastic Aptitude Test. Now SAT does not stand for anything, because the test is not necessarily a pure measure of aptitude. Because prep courses appear to raise scores, it seems to be at least partially an achievement test.
Four people take a test. Their scores are 6, 6, 14, and 14. Calculate the standard deviation. (Assume this is population data.)
standard deviation = 4
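A quick check with Python's standard library: `statistics.pstdev` divides by N (population SD), while `statistics.stdev` divides by N − 1 (sample SD). The sample value is shown only to make the difference visible.

```python
from statistics import pstdev, stdev

scores = [6, 6, 14, 14]
print(pstdev(scores))            # population SD: 4.0
print(round(stdev(scores), 2))   # sample SD, for contrast only: 4.62
```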
Six people take a test. Their scores are 14, 8, 8, 10, 10, and 10. The test developer determines that Cronbach's alpha of the test is 0. One of the 6 test-takers scored an 8. What is the 95% confidence interval around his observed score?
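Treating the scores as population data, the SD is 2. With α = 0, the standard error of measurement is SEM = SD × √(1 - α) = 2 × √1 = 2. The 95% confidence interval is approximately 8 ± 2 SEM = 8 ± 4, so it runs from 4 to 12.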
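The same calculation as a short sketch, assuming the population SD and the usual CTT formula SEM = SD × √(1 - reliability). Using the exact multiplier 1.96 instead of 2 gives 4.08 to 11.92, which rounds to the same 4 to 12.

```python
from math import sqrt
from statistics import pstdev

scores = [14, 8, 8, 10, 10, 10]
alpha = 0.0      # reliability estimate given in the question
observed = 8

sd = pstdev(scores)                # population SD = 2.0
sem = sd * sqrt(1 - alpha)         # standard error of measurement
z = 1.96                           # 95% multiplier (often rounded to 2)
lower, upper = observed - z * sem, observed + z * sem
print(sd, sem, round(lower, 2), round(upper, 2))  # 2.0 2.0 4.08 11.92
```

With α = 0 the SEM equals the SD, meaning the test tells us nothing about an individual beyond the spread of the group.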
What is a criterion?
An outcome that we expect to be associated with test scores.
For example, your performance on this Jeopardy game may predict your final exam score. Your final exam score would be the criterion.
What are some advantages to a career in assessment psychology?
- high demand
- high pay
- variety of fields
- variety of workplaces
Name at least 4 criteria of a good test/survey item.
1) The item is purposeful and straightforward.
2) The item is unambiguous and uses correct syntax (e.g., avoid jargon, complete sentences, comfortable reading level, no typos).
3) The item is appropriate for the rating scale.
4) The item does not require additional categorical alternatives.
5) The item asks one and only one question (not double-barreled).
6) The item does not require reverse-coding (debated!)
Suppose Cheryl took two tests for her driver's license: a written test and a driving test. Which test did she do better on? (Assume this is population data.)
Written test: Score = 20 Mean = 10 SD = 5
Driving test: Score = 70 Mean = 64 SD = 2
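Written test: z-score = (20 - 10) / 5 = 2
Driving test: z-score = (70 - 64) / 2 = 3
Cheryl did better on the driving test (3 SDs above the mean vs. 2 SDs above the mean on the written test).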
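Standardizing with z = (score - mean) / SD puts both tests on the same scale, so the scores can be compared directly. A minimal sketch using the numbers above:

```python
def z_score(score: float, mean: float, sd: float) -> float:
    """How many SDs a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd


written = z_score(20, 10, 5)   # 2.0
driving = z_score(70, 64, 2)   # 3.0
better = "driving" if driving > written else "written"
print(written, driving, better)  # 2.0 3.0 driving
```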
When conducting item analysis, testing professionals may examine Cronbach's alpha with item removed. What is that? When should it be examined? What might suggest you should remove an item?
What is that? Cronbach's alpha with item removed tells you the internal consistency reliability of a scale if an item is removed.
When should it be examined? When a scale is homogeneous (i.e., the items measure one underlying construct).
What might suggest you should remove an item? You may want to remove an item if Cronbach's alpha with that item removed is still high (well above .70), because the scale would retain adequate internal consistency reliability without it. If removing an item would actually increase the scale's Cronbach's alpha, that is a strong signal the item is worth considering for removal.
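A minimal sketch of Cronbach's alpha and "alpha if item deleted," using the standard formula α = k/(k - 1) × (1 - sum of item variances / variance of total scores). The 3-item, 4-respondent data set is hypothetical, and item 3 is deliberately built to run against items 1 and 2. Population variances are used; sample variances would give the same alpha because the N vs. N - 1 factor cancels in the ratio.

```python
from statistics import pvariance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha. `items` holds one list of scores per item,
    with respondents in the same order in every list."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # each respondent's scale score
    sum_item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def alpha_if_item_deleted(items: list[list[float]]) -> list[float]:
    """Alpha recomputed with each item left out in turn (needs 3+ items)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical 3-item scale answered by 4 respondents
items = [
    [1, 2, 3, 4],  # item 1
    [1, 3, 2, 4],  # item 2
    [3, 1, 4, 2],  # item 3 (tends to run against items 1 and 2)
]
print(round(cronbach_alpha(items), 2))                       # 0.18
print([round(a, 2) for a in alpha_if_item_deleted(items)])  # [-3.0, 0.0, 0.89]
```

Dropping item 3 raises alpha from about .18 to about .89, the red-flag pattern described above. The negative value when item 1 is dropped signals that the remaining items do not hang together at all.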
Suppose you know the answers to all of the questions in this Jeopardy game, and you conclude you are a genius. Thoroughly evaluate the relevant evidence, and explain the quality of the inference.
Evidence based on test content: The content does not representatively sample what is relevant to being a genius (it only covers psych tests and measurement). It leaves out many important things (e.g., verbal reasoning, spatial intelligence, problem-solving). It measures things that are irrelevant (specific knowledge about tests and measurement).
Evidence based on relations with criteria: There is no evidence that performance on this Jeopardy game is associated with genius-level outcomes (e.g., winning a Nobel Prize, being recognized as an expert in your field).
Evidence based on relations with constructs: There is no evidence that your performance on this Jeopardy game is associated with your performance on an IQ test or another measure of genius.
There is no evidence to suggest you are a genius. (You might be, but your score on this Jeopardy game is irrelevant to that conclusion.)
According to the American Psychological Association (APA) Ethics Code, what are the 11 ethical standards relevant to assessments?
Bases for Assessments
Use of Assessments
Informed Consent in Assessments
Release of Test Data
Test Construction
Interpreting Assessment Results
Assessment by Unqualified Persons
Obsolete Tests and Outdated Test Results
Test Scoring and Interpretation Services
Explaining Assessment Results
Maintaining Test Security