What are the behavior, psychological construct, and inference involved in the final exam?
Behavior - respond to multiple-choice questions
Psychological construct - knowledge of psych tests and measurement
Inference - how well (or poorly) an individual understands psych tests and measurement
What are three measures of central tendency and three measures of variability?
Central tendency (the middle of the distribution) - mean, median, mode
Variability (how much scores differ) - range, standard deviation, variance
When conducting item analysis, testing professionals may examine Cronbach's alpha with item removed. What is that? When should it be examined? What might suggest you should remove an item?
What is that? Cronbach's alpha with item removed tells you the internal consistency reliability of a scale if an item is removed.
When should it be examined? When a scale is homogeneous (i.e., the items measure one underlying construct).
What might suggest you should remove an item? Compare Cronbach's alpha with item removed to the alpha of the full scale. If alpha with the item removed is still high (well above .70), the scale would keep adequate internal consistency reliability without the item, so the item may be unnecessary. If removing the item would increase Cronbach's alpha of the scale, that is a red flag that the item does not fit with the others, and you should consider removing it.
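A minimal sketch of the item analysis described above, assuming the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The response data and the function names cronbach_alpha and alpha_if_item_removed are hypothetical and exist only for illustration; item 4 was made up to run against the other three items.

```python
import statistics

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    item_var_sum = sum(statistics.pvariance(item) for item in items)
    # Total score for each respondent (sum across items).
    totals = [sum(person) for person in zip(*items)]
    total_var = statistics.pvariance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def alpha_if_item_removed(items):
    """Alpha of the scale recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical responses from 5 people to a 4-item scale.
items = [
    [1, 2, 3, 4, 5],  # item 1
    [2, 2, 3, 4, 5],  # item 2
    [1, 3, 3, 5, 5],  # item 3
    [5, 1, 4, 2, 3],  # item 4 (invented to be inconsistent with the rest)
]

overall = cronbach_alpha(items)
without_each = alpha_if_item_removed(items)
# Indexes of items whose removal would raise alpha (the red flag).
flagged = [i for i, a in enumerate(without_each) if a > overall]

print(round(overall, 2), [round(a, 2) for a in without_each], flagged)
```

In this made-up data set, only the last item is flagged: alpha rises from about .60 to about .97 when it is dropped, while dropping any other item lowers alpha.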
Reliability is about...
the consistency or precision of test scores
Validity is about...
the quality of inferences and decisions based on evidence
What are the 3 main characteristics/criteria of a good test?
1. the test representatively samples relevant behaviors
2. standardized testing conditions (e.g., same amount of time, same amount of resources available)
3. rules for scoring
What does a correlation tell you?
the direction and magnitude of variable relationships (e.g., If the correlation between conscientiousness and job performance is .3, there is a small, positive relationship between conscientiousness and job performance.)
What 3 questions are relevant to content validity?
Is the test content representative? Does it leave out anything important? Does it measure anything irrelevant?
Suppose PAR is hiring assessment psychologists based on their conscientiousness. PAR hires a psychologist who is high in conscientiousness, but he turns out to be a bad employee. He is a...
true positive, false positive, true negative, false negative
false positive
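One way to keep the four decision outcomes straight is to cross the prediction with what actually happened. This is only an illustrative sketch; the function name classify_decision and its two arguments are hypothetical.

```python
def classify_decision(predicted_good, actually_good):
    """Label a selection decision by crossing the prediction with the outcome."""
    if predicted_good and actually_good:
        return "true positive"
    if predicted_good and not actually_good:
        return "false positive"
    if not predicted_good and actually_good:
        return "false negative"
    return "true negative"

# PAR predicted success (hired him), but he was a bad employee.
par_hire = classify_decision(predicted_good=True, actually_good=False)
print(par_hire)
```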
Someone may score poorly on this exam. Based on their score, they may think they are not very intelligent. That would be a(n)...
individual inference, individual decision, institutional inference, institutional decision, rational inference, irrational inference, rational decision, irrational decision
individual inference and an irrational inference
Name at least 4 criteria of a good test/survey item.
1) The item is purposeful and straightforward.
2) The item is unambiguous and uses correct syntax (e.g., avoid jargon, complete sentences, comfortable reading level, no typos).
3) The item is appropriate for the rating scale.
4) The item does not require additional categorical alternatives.
5) The item asks one and only one question (not double-barreled).
6) The item does not require reverse-coding (debated!)
Suppose I took two driving tests: a written test and a driving test. Which test did I do better on? (Assume this is population data.)
Written test: Score = 20 Mean = 10 SD = 5
Driving test: Score = 70 Mean = 64 SD = 2
Written test: z-score = 2
Driving test: z-score = 3
I did better on the driving test, because its z-score is higher (3 vs. 2).
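The comparison above can be checked with a short sketch, using z = (score - mean) / SD. The function name z_score is just an illustrative choice.

```python
def z_score(score, mean, sd):
    """Standardize a raw score: how many SDs it sits above or below the mean."""
    return (score - mean) / sd

written_z = z_score(20, 10, 5)
driving_z = z_score(70, 64, 2)

# The higher z-score marks the better performance relative to other test-takers.
better_test = "driving" if driving_z > written_z else "written"
print(written_z, driving_z, better_test)
```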
What is a criterion?
An outcome we expect to be associated with test scores.
For example, your performance on this jeopardy game may predict your final exam score. Your final exam score would be the criterion.
Name at least four types of reliability.
test-retest reliability, inter-rater reliability, intra-rater reliability, parallel forms reliability, internal consistency reliability, alternate forms reliability, split-half reliability
The quantitative (math) section of the SAT is a(n)...
Test of maximal performance, Behavior observation test, Self-report test, Standardized test, Nonstandardized test, Objective test, Projective test, Achievement test, Aptitude test, Intelligence test, Interest test, Personality test
Test of maximal performance, standardized test, objective test (except the writing portion)
Maybe: achievement, aptitude, or intelligence (debatable!)
At one point, the SAT stood for the Scholastic Aptitude Test. Now SAT does not stand for anything because it is not necessarily a perfect measure of aptitude. Because prep courses demonstrate success, it seems to be at least partially an achievement test.
What are the levels of measurement?
nominal, ordinal, interval, ratio
What is the reliability formula specified by Classical Test Theory? Explain each component of the formula.
Observed Score = True Score + Error
X = T + E
Observed score = the score a person actually earns on a test
True score = the score an individual would receive if they took a test an infinite number of times and computed their average score (assuming no studying in between, testing fatigue, practice effects, etc.)
Error = random error (e.g., lucky/unlucky guessing)
What are convergent evidence of validity and discriminant evidence of validity?
Types of validity evidence based on relations with constructs (i.e., construct validity)
Convergent evidence of validity - test scores are strongly, positively associated with scores on tests measuring similar constructs
Discriminant evidence of validity - test scores are unrelated to scores on tests measuring dissimilar constructs
What are the 6 assumptions of a psychological test?
1. The test measures what it claims to measure and predicts what it claims to predict
2. Test scores will typically remain stable over time (test-retest reliability)
3. Individuals understand items in the same way.
4. Individuals will report accurately.
5. Individuals will report honestly.
6. There will be some error. Observed score = True score + error
According to the American Psychological Association (APA), what are the 11 ethical principles relevant to assessments?
Bases for Assessments
Use of Assessments
Informed Consent in Assessments
Release of Test Data
Test Construction
Interpreting Assessment Results
Assessment by Unqualified Persons
Obsolete Tests and Outdated Test Results
Test Scoring and Interpretation Services
Explaining Assessment Results
Maintaining Test Security
If test scores fit a normal curve, what percentage of scores fall within one standard deviation of the mean? 2 standard deviations? 3 standard deviations?
Within 1 standard deviation: ~68%
Within 2 standard deviations: ~95%
Within 3 standard deviations: ~99.7%
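These percentages can be reproduced from the standard normal curve. A minimal sketch, assuming Python 3.8+ for statistics.NormalDist; the helper name share_within is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Percentage of scores within 1, 2, and 3 SDs of the mean.
shares = {k: round(100 * share_within(k), 1) for k in (1, 2, 3)}
print(shares)
```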
Four people take a test. The population scores are 6, 6, 14, and 14.
Calculate/identify the mean, median, mode, standard deviation, and variance.
mean = 10; median = 10; modes = 6 and 14; standard deviation = 4; variance = 16
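The answer above can be verified with Python's statistics module. Because the question states these are population scores, this sketch uses the population versions (pvariance and pstdev) rather than the sample versions; multimode assumes Python 3.8+.

```python
import statistics

scores = [6, 6, 14, 14]

mean = statistics.mean(scores)
median = statistics.median(scores)
modes = statistics.multimode(scores)     # every value tied for most frequent
variance = statistics.pvariance(scores)  # population variance (divide by N)
sd = statistics.pstdev(scores)           # population SD = sqrt(variance)

print(mean, median, modes, variance, sd)
```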
What is testing bias? What causes it? And what does it lead to?
Testing bias occurs when a group (or groups) of individuals are less likely to perform well on a test for reasons that have nothing to do with the construct being measured. Bias occurs when a test requires knowledge, skills, or abilities that are irrelevant to the construct being measured (e.g., requiring high-level vocabulary on a math test; including culturally-specific knowledge on an intelligence test). Bias results in differential prediction (i.e., the same score predicts outcomes differently for certain groups).
Six people take a test. Their scores are 14, 8, 8, 10, 10, and 10. The test developer determines that Cronbach's alpha of the test is 0. One of the 6 test-takers scored an 8. What is the 95% confidence interval around his observed score?
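The 95% confidence interval is 4 to 12. (The standard deviation is 2, so the standard error of measurement is 2 x sqrt(1 - 0) = 2, and 8 plus or minus 2 standard errors gives 4 to 12.)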
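A sketch of the arithmetic behind this interval, under three assumptions that the card does not spell out: the scores are treated as population data, the standard error of measurement is SEM = SD * sqrt(1 - reliability), and the 95% interval uses 2 SEMs (rounding 1.96 up to 2).

```python
import math
import statistics

scores = [14, 8, 8, 10, 10, 10]
alpha = 0.0      # Cronbach's alpha reported for the test
observed = 8     # the test-taker's observed score

sd = statistics.pstdev(scores)    # population SD of the scores
sem = sd * math.sqrt(1 - alpha)   # standard error of measurement

# Approximate 95% interval: observed score plus or minus 2 SEMs.
lower = observed - 2 * sem
upper = observed + 2 * sem
print(sd, sem, lower, upper)
```

With alpha equal to 0, the SEM equals the standard deviation of the scores, so the interval is as wide as it can be for this test.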
Suppose you know the answers to all of the questions in this jeopardy game, and you conclude you are a genius. Thoroughly evaluate relevant evidence, and explain the quality of the inference.
Evidence based on test content: The content does not representatively capture content relevant to being a genius (it only captures content relevant to psych tests and measurement). It leaves out a lot of things that are important (e.g., verbal reasoning, spatial intelligence, problem-solving). It measures things that are irrelevant (specific knowledge about tests and measurement).
Evidence based on relations with criteria: There is no evidence to suggest that performance on this jeopardy game is associated with genius outcomes (e.g., winning a Nobel Prize, being deemed an expert in your field).
Evidence based on relations with constructs: There is no evidence to suggest that your performance on this jeopardy game is associated with your performance on an IQ test or another measure of genius-ness.
There is no evidence to suggest you are a genius. (You might be, but your score on this jeopardy game is irrelevant.)