Assessment scenarios
Test purpose & usefulness
Defining constructs
Test Specs
Items, MCQs & Feedback
100

In Scenario 1, this type of quiz begins class and is not recorded.

What is a pop reading quiz (formative, 10 short-answer items)?

100

Bachman & Palmer’s umbrella concept asking “Does the test do what it’s meant to do?”

What is test usefulness?

100

“Students will learn tag questions” is not assessable until you state this clearly.

What is the performance/construct (what learners can DO, in which mode/context)?

100

“Skills assessed, item types, tasks, scoring, and reporting” collectively form this plan.

What are test specifications (specs)?

100

The correct MCQ choice is the key; all others are called these.

What are distractors?

200

Scenario 2’s course focuses on this language domain and ends with MCQ, cloze, and editing.

What is a grammar unit test on verb tenses?

200

This impact describes how a test positively influences teaching and learning before and after it is given.

What is beneficial washback?

200

For Scenario 2, content validity suggests sampling these two broad performance types.

What are comprehension and production?

200

Classroom specs should align with this validity principle by mirroring course tasks.

What is content validity (representativeness)?

200

MCQs are tempting for practicality/reliability but mainly test this kind of knowledge.

What is recognition/selective response?

300

Scenario 3 values this over quantity during a 90-minute in-class task with later peer conferences.

What is quality of writing in a midterm essay?

300

One checklist question asks if results should be used to judge these predetermined targets.

What are curricular standards?

300

Give one example of a precise performance statement for the tense unit.

What is “recognize written present perfect forms in familiar contexts” (or “produce oral simple past in familiar contexts”)?

300

Scenario 3 specs include stating these four rubric dimensions on the prompt.

What are content, organization, rhetorical discourse, grammar/mechanics?

300

This index is simply the proportion of test takers who got an item right (acceptable band ≈ .15–.85).

What is item facility (IF)?
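
Worked example: a minimal sketch of the item facility calculation, assuming each response is stored as True (correct) or False (incorrect). The function name and the 20-student data set are hypothetical, made up for illustration.

```python
def item_facility(responses):
    """Return the proportion of test takers who answered the item correctly."""
    if not responses:
        raise ValueError("need at least one response")
    return sum(1 for r in responses if r) / len(responses)


# Hypothetical data: 20 students, 13 of whom answered the item correctly.
responses = [True] * 13 + [False] * 7

facility = item_facility(responses)       # 13 / 20 = 0.65
in_band = 0.15 <= facility <= 0.85        # inside the band quoted in the clue

print(f"IF = {facility:.2f}, within .15-.85 band: {in_band}")
```

An item with IF near 1.0 is answered correctly by almost everyone, and one near 0 by almost no one; either way it tells you little about differences among students.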

400

Scenario 4 combines a 20-minute listening section with this 3-minute task per student.

What is a one-on-one oral interview?

400

Deciding to test only because it’s “Week 3 Friday” violates this planning step.

What is determining purpose (don’t test by habit; test for a reason)?

400

Listing which verbs/regular vs. irregular forms will be covered belongs here, not as vague goals.

What are detailed construct specifications (refining constructs)?

400

In multi-skill exams (Scenario 4), specs often include these child-friendly stimuli.

What are pictures/picture-cued tasks?

400

This index shows how well an item separates high/low performers (closer to 1.0 = better).

What is item discrimination (ID)?
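
Worked example: a minimal sketch of one common way to compute item discrimination, assuming the class has already been split into equal-sized high-scoring and low-scoring groups based on total test score. The function name and the data are hypothetical, and other formulations of the index exist.

```python
def item_discrimination(high_group, low_group):
    """Return (correct in high group - correct in low group) / group size.

    Each argument is a list of True/False values for one item,
    one value per student in that group. Groups must be the same size.
    """
    if not high_group or len(high_group) != len(low_group):
        raise ValueError("groups must be non-empty and equal in size")
    return (sum(high_group) - sum(low_group)) / len(high_group)


# Hypothetical data: 10 students per group.
high_group = [True] * 7 + [False] * 3   # 7 of the top scorers got it right
low_group = [True] * 2 + [False] * 8    # 2 of the bottom scorers got it right

discrimination = item_discrimination(high_group, low_group)  # (7 - 2) / 10
print(f"ID = {discrimination:.2f}")
```

A value near 0 means the item does not separate the two groups, and a negative value means weaker students outperformed stronger ones on that item.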

500

Name two reasons Scenario 1’s quiz offers beneficial washback despite no grade.

What are (a) self-assessment of comprehension and (b) a springboard for teacher-led discussion?

500

Name three distinct stakeholders/targets that the purpose/usefulness checklist explicitly considers.

What are students, the teacher’s next teaching steps, and the course’s significance (relative weighting/impact)?

500

Why might you exclude some objectives from one test, even if taught?

What are practical constraints (time/weighting), prior informal evidence, and focus on priority constructs?

500

Give two practical decisions specs force you to make for Scenario 2’s grammar test.

What are (a) which subskills to include/weight and (b) how to elicit/score (e.g., record oral vs. skip due to practicality)?

500

Name two red flags for weak distractor efficiency.

What are (a) a distractor no one chooses and (b) distractors attracting stronger students more than weaker ones?
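
Worked example: a minimal sketch of checking one MCQ for the two red flags named above, assuming you have the option each student chose, split into high-scoring and low-scoring groups. The function name, option letters, and response data are all hypothetical.

```python
from collections import Counter


def distractor_red_flags(high_choices, low_choices, options, key):
    """Flag distractors nobody chose or that attract the stronger group more."""
    high = Counter(high_choices)
    low = Counter(low_choices)
    flags = {}
    for option in options:
        if option == key:
            continue  # the key is the correct answer, not a distractor
        if high[option] + low[option] == 0:
            flags[option] = "chosen by no one"
        elif high[option] > low[option]:
            flags[option] = "attracts stronger students more than weaker ones"
    return flags


# Hypothetical item with options A-D, where C is the key.
high_choices = ["C"] * 7 + ["B"] * 3
low_choices = ["C"] * 2 + ["A"] * 7 + ["B"] * 1

flags = distractor_red_flags(high_choices, low_choices, "ABCD", key="C")
for option, reason in flags.items():
    print(f"Distractor {option}: {reason}")
```

In this made-up data, A works as intended (it draws only weaker students), B draws more strong students than weak ones, and D is never chosen, so B and D would be candidates for revision.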