The Way To Better MCQs
To quote my colleague Julie Dirksen, multiple-choice questions (MCQs) are devilishly difficult to design well. She’s correct! Here are a few of the top flaws research finds in typical multiple-choice questions:
- More than one correct answer, or no unambiguously correct answer
- Unnecessary information
- Unclear or confusing wording
- Cues to the correct answer
- Answer choices that perform poorly
Spot The Flaws In The Design Of Multiple-Choice Questions
Research on multiple-choice questions finds that most MCQs contain one or more design flaws, which reduce the quality of the assessment and put test writers, test-takers, and organizations at risk.
For example, the following MCQ has more than one flaw:
Maria, a member of the response team, receives a call about a person unresponsive on the floor outside the elevator on Floor 3. Maria gets to the person first and begins cardiopulmonary resuscitation (CPR). Which of the following PPE is not needed when doing CPR? (Select the correct answer.)
- Disposable gloves
- Barrier device
- Eye protection
- None of the above
Can you spot the flaws? There are quite a few. First, the wording is confusing. For instance, the term "PPE" in the question should be spelled out ("Personal Protective Equipment") because readers may not know what it means. Similarly, the term "barrier device" may confuse readers who know it only as a "CPR breathing mask."
Context Is Important
The situational information also makes the question confusing. What exactly are we asking here? Should Maria start CPR without personal protective equipment? Does Maria have all the equipment she needs? Or are we asking which personal protective equipment she should use when performing CPR? The last interpretation is the one we intend, so the situational information only adds confusion.
Also, research has shown that "None of the above" is a problematic answer choice that should not be used. Additionally, the question is worded negatively ("Which of the following PPE is not needed when doing CPR?"), and research finds that negatively worded questions perform poorly.
That’s an awful lot of flaws for one short MCQ, and every one of them damages the question!
Here's a less confusing version of this question:
When performing CPR, which of the following personal protective equipment (PPE) should you optimally use? (Select the best answer.)
- A barrier mask for giving rescue breaths
- An automated external defibrillator (AED)
- Hand washing after contact with the victim
Obviously, this is much clearer.
Why Bother With MCQ Design?
Does it really matter if you write MCQs that contain these and other flaws? It sure does! If you are going to use multiple-choice questions, you must design and communicate them clearly and concisely. Poorly written MCQs cause problems.
Most MCQs (more than 60%, according to multiple research sources) have flaws. Flawed MCQs can compromise assessment data, confuse participants, and create morale, bias, legal, and other problems for people and organizations. Bad MCQs provide bad data or, at best, useless data; they don’t provide the data you need to improve instruction.
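One practical way to catch answer choices that perform poorly is classical item analysis of your response data. Below is a minimal sketch in Python (the data and variable names are illustrative, not taken from any particular testing tool): it computes each question’s difficulty (the proportion of test-takers answering correctly) and its discrimination (how getting the item right correlates with the score on the rest of the test).

```python
# A minimal sketch of classical item analysis, assuming scored responses
# (1 = correct, 0 = incorrect): one row per test-taker, one column per item.
from statistics import mean, pstdev

responses = [  # hypothetical data for items Q1..Q4
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

totals = [sum(row) for row in responses]  # each person's total score

for item in range(len(responses[0])):
    scores = [row[item] for row in responses]
    p = mean(scores)  # difficulty: proportion answering correctly
    # Discrimination: correlation between the item and the rest of the test.
    rest = [t - s for t, s in zip(totals, scores)]
    sd_scores, sd_rest = pstdev(scores), pstdev(rest)
    if sd_scores == 0 or sd_rest == 0:
        r = float("nan")  # no variance, so the item cannot discriminate
    else:
        cov = mean(s * x for s, x in zip(scores, rest)) - mean(scores) * mean(rest)
        r = cov / (sd_scores * sd_rest)
    print(f"Q{item + 1}: difficulty={p:.2f}, discrimination={r:.2f}")
```

In this toy data, Q4 has no variance (everyone answered it correctly), so it tells you nothing about who has mastered the material. Items with near-zero or negative discrimination, or distractors that almost no one chooses, are prime suspects for the flaws listed earlier.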
Valid And Reliable Assessments
A valid assessment measures what it claims to measure. We want assessments to measure learning outcomes, and they cannot do that unless they are valid. Unclear assessments, for example, damage validity: when people have difficulty understanding what is being asked, or must work hard just to decipher a question, the assessment doesn’t work well for you, the participants, or your organization.
Here are 4 commonly accepted ways to build more valid course assessments:
- Align assessment items to learning objectives
- Have several proficient practitioners weigh in on whether assessment items measure important aspects of performance
- Create a larger percentage of questions from more critical learning objectives
- Make items as difficult (or easy) as the real task
I’ll talk about the first point next. For example, when assessing venipuncture skills (using a syringe and needle to draw blood for blood tests), the assessment needs to measure whether the task was completed correctly and with the right outcomes. You may also have heard that assessments must be reliable. Reliability means the assessment measures consistently, and it matters because an assessment that doesn’t measure consistently cannot be valid. One of the most critical reliability issues is assessment items that are unclear, imprecise, or ambiguous.
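Classical test theory offers a compact way to see why an unreliable assessment cannot be valid. As a rough sketch in standard psychometric notation (my summary of the general model, not drawn from any one source above):

```latex
% Classical test theory: each observed score X is a true score T plus random error E.
X = T + E
% Reliability is the share of observed-score variance that is true-score variance:
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
% A test's correlation with any criterion Y (a validity coefficient) cannot
% exceed the square root of the test's reliability:
\rho_{XY} \le \sqrt{\rho_{XX'}}
```

As measurement error grows, reliability shrinks, and the ceiling on any validity coefficient drops with it.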
Align Assessments With Learning Objectives (LOs)
As I showed in the last section on validity and reliability, aligning carefully designed multiple-choice questions with well-written learning objectives is one of the most important tactics for more valid assessments. Here are 3 steps that can help you craft well-written, job-focused learning objectives:
- First, you need to find out what business needs your corporate training is meant to impact.
- Then, you should analyze what participants need to be able to do to satisfy these needs.
- Lastly, you should translate what participants need to be able to do into specific, measurable, and observable statements (LOs).
Below is an example of the 3 steps to use when helping people improve a merchandise-stocking task.
| Step | Example |
| --- | --- |
| 1. Find out what business needs training is meant to meet. | Reduce damaged new merchandise put into stock by 75%. |
| 2. Analyze what participants need to be able to do to meet these needs. | Sales staff: … Manager: … |
| 3. Translate what participants need to be able to do into specific, measurable, and observable statements (LOs). | Sales staff: … Manager: … |
This process leads to real-life and job-task objectives rather than low-level objectives.
Lower-Level Objective
Define damaged merchandise.
Lower-Level Question
Damaged merchandise, when referring to incoming items, is merchandise that:
- Is unlikely to sell in the current or following season.
- Is missing one or more parts or pieces.
- Is unsaleable in its current condition.
Higher-Level Objective
When checking in new merchandise, find and prepare damaged items for return.
Higher-Level Question
[Image of a bowl with a 1-inch chip in the side]
You are checking in new merchandise and find a glass bowl in the condition shown. What should you do?
- Prepare it for return to the warehouse.
- Price it for placement in the sale area.
- Ask the manager if the bowl can go in the sale area.
The higher-level question above doesn’t simply ask for a definition. Rather, it assesses what people can do with that definition. If they don’t know the definition, they will have a much more difficult time answering. The higher-level question better aligns with the real-life and job-task-specific learning objective.
When writing learning objectives for workplace instruction, most skilled practitioners know not to use "appreciate" or "understand" as the action verb; these verbs aren’t measurable. But what about "list" or "describe"? These verbs are measurable, but we shouldn’t use them either, and here’s why.
Most real-life and job tasks don’t require people to list or describe; they require people to perform. That’s why real-life and job-task learning objectives are written as task rules. For example, "When checking in new merchandise, find and prepare damaged items for return" is written as a task rule. Writing objectives this way primes the pump for writing MCQs that measure the ability to make decisions and solve problems, not just recall content.
Become An Expert In Multiple-Choice Question Design And More
If you need to know how to write valid, job-focused multiple-choice questions, consider taking my Write Learning Assessments course. I deliver this group-paced, blended online course to a limited number of participants 3 or 4 times a year. The next course starts on January 20, 2020, and there’s a special discount code for eLearning Industry members. Go here and use the coupon code 100OFF to get $100 off the course. But hurry! We greatly limit enrollment for better interaction and feedback, and when the course is full, we won’t accept more participants until the next time we deliver the course.
Selected References:
- Abedi, J. (2006). Language issues in item development. In S. M. Downing & T. M. Haladyna (Eds.), Handbook of Test Development. Routledge.
- Downing, S. M. (2005). The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Advances in Health Sciences Education, 10(2), 133–143.
- Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item writing rules. Applied Measurement in Education, 2(1), 37–50.
- Haladyna, T. M., & Downing, S. M. (1989). Validity of a taxonomy of multiple-choice item writing rules. Applied Measurement in Education, 2(1), 51–78.
- Marsh, E. J., & Cantor, A. D. (2014). Learning from the test: Dos and don’ts for using multiple-choice tests. In M. A. McDaniel, R. F. Frey, S. M. Fitzpatrick, & H. L. Roediger (Eds.), Integrating Cognitive Science with Innovative Teaching in STEM Disciplines. Washington University, Saint Louis, Missouri.
- Nedeau-Cayo, R., Laughlin, D., Rus, L., & Hall, J. (2013). Assessment of item-writing flaws in multiple-choice questions. Journal for Nurses in Professional Development, 29(2), 52–57.
- Shrock, S. A., & Coscarelli, W. C. (1989). Criterion-referenced test development. Reading, MA: Addison-Wesley.
- Tarrant, M., Knierim, A., Hayes, S. K., & Ware, J. (2006). The frequency of item writing flaws in multiple-choice questions used in high-stakes nursing assessments. Nurse Education in Practice, 6(6), 354–363.