There is something irresistible about solutions that feel both old and new at the same time. Oral exams, for example. Centuries old, once the norm, then largely abandoned, and now suddenly back in the spotlight. Not out of nostalgia, but out of necessity. Since the rise of generative AI, a familiar reflex keeps popping up: if we can no longer reliably assess writing, maybe we should return to speaking.
It sounds logical. And to some extent, it is.
In a recent article in Educational Researcher, Andrea Fenton makes a compelling case for taking oral assessments more seriously again. She argues that they not only help address issues of academic integrity in the age of AI, but also do something else: they force students to think, to reason, to explain. Not just reproduce, but demonstrate understanding.
And that is no small thing. At a time when AI can generate texts that look convincing at first glance, it is becoming increasingly difficult to see what students can actually do themselves. Oral exams seem like an elegant answer. You can probe. You can adapt. And hopefully, you can see thinking unfold. Or not.
But, as so often in education, there is an important nuance.
Oral exams are not new. And they did not disappear by accident. There were good reasons. Time, scalability, reliability. It is not easy to assess large groups of students individually. And it is even harder to do so consistently and fairly.
Fenton acknowledges these challenges as well. Oral assessments require clear structures, trained assessors, and careful attention to bias. Without that, you quickly run into issues of subjectivity and inequality.
This is where a recent response to her article becomes particularly interesting. Hu and Hashim agree on the potential of oral assessments but add three important refinements. Not as a rejection, but as a reality check.
First, fairness is not guaranteed. In an oral exam, you are never measuring knowledge alone. You are also measuring language proficiency, confidence, and thinking speed. In other words, you are measuring more than you might intend. Their suggestion is simple but crucial: make that explicit in your rubrics and separate content from performance.
Second, there is the issue of reliability. Different examiners may judge the same student differently. Small variations in questioning or prompting can have a large impact. Their solution is not glamorous but necessary: standardisation, recording, and systematic checks of interrater agreement.
And then there is perhaps the biggest challenge: scalability.
Here, they propose something interesting: a “viva-lite” approach. Instead of fully replacing written exams, add short oral verifications. Five minutes can be enough. Ask students to explain one step of their reasoning. To justify a choice in a new context.
This is not a revolution. It is a pragmatic adjustment. But perhaps the most important twist is also the most ironic one.
Oral exams only solve the AI problem if they require something that cannot simply be memorised. Students can easily rehearse AI-generated answers. What breaks that is not speaking itself, but thinking. Unseen questions. Transfer. Thinking aloud. Which brings us back to something we already knew.
It is tempting to believe that changing the assessment format will solve the current AI challenge. But, as so often, the real issue is not how we assess, but what we are actually trying to measure.