The buzz around ChatGPT in education is real. You can’t scroll through LinkedIn or attend a conference without someone pitching yet another groundbreaking use case. Teachers experiment, students tinker, and researchers are racing to publish. Earlier this month, I was even told I was being too sceptical for merely raising questions. Somewhere in that rush, we may skip the one thing we’re supposed to be good at: asking the right questions.
A recent commentary by Weidlich, Gašević, Drachsler, and Kirschner is a much-needed pause button. Their message is both simple and brutal: Much research on ChatGPT in education isn’t just premature—it’s confused at the core.
Why? We’re treating ChatGPT as if it were an educational intervention when, really, it’s just a tool. It’s like running a study to see whether “using a truck” improves nutrition without asking what’s inside it. In education, we confuse the medium (ChatGPT) with the method (how it’s used to teach or learn something). That’s not just a technicality; it makes entire studies hard or impossible to interpret.
Take a recent meta-analysis that claims ChatGPT improves academic performance, with a striking effect size of 0.7. That’s higher than decades of research on intelligent tutoring systems—AI tools designed to support learning. Meanwhile, ChatGPT wasn’t designed for education at all. So, how is it supposedly outperforming everything else? Spoiler: it probably isn’t. The studies included in the meta-analysis vary wildly in how ChatGPT was used, what students were supposed to learn, and what counts as a “learning outcome.” One study measured how good students were at using ChatGPT during a task—not what they learned from it. Another relied on students rating their own problem-solving skills after using the tool. That’s not learning, that’s self-perception.
This may all sound like nitpicking, but it’s not. If we want to know whether ChatGPT helps students learn, we have to be precise about what exactly the students do with it, what the comparison group does instead, and how we measure learning. Right now, many studies haven’t got past the first step. They just add ChatGPT to the mix and hope something good happens.
That doesn’t mean we shouldn’t explore. But it does mean we need to stop pretending that a flashy new tool gives us permission to skip basic research logic. Intelligent Tutoring Systems didn’t get good results because they were “AI” but because they were designed with clear educational goals based on well-understood principles like feedback, adaptivity, and cognitive modelling. ChatGPT isn’t that. It’s a competent general-purpose language model, which means it can be anything—or nothing—depending on how it’s used.
There’s a deeper lesson here about “fast science.” We’re so eager to study new technologies that we risk producing lots of weak, contradictory evidence. In the long run, that doesn’t just slow down progress—it confuses policymakers, frustrates educators, and gives sceptics an easy reason to dismiss the whole thing. To maximise the potential of AI in education, we must take a deliberate approach and do the necessary groundwork: define our questions, select meaningful comparisons, and measure what truly matters.
We don’t need to reinvent the wheel whenever a new technology comes along. But we do need to remember what the wheel is for.