I remain cautious about AI in education and, for now, also about the research around it. Not because I am against AI, but for a different reason. A recent review by Edison Marino Cerón Salazar and Diana Carolina Burbano González helps to explain why.
At first glance, the conclusion looks impressive. As many as 89% of the included studies report positive effects. Case closed? AI works.
Not quite.
To start with the good news: this is not a weak review. Not at all! The authors follow PRISMA guidelines, use multiple reviewers and bring together 72 studies from 2014 to 2024. In a fragmented field like AI in education, that is valuable. Also, those positive findings are not meaningless. Many of these systems provide feedback, structure tasks, offer extra practice or break content into manageable steps. We already know those things can support learning.
But that is also where the problem begins: What is actually being measured?
That 89% says less about how well AI works and more about what research in this field looks like. To begin with, “positive effect” is a broad category. It can refer to clear learning gains, a small increase in motivation, or even qualitative impressions of improvement. All of that is grouped together. That is defensible in a review that is not a meta-analysis, but it also makes the headline figure far less precise than it appears.
In many cases, researchers compare AI to doing very little or nothing at all. Students using an AI tool outperform those without additional support. That should not surprise us. Almost any structured intervention would produce gains under those conditions. The real question should not be whether AI outperforms nothing, but whether it outperforms good teaching.
That question is rarely asked.
On top of that, many studies are small-scale. Thirty students here, fifty there, often within a single institution. Replication is limited. And in education, context matters. What works in one setting does not automatically work in another.
There is also the issue of independence. A substantial share of the studies was conducted by the developers of the tools themselves. That does not invalidate the results, but it does increase the likelihood of optimistic findings.
We know this pattern: fields characterised by small samples, analytical flexibility, and strong incentives to produce positive results tend to overestimate effects. AI research in education fits that profile.
Another issue is that we often do not know what actually drives the effect. Is it the AI itself? Or simply more time on task? Is it the feedback generated by the system, or just the presence of feedback?
Many studies describe the technology in detail but say little about the underlying pedagogy. And that matters.
We already know a great deal about how people learn: about working memory, practice, feedback, examples and the development of expertise. Yet much of the AI research barely connects to that knowledge base.
So what should we make of those 89% positive findings? We should not dismiss them. But we should not overinterpret them either. They show that AI can support learning under certain conditions. They do not show that it is better than existing approaches. Nor, sadly, do they clarify when it works best, and they say little about long-term effects.
In short, they are a starting point, not a conclusion.
If anything, this review confirms a familiar lesson from educational research: positive effects are easy to find. Understanding them properly is much harder. That requires larger studies, independent evaluations, replication and, above all, a stronger link to what we already know about learning.
Until then, “89% positive” tells us less than it seems.