A few months ago, I wrote that we might be asking the wrong questions about ChatGPT in education, inspired by an article by Weidlich, Gašević, Drachsler, and my good friend and colleague, Paul Kirschner. We tend to wonder what it can do, not what it does to learning. And just a few days ago, I asked another question: is the evidence actually good enough?
Now, a new systematic review in Teaching and Teacher Education (Turmuzi, Azmi & Kertiyani, 2025) offers a partial answer. The authors analysed twenty empirical studies on ChatGPT in mathematics education, published between 2023 and 2025. And frankly, the results read like an illustration of both earlier blog posts. And no, I'm not writing this to claim I was right; quite the opposite.
The review indicates that ChatGPT is primarily used to provide quick feedback and to generate additional exercises, particularly in algebra and statistics. In about 70 per cent of the studies, that worked well: students received more immediate feedback and felt supported. However, things went wrong once the work became more complex: calculus, problem-solving, or understanding the reasoning behind a calculation.
More striking, however, is what wasn’t studied. In 65 per cent of the papers, the focus was on perceptions rather than on actual learning outcomes. Only a handful used a solid experimental design. Almost all were small-scale studies, usually involving fewer than eighty participants. Statistical power? Barely. Long-term effects? Unknown. The authors politely refer to this as an “emerging stage of the research field,” but in reality, it confirms what I wrote earlier: the hype moves faster than the data.
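To make that power point concrete, here is a minimal back-of-the-envelope check. This is my own illustration, not a calculation from the review: I'm assuming a two-group comparison with 40 students per arm (the review's typical ceiling of about eighty participants, split evenly) and a "medium" effect size of d = 0.5.

```python
# Back-of-the-envelope power check for a typical small study.
# Assumptions (mine, not the review's): two independent groups,
# 40 students per arm, medium effect d = 0.5, alpha = .05, two-sided.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Achieved power with 40 participants per group
power = analysis.solve_power(effect_size=0.5, nobs1=40, alpha=0.05)
print(f"Power with 40 per group: {power:.2f}")  # ~0.60, well below .80

# Sample size per group needed to reach the conventional .80 power
n_needed = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
print(f"Needed per group for .80 power: {n_needed:.0f}")  # ~64 per group
```

Under those assumptions, a study of eighty participants has roughly a 60 per cent chance of detecting a genuine medium-sized effect; you would need around 64 students per group, so close to 130 in total, to reach the usual 80 per cent threshold. Hence "barely".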
Interestingly, the review also echoes that earlier question from April. Most studies treat ChatGPT as a tool, not as a learning environment. They measure whether the model produces correct answers, but not what happens cognitively when students use it. Does it deepen understanding? Encourage critical thinking? Or does it short-circuit both because the answer comes too fast? Those questions mostly remain unanswered.
What this review really shows is how much we still don’t know. And that’s fine. It’s actually valuable. Knowing what remains unclear helps us design better research: with larger samples, genuine learning outcomes, and attention to motivation, autonomy, and the thinking process itself.
Until then, the conclusion remains the same: ChatGPT can be useful in classrooms, especially for feedback or practice. However, it doesn’t alter the core principle of effective teaching: learning is a human process that requires time, guidance, and reflection.
Hopefully, the next step won’t be just another study showing that ChatGPT is handy for homework, but research that explains when it works, for whom, and why. Those, I think, are the right questions to ask.