A lot of AI research in education currently suffers from the same problems: small samples, short interventions, weak control groups, and yet, somehow, large conclusions about how “AI is transforming education.” The result is that for researchers, and honestly for me as well, reading the literature often feels like wading through a huge amount of lower-quality work. That is exactly why this new study on ChatGPT feedback in Teaching and Teacher Education by Ding and colleagues caught my attention. Not because it is perfect. It certainly is not. But it is one of the better AI-in-education studies I have read in recent months.
The researchers investigated whether feedback generated by ChatGPT could help pre-service teachers improve their lesson design. More specifically, participants created a lesson plan and then received feedback. One group received feedback from ChatGPT-4, while the other received feedback from experienced teachers. Afterwards, both groups revised their work.
What makes the study particularly interesting is that the researchers did not only look at the final product. They also tried to distinguish between two things that are constantly mixed together in discussions about AI: performing better on a task and developing broader professional competencies.
And that is where the study becomes genuinely interesting. First, the simple part: both groups improved significantly after receiving feedback. Both teacher feedback and ChatGPT feedback led to better lesson plans. There was no significant difference between the two groups. In other words, AI-generated feedback worked about as well as teacher feedback for this specific task. That is actually good news.
But that does not automatically mean that the participants became better instructional designers. When the researchers looked at broader instructional design competencies, they found no significant changes. The pre-service teachers produced better products, but they did not necessarily develop deeper or more transferable expertise. It is important to note that this was also true for the group receiving feedback from actual teachers, although those results came somewhat closer to statistical significance.
That may sound like a minor detail, but it touches on one of the most important discussions surrounding AI in education.
Because what exactly do we mean when we say AI “works”? Do we mean that someone produces better output while using it? Or that they later understand more, reason better, and develop expertise that remains once the AI disappears?
This study strongly suggests that these are not the same thing. Things became even more interesting when the researchers examined how students used the feedback. Participants who received ChatGPT feedback were less likely to implement suggestions literally. Instead, they adapted the feedback more often to fit their own context. The authors describe this as “adaptive implementation.”
You could interpret that positively: AI may stimulate reflection and active processing. But there is another possible interpretation. The interviews suggest that ChatGPT feedback often remained fairly generic. It was useful for structure, completeness, and adherence to standard pedagogical principles, but weaker on contextual nuance. Teacher feedback, on the other hand, more often addressed classroom management, feasibility, or transitions within a lesson.
So perhaps students simply had to do more translation work themselves to make the AI feedback genuinely useful. And honestly, that sounds quite plausible.
Methodologically, this study is also stronger than many AI papers currently circulating online. It includes a control group, a pre-post design, qualitative and quantitative analysis, and the authors remain remarkably cautious in their conclusions. At the same time, the limitations remain substantial: only 42 participants, a six-week intervention, and a very specific context within Chinese teacher education. As I mentioned earlier, certainly not perfect.
It is exactly this caution that makes the paper more interesting than many spectacular AI narratives. Instead of once again claiming that AI will replace teachers or fundamentally transform education, the study presents a far more realistic view. Generative AI can probably function quite well as a scalable feedback tool, especially for concrete tasks and early revisions. But performing a task better is not the same as developing deeper professional expertise.
And perhaps that is where the most interesting direction currently lies: not human or machine, but human and machine together. Although I should note that this last part is my own interpretation of the findings, not a claim the authors make themselves.