AI lesson planning: this study shows a problem, but not the right one

Do AI tools create better lesson plans than teachers? Many educators would instinctively answer ‘no’. Some would hope ‘not yet’. But I wonder: is that actually the right question to ask?

A recent study by Trust and colleagues seems to confirm the simple negative answer. The researchers asked ChatGPT, Gemini, and Copilot to generate more than 300 lesson plans for civic education, then analysed what those plans actually contained.

The results sound convincing. Most activities focused on remembering, understanding and applying. Higher-order thinking was rare. And anyone hoping for diversity or multiple perspectives would be disappointed: in 94% of the activities, these were entirely absent. That last finding is hardly surprising. We already know that AI tends to reproduce the biases present in the data it is trained on. So the conclusion seems straightforward: do not trust these lesson plans.

That sounds reasonable. And to some extent, it is. But is this actually the right study to support that conclusion?

What the researchers really examine is what happens when you give AI a single, simple prompt and then take the output as it is. No iteration, no refinement, no follow-up questions. Just: “write a lesson plan” and done. Methodologically, that is clean. Didactically, it is somewhat odd. Because that is not how teachers work. Or at least, not how we would hope they work.

The lesson plans generated in this way are indeed predictable. They follow a fixed structure, stay on the safe side, and aim for the average. But that is exactly what you would expect from a system trained on large amounts of existing material and not pushed any further. AI is not a creative rebel. It is, in many ways, more a mirror of what already exists. Seen from that perspective, this study mainly holds up an uncomfortable mirror.

That is what makes the study both interesting and limited. Interesting, because it clearly shows what happens when AI is used without professional intervention. Limited, because it does not really address the question we actually care about: what happens when teachers work with AI?

That is a crucial difference. Not between good and bad technology, but between passive and active use.

Other studies, such as recent trials by the Education Endowment Foundation, focus precisely on that. They do not look at what AI produces on its own, but at what teachers do with it in their preparation. And then a different picture emerges. No miracles, no dramatic transformations, but small, meaningful improvements: time saved, more variation in activities, sometimes better alignment with students’ needs. Not because AI suddenly becomes brilliant, but because teachers engage with it.

And that is the core of the issue. AI is not didactics, let alone pedagogy. It can be a tool. If you use it as a substitute for thinking, you will get average results. If you use it as a support for thinking, you may get something better.

The study by Trust and colleagues does reveal a real problem. But not quite the one it claims to show. The problem is not that AI produces poor lesson plans. It becomes a problem when we start treating those plans as if they were finished. That is not a technological issue. It is a professional one.

If you use AI the way you use a search engine, you will get what you expect. If you use AI the way you use a critical colleague, you may get something else. But that requires something from the user: knowledge, judgement, and above all, a willingness not to settle for the first answer. In that sense, AI may tell us less about what technology can do, and more about what professional expertise still needs to do.
