Anyone following recent discussions about AI in education will recognise a familiar pattern: the same one I highlighted last week in my blog post on a meta-review of this topic. Much AI research focuses on technological possibilities and pays far less attention to the underlying didactics. We experiment enthusiastically, but often without a clear view of what students actually do, how much thinking they still do themselves, and which learning theories should guide the design. In that light, the new study by Pia Kreijkes and colleagues, which compares taking notes with learning through ChatGPT, is worth a closer look, not because it celebrates what AI can generate, but because it examines something many studies ignore. What do students actually remember a few days later when they have learned with a large language model?
The study’s setup is relatively simple and, for that reason, quite strong. More than 300 fifteen-year-olds in the United Kingdom read two difficult history passages. For one of the texts, every student was required to use a chatbot to request explanations, clarify concepts or generate a summary. The other text was processed, depending on the assigned condition, either with notes alone or with a combination of notes and the LLM. This allowed the researchers to compare how much students learned from taking notes, from AI support, or from a combination of the two. Three days later, the students completed tests of factual knowledge, comprehension and free recall. This was not a demonstration of what AI can produce. It was an experiment that asked what students still know once the screens have been closed for a while.
The core findings will not surprise most experienced teachers. Students learn more when they take their own notes. Almost all outcomes show that active processing is more powerful than what students spontaneously do with an LLM. The combination of LLM and notes produced a small extra gain compared with the LLM alone, but this was much smaller than the advantage of simply taking notes. This is exactly why the study matters: students found working with the LLM more enjoyable and easier, yet they put in less effort and less time. It reads almost like an example from a textbook on learning psychology, but here the pattern shows up in the data. Activities that feel easy are not necessarily the ones that stick.
The study also earns credit for the care that went into its design. The researchers matched the texts in advance for length, difficulty and conceptual density. They preregistered every step. Three independent raters scored the open responses and reached almost perfect agreement. The team not only looked at learning outcomes but also closely examined how students interacted with the AI. One striking observation is that many students in the combined condition simply copied and pasted: their notes sometimes reproduced literal fragments of LLM output. In doing so, they lost what makes note-taking effective. It requires selecting and rephrasing. In other words, it requires thinking.
The limitations appear just as clearly. The study includes no passive-reading condition, so we still do not know whether LLM use outperforms, or falls behind, simply reading without any strategy. The experiment remains narrow: it covers one age group, one subject, two short sessions and one model that is already outdated. It tells us nothing about how students might use AI across longer learning cycles, with guidance or in other domains. And although the researchers designed the study with care, they still compared a well-established learning strategy with a fairly unstructured approach to AI. That contrast reveals a lot about spontaneous use, but far less about how AI could support strong didactics. It is also relevant that the researchers work for Microsoft and Cambridge University Press and Assessment.
Perhaps the most striking finding is the tension at the heart of the results. Students learned most from notes without LLM support, yet they enjoyed that approach the least. That is not a small detail. It once again shows how wide the gap can be between what feels comfortable and what actually leads to learning. Technology can help narrow that gap, but it does not remove it.