When adaptive learning forgets about forgetting

There’s a fascinating – and sobering – new study in the International Journal of Artificial Intelligence in Education by Brendan Schuetze, Veronica Yan, and Paulo Carvalho (2025). It examines the computational models that underpin intelligent tutoring systems and adaptive platforms: the models that claim to “know” when a student has mastered a concept and when it’s time to move on.

The researchers tested some of the most widely used models in the field: Bayesian Knowledge Tracing (BKT), a variant of BKT with a forgetting parameter, and the Additive Factors Model (AFM). These models sit at the heart of many systems that are sold to schools as the brains behind adaptive learning. In theory, they’re supposed to track what a student knows, predict what they will remember, and guide the next practice step.
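For readers who want a peek under the hood, here is a minimal sketch of the standard BKT update, with the forgetting variant folded in as an extra parameter. This is an illustrative toy, not the implementation used in the study, and the parameter values below are placeholders.

```python
# Minimal Bayesian Knowledge Tracing (BKT) sketch -- illustrative only,
# not the study's implementation. Parameter values are placeholders.

def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.3, p_forget=0.0):
    """One BKT step: observe a response, then update P(skill is known).

    Standard BKT fixes p_forget = 0, which is exactly why it cannot
    represent memory loss between sessions; the BKT + Forgetting
    variant allows p_forget > 0 so knowledge can decay over time.
    """
    # Posterior probability the skill is known, given the observed response
    if correct:
        p_obs = (p_known * (1 - p_slip)) / (
            p_known * (1 - p_slip) + (1 - p_known) * p_guess
        )
    else:
        p_obs = (p_known * p_slip) / (
            p_known * p_slip + (1 - p_known) * (1 - p_guess)
        )
    # Transition: unknown skills may be learned, known skills may be forgotten
    return p_obs * (1 - p_forget) + (1 - p_obs) * p_learn


# Example: three correct answers in a row push the "mastery" estimate up fast,
# but with p_forget = 0 it can never come back down, no matter how long the delay.
p = 0.2  # prior probability the skill is already known
for answer in [True, True, True]:
    p = bkt_update(p, answer)
    print(round(p, 2))
```

The toy makes the blog’s point visible: a few correct responses drive the mastery estimate toward 1, and without a forgetting mechanism nothing in the model can bring it down again.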

The test was elegant. Instead of relying on messy, real-world platform data (where content isn’t always revisited), they used a laboratory dataset from Rawson et al. (2018). Students learned Lithuanian–English word pairs across six sessions, each a week apart, with a final test after three weeks. This design is perfect for capturing both learning and forgetting – including the classic spacing effect. In total, nearly 50,000 practice trials from 88 students were analysed.

The results? When the models were fitted retrospectively to all the data, they looked reasonable. They seemed to capture overall learning trends, and on the surface, their accuracy metrics looked fine. But when the models were asked to do what really matters – predict future performance – things fell apart.

  • They failed to capture spacing effects. Students who studied with wider spacing performed better later, but the models either missed this benefit or even predicted the opposite.

  • They ignored forgetting. The models consistently overestimated how much students would remember after a delay.

  • Adding a forgetting parameter didn’t help much. In practice, these models often did no better than having no model at all.
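To make the contrast between retrospective fitting and forward prediction concrete, here is a rough sketch of a time-based cross-validation loop in the spirit of the paper’s evaluation. It is not the authors’ code: `fit_model`, `predict`, and the trial format are hypothetical placeholders for whichever student model (BKT, BKT + Forgetting, AFM) is being tested.

```python
# Rough sketch of time-based cross-validation -- in the spirit of the study's
# evaluation, not the authors' code. `fit_model` and `predict` are hypothetical
# stand-ins for fitting and querying a student model; each trial is assumed to
# be a dict with a binary "correct" field.

def time_based_cv(trials_by_session, fit_model, predict):
    """Train on sessions 1..k, then predict session k+1, for each k."""
    errors = []
    for k in range(1, len(trials_by_session)):
        train = [t for session in trials_by_session[:k] for t in session]
        test = trials_by_session[k]
        model = fit_model(train)                   # a retrospective fit stops here
        preds = [predict(model, t) for t in test]  # predicting the next session is the hard part
        errors.append(
            sum(abs(p - t["correct"]) for p, t in zip(preds, test)) / len(test)
        )
    return errors  # mean absolute error for each held-out future session
```

Fitting all sessions at once only has to describe the past; the loop above forces the model to forecast what students will actually remember a week later, which is where the study found these models break down.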

The conclusion is clear: the most popular student models inside adaptive platforms are not cognitively plausible. They can fit past data, but they cannot predict real future learning, especially when time, forgetting, and spacing matter. In other words, they confuse short-term performance with durable learning.

And that has real consequences for schools.

  1. Don’t trust the dashboard blindly. If a platform tells you a student has “mastered” something because they got it right three times in a row, that says little about whether they will remember it next week.

  2. Mastery is not stable. Durable learning requires time and revisiting. A student who looks successful today may forget tomorrow if the system no longer revisits that content.

  3. Spacing and retrieval matter more than the algorithm admits. Good teaching practices – such as spreading practice over time, revisiting key material, and having students retrieve information – remain more effective than what current models assume.

  4. Teachers need to stay in the loop. Adaptive software can be useful for practice, but it won’t solve the fundamental challenge of memory and forgetting. Teachers still need to make decisions about review, retrieval, and long-term retention.

The irony is that we already know from decades of cognitive psychology how learning and forgetting work. Yet the models inside adaptive technology often ignore these principles. Until that gap is closed, schools should be cautious: adaptive platforms may offer efficiency, but they don’t replace the basic need for well-timed practice and review.

As the authors put it, sometimes having a poor model of human learning works about as well as having no model at all.

Abstract of the study:

The development of intelligent tutoring systems and other educational technology necessitated the implementation and development of computational models of student learning. At the foundation of these models is the assumption that they accurately implement and track human cognitive processes. However, the extent to which this assumption is correct requires testing against empirical data. In the current paper, we use data from a large-scale longitudinal lab study to investigate the match between the processes instantiated in the models and human memory and learning processes. When fit to all sessions retrospectively, the selected models of student learning (Bayesian Knowledge Tracing, Bayesian Knowledge Tracing + Forgetting, and Additive Factors Model) capture the qualitative trends of learning across sessions and relatively acceptable fit metrics. However, when the models are used to predict future behavior (time-based cross-validation), as is often the goal in applied contexts, the picture changes considerably. We show that these popular types of student learning models fail to account for basic cognitive principles—the spacing effect, and patterns of forgetting and learning across sessions. In fact, in some instances, having a poor model of human learning and memory may perform as well as having no model of human learning and memory at all.
