In my work, I try to stay sober about AI. Sometimes that comes across as too critical for enthusiasts, sometimes as too positive for sceptics. The latter group might feel more at home today, thanks to a new study by Sean P. Walton and colleagues that examines what actually happens when people work with AI. Human-AI collaboration is at the heart of it, not in theory but in practice. And the result is both hopeful and somewhat sobering.
Let me start with what clearly emerges. People who actively collaborate with AI to design a virtual car perform better than those who simply let the system run. In this study, participants who did nothing, letting the system run on its own, still improved by an average of 124%. Those who tweaked designs themselves reached around 243%. But those who also used the AI-generated suggestions went up to roughly 420%.
And it does not stop at output. People who engage with those AI suggestions stay involved longer and take more actions. In other words, they are more engaged. Not only behaviorally, but also cognitively. They look, choose, test, and adjust. That might be even more important than the final score itself. Learning is rarely a passive activity.
It is also interesting that not all AI is equally useful. The study compares “smart” AI-generated suggestions with random examples. Both receive similar user attention, but the smarter suggestions lead to more actions and better outcomes. Not necessarily because people perceive them as better, but because they seem to offer more to work with. The underlying algorithms do make a difference.
And that brings a small surprise. What works objectively does not always feel that way to users. Participants are divided in their judgments. Some clearly prefer the AI suggestions, others prefer random ones, and some see little difference at all. That is familiar. In education, we see the same pattern: what is effective is not always experienced as such.
Before jumping to conclusions, though, some nuance is needed. This is not a classroom study. There are no students preparing for an exam, no teacher making decisions under time pressure. It is a design task in a relatively controlled environment. Interesting, but not directly transferable to education.
The participants are not a typical classroom either. They choose to take part. They have time, interest, and often some technical affinity. And even then, half of them do… nothing. They simply let the system run. That might be the most telling finding of all. Technology can be powerful, but if people do not engage with it, nothing happens.
There are also methodological caveats. Most of the data comes from an open field study. That gives scale, but less control. We do not know exactly who is doing what, or how consistent participants are. The additional lab study adds control, but with only twelve participants, it remains exploratory.
Then there is the question of what “better” actually means here. Performance is measured as improvement relative to each participant's starting point, and that makes interpretation tricky: those who start poorly can improve more easily, while those who start strong have less room to grow. The authors acknowledge this, but it is worth keeping in mind.
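To make the concern concrete, assuming the standard definition of relative improvement (the paper's exact formula may differ):

\[
\text{improvement} = \frac{s_{\text{final}} - s_{\text{start}}}{s_{\text{start}}} \times 100\%,
\]

where \(s\) is the design's score. With made-up numbers, purely for illustration: a car that starts at a score of 10 and ends at 30 records a 200% improvement, while one that starts at 50 would have to reach 150 to post the same figure.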
So what can we take from this study? Not that AI works, but when it can work: not as a replacement, not as an automatic solution, but as part of an interaction. People who actively engage with AI get something out of it. Those who do not, do not.
That may sound almost trivial. But in the debate about AI in education, this is exactly what often gets lost. We talk about tools, systems, and possibilities. Much less about what people actually do with them.
This study brings that back into focus. AI can help. Sometimes quite a lot. But only if someone actually works with it. And even then, the outcome depends on how that interaction unfolds.
And did they learn? Could they do it better a week or a month later without AI? If not, what is this research worth? To quote Edwin Starr: absolutely nothing.