When Tech Giants Study Their Own Tools: Should We Trust the Results?

Google has just published a paper on “Learn Your Way,” its AI-augmented textbook project. On the surface, the idea is compelling: take static material and make it adaptive, multimodal, and personalised with generative AI. Students in the trial scored better immediately after learning, and retention remained significantly higher three days later. Case closed? Not quite.

The problem of self-assessment

The first problem is obvious: this is a company evaluating its own product. The same team that designed the tool also designed the study, recruited the students, and wrote the paper. That does not mean the data are fabricated, but it does make independent replication essential. When the evaluation of an innovation comes from the innovator itself, scepticism is not cynicism; it is basic research hygiene.

A control group that proves too little

A second issue lies in the control group. The comparison is between Learn Your Way and… Adobe Acrobat Reader. In other words, an interactive, adaptive, gamified, multimodal environment is pitted against a static PDF viewer. It is hardly surprising that students found the former more engaging and learned slightly more. This is precisely the territory where the Hawthorne effect thrives: participants do better simply because they are given something new, shiny, and clearly designed to make them feel supported. The study reports that all students in the experimental group used the quizzes and content transformations, which already suggests that novelty and interactivity alone may explain a substantial part of the effect.

A tiny sample and unanswered questions

Third, the sample is tiny. Sixty students from the Chicago area were split into two groups of thirty and worked on a single textbook chapter. That is a proof of concept, not a robust efficacy trial. Generalising from this to “AI will revolutionise textbooks” is like claiming a drug works after giving it to one classroom for three days. Moreover, the study does not disentangle which components drive the effect. Was it the quizzes, the personalised metaphors, the narrated slides, or simply the fact that students knew Google was testing them?
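To make the sample-size point concrete, here is a minimal power-analysis sketch in Python (using statsmodels). The group size of thirty comes from the study as described above; the significance level, power target, and effect-size benchmarks are standard conventions I am assuming, not figures taken from the paper.

```python
# A minimal power-analysis sketch: what can a 30-vs-30 comparison detect?
# Assumptions (mine, not the paper's): two-sided t-test, alpha = 0.05,
# 80% power -- the usual conventions in behavioural research.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Smallest standardised effect (Cohen's d) detectable with 30 students per arm.
min_detectable_d = analysis.solve_power(
    effect_size=None, nobs1=30, alpha=0.05, power=0.8, ratio=1.0
)
print(f"Minimum detectable effect: d ≈ {min_detectable_d:.2f}")  # ≈ 0.74

# Sample size per group needed to detect a more typical educational
# intervention effect of around d = 0.3.
needed_n = analysis.solve_power(
    effect_size=0.3, nobs1=None, alpha=0.05, power=0.8, ratio=1.0
)
print(f"Students needed per group for d = 0.3: ~{needed_n:.0f}")  # ≈ 175
```

Under these assumptions, thirty students per arm can only reliably detect effects far larger than most educational interventions produce, and a statistically significant result in such a small sample is exactly the kind that tends to shrink on replication.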

There is also the problem of ecological validity. Students studied for 20–40 minutes under controlled lab conditions, with incentives to perform. Real classrooms are messy, teachers mediate the learning process, and a single exposure says little about long-term motivation. The authors acknowledge some of these limits, but the framing throughout the paper is far more celebratory than cautious.

Limitations mentioned by the researchers

The researchers do acknowledge some boundaries. They note that the study only tested one chapter with a small sample, that it remains unclear which components of Learn Your Way actually drove the effect, and that results from a lab do not automatically translate to classrooms. These are sensible caveats.

But they never mention the elephant in the room: that comparing a rich, interactive AI platform with a flat PDF is an unfair contest. Nor do they address the possibility of a Hawthorne effect—that students improved simply because they received something new and engaging. And, of course, they do not reflect on the awkward fact that this is Google evaluating Google. Those silences are as telling as the admissions.

In conclusion

So, how promising is this research really? At best, it demonstrates that adding interactivity and quizzes to learning materials beats handing students a static PDF, which decades of educational research have already suggested. At worst, it risks becoming a glossy example of the Hawthorne effect: students temporarily perform better simply because they know they are part of an experiment with an attractive new tool.

The real test will be independent trials across different subjects, with larger and more diverse student populations, and with genuinely comparable control groups (e.g., interactive digital tools not developed by Google). Until then, the headline result—slightly higher test scores after three days—should be read as a marketing teaser, not as solid evidence that generative AI textbooks are the future of education.
