Another AI article retracted from a top journal. But this time, the story is different.

This post contains no new research, but covers another retraction of an article about AI. We already had that large meta-analysis that did not meet quality standards; this retraction is of a different kind. The article in question appeared in Teaching and Teacher Education, one of the most prestigious journals in educational science. The title alone reveals that this was not a traditional impact study: “Whom do we educate? Uncertainties and inexplicable ecstasy of the GenAI era in foreign language teacher education”.

The reason for the retraction is striking:

“Following publication, concerns were raised by the Corresponding Author regarding inaccuracies in several citations and references. An investigation by the journal identified multiple errors in the reference list, including incorrect bibliographic details and references that could not be reliably verified.”

In other words, the corresponding author sounded the alarm themselves. The problem was not the data, the statistical analyses, or conclusions drawn from a flawed experiment. The problem was the references.

Now, mistakes in a reference list are not unusual. Incorrect page numbers, a wrong publication year, or a missing author would normally lead to a correction. Articles are not usually retracted for that.

What makes this case different is the statement that some references “could not be reliably verified”. That goes beyond a simple typo. It raises the possibility that some sources may not have existed, may have been misrepresented, or simply could not be traced. In other words: Frankencitations.

While this phenomenon predates ChatGPT, it is hard not to think of AI. Not because the retraction notice mentions the use of AI. It does not. We therefore do not know what caused these problems. But anyone who has worked with generative AI over the past two years will be familiar with hallucinated references: a chatbot generates a perfectly plausible article title, complete with authors, journal, and publication year, only for the article not to exist at all.

The problem itself is not new. What is new is the scale at which such errors can occur. Where an author might once have inserted a single incorrect reference, a language model can now generate an entire bibliography in seconds that appears impressive at first glance.

That brings us to an uncomfortable question: how carefully do we actually check references? Peer reviewers rarely verify every source individually. There simply is not enough time. Editors generally do not do so either. The system relies heavily on trust: trust that authors cite existing work accurately. For many years, that trust has functioned reasonably well. Generative AI, however, exposes a weakness that has always been there.

Reference checking may become one of the most underestimated skills of the AI era. Not only for researchers, editors, and reviewers, but also for students submitting their theses in the coming weeks. A source that looks convincing is still not necessarily a source that exists.
