How AI can help unmask fake scientific journals

In discussions about AI – including those on this blog – the focus is often on what can go wrong: hallucinations, bias, and poor applications in the classroom or healthcare. However, now and then, a study emerges that demonstrates how AI can also be leveraged wisely to safeguard the integrity of science itself. A good example appeared recently in Science Advances (Zhuang, Liang & Acuña, 2025).

The problem with questionable journals

Their question: can AI help detect so-called predatory journals – or, more accurately, questionable open-access journals? If you are not familiar with the problem: since the breakthrough of open access, which I personally support, more publishers have emerged that charge researchers to publish without providing real peer review or proper editorial work. They often use misleading names that resemble established journals and promise publication within a week. Ultimately, they release papers that have hardly been reviewed at all. For authors under pressure to publish, or those with limited experience, such journals may seem like a quick fix. In reality, they undermine scientific integrity.

Fortunately, some resources help separate the wheat from the chaff: the Directory of Open Access Journals (DOAJ), with its strict admission criteria; the influential but now-discontinued Beall’s List; and commercial players such as Cabell’s. The trouble is that this vetting work is slow and labour-intensive, while new predatory titles spring up like mushrooms, often with only slightly different names and websites.

How AI can help

This is precisely where AI can be of assistance. The researchers built a model that does not rely on a single feature but combines several:

  • Website content (is there a clear peer review policy, copyright or ethics statement, and who is on the editorial board?);

  • Website design (patterns in source code or even the look of the homepage, often recycled by dubious publishers);

  • Bibliometric data (levels of self-citation, the h-index of authors, the diversity of citations and institutional links).
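
The idea of combining these signals can be imagined as a simple feature-based classifier. The sketch below is purely illustrative – the feature names, weights, and logistic scoring are my assumptions for exposition, not the authors' actual model, which uses far richer inputs:

```python
from dataclasses import dataclass
from math import exp

@dataclass
class JournalFeatures:
    # Illustrative features only; the real study combines many more signals.
    has_peer_review_policy: bool   # website content signal
    has_editorial_board: bool      # website content signal
    template_reuse_score: float    # website design signal, 0..1
    self_citation_rate: float      # bibliometric signal, 0..1

def suspicion_score(f: JournalFeatures) -> float:
    """Combine the signals into a probability-like score with a logistic
    function. The weights here are invented for illustration."""
    z = (
        -1.5 * f.has_peer_review_policy   # legitimacy signals lower the score
        - 1.0 * f.has_editorial_board
        + 2.0 * f.template_reuse_score    # risk signals raise it
        + 3.0 * f.self_citation_rate
    )
    return 1 / (1 + exp(-z))

# A journal with no policies, a recycled template, and heavy self-citation
suspect = JournalFeatures(False, False, 0.8, 0.6)
print(round(suspicion_score(suspect), 2))
```

The point of the multi-signal design is robustness: a dubious publisher can fake any one feature (say, list a plausible editorial board), but faking all of them at once is much harder.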

The model achieved impressive accuracy. At a realistic threshold, it flagged over a thousand suspect journals. These journals are responsible for hundreds of thousands of articles and millions of citations. Notably, the AI results were compared with those of human experts, and the overlap was substantial. Of course, there were still errors. About a quarter of the flagged titles turned out to be false alarms. However, the authors stress this is not meant as a final verdict. It is a triage system. It narrows down the pile so that experts can spend their time more effectively.
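
The threshold trade-off the authors describe – comprehensive screening versus precise, low-noise identification – is the familiar precision/recall dial. A toy illustration, with invented scores and ground-truth labels:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when flagging every item with score >= threshold."""
    flagged = [l for s, l in zip(scores, labels) if s >= threshold]
    tp = sum(flagged)                                        # correctly flagged
    fn = sum(l for s, l in zip(scores, labels) if s < threshold)  # missed
    precision = tp / len(flagged) if flagged else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall

# Invented model scores and ground truth (1 = genuinely questionable)
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    1,    0]

for t in (0.3, 0.5, 0.9):
    p, r = precision_recall(scores, labels, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

Lowering the threshold catches more questionable journals but produces more false alarms; raising it does the reverse. A triage system deliberately tolerates some false alarms in exchange for missing fewer genuine offenders, precisely because human experts review everything it flags.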

Lessons for researchers and funders

This is AI at its best: not replacing humans, but creating scale where manual work is too slow and reactive. At the same time, the authors warn against drawing hasty conclusions. Detection should never be fully automated. Subtle cases and contextual knowledge will always require human judgment.

What I particularly like is how the study also clarifies the terminology. Predatory publishing is not a black-and-white story. Some journals disappear from the DOAJ list without being fraudulent, and even hybrid or traditional publishers can display questionable practices. But few would deny that there is a real problem.

The lesson for us? As researchers, institutions, or funders, we must remain alert about where we publish and who we cite. AI can be a useful tool here, but we should demand that such systems remain transparent, explainable, and fair. Perhaps this is exactly how AI can contribute to science: not by generating noise, but by filtering it out – and by protecting the integrity of knowledge rather than undermining it.

Abstract of the study:

Questionable journals threaten global research integrity, yet manual vetting can be slow and inflexible. Here, we explore the potential of artificial intelligence (AI) to systematically identify such venues by analyzing website design, content, and publication metadata. Evaluated against extensive human-annotated datasets, our method achieves practical accuracy and uncovers previously overlooked indicators of journal legitimacy. By adjusting the decision threshold, our method can prioritize either comprehensive screening or precise, low-noise identification. At a balanced threshold, we flag over 1000 suspect journals, which collectively publish hundreds of thousands of articles, receive millions of citations, acknowledge funding from major agencies, and attract authors from developing countries. Error analysis reveals challenges involving discontinued titles, book series misclassified as journals, and small society outlets with limited online presence, which are issues addressable with improved data quality. Our findings demonstrate AI’s potential for scalable integrity checks, while also highlighting the need to pair automated triage with expert review.
