What this Nature study really says about education research

This new study by Olivia Miske and many colleagues, published in Nature, is generating a lot of online discussion, which makes it a good moment to take a closer look and separate fact from fiction. Let’s start with something that repeatedly goes wrong in the online debate: people use reproducibility and replicability interchangeably, even though they refer to two very different things.

Reproducibility is essentially the minimum standard of research hygiene. If I take your data and run exactly the same analysis, I should obtain the same result. No interpretation, no debate, just the same numbers. That is what this study investigates. Not whether an effect is “real”, but whether the reported results can be obtained again from the same data.
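
To make that concrete, here is a minimal sketch of what such a check looks like in practice. Everything in it is invented for illustration: the dataset, the analysis, and the “reported” value it is compared against.

```python
# A minimal reproducibility check: rerun the reported analysis on the
# authors' own data and compare the numbers. Everything here is
# hypothetical: the data, the analysis, and the "reported" statistic.
import pandas as pd
from scipy import stats

# Stand-in for the dataset the authors shared alongside the paper.
shared_data = pd.DataFrame({
    "group": ["treatment"] * 5 + ["control"] * 5,
    "score": [78, 82, 75, 80, 79, 70, 72, 68, 74, 71],
})

# The exact analysis described in the (hypothetical) paper: a two-sample t-test.
treat = shared_data.loc[shared_data["group"] == "treatment", "score"]
ctrl = shared_data.loc[shared_data["group"] == "control", "score"]
t_stat, p_value = stats.ttest_ind(treat, ctrl)

reported_t = 5.10  # hypothetical value copied from the paper's results table
if abs(t_stat - reported_t) < 0.01:
    print("Reproduced: same data, same analysis, same number.")
else:
    print(f"Not reproduced: got t = {t_stat:.2f}, paper reports t = {reported_t}.")
```

If the numbers match, the paper passes this minimal test; if they do not, something in the data, the description, or the analysis does not line up.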

Replicability is something else. Here, you conduct the study again using new data. A different sample, the same hypothesis. The question is whether you find a similar effect. This is about generalisability and robustness. The replication crisis in fields such as psychology emerged from this. And while it is often framed negatively, it is, in essence, a healthy development.

The distinction matters. A study can be perfectly reproducible and still be wrong. Think of a flawed research design that consistently produces the same incorrect result. Conversely, a study can be difficult to reproduce because of incomplete reporting, even when it still points in the right direction. If you mix up these two concepts, you miss what this study by Miske et al. is actually about.

So what does the study show? About 54% of the examined papers could be reproduced exactly, and around 73% at least approximately. But these figures apply only to studies with available data, which already limits the conclusion.

And that is only half the story. You need access to data to test reproducibility, and this is where problems arise. Of the 600 papers examined, only 24% made their data available. In other words, we cannot even test three out of four studies. Not because they are necessarily wrong, but because we simply do not know.

This leads to an important nuance. If you focus only on studies with available data, reproducibility looks reasonably strong. But once you consider the full sample, including everything that cannot be checked, the picture shifts. The authors themselves show that the results change markedly depending on how you calculate them.
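
A rough back-of-envelope calculation, using only the figures quoted above, shows how large that shift is. The counts below are an illustration derived from those percentages, not numbers taken directly from the paper.

```python
# Back-of-envelope arithmetic: the same findings, two denominators.
total_papers = 600
share_with_data = 0.24          # papers that made their data available
exact_rate_given_data = 0.54    # reproduced exactly, among papers with data

with_data = total_papers * share_with_data            # about 144 papers
exact_given_data = with_data * exact_rate_given_data  # about 78 papers

print(f"Among papers with data: {exact_rate_given_data:.0%} reproduce exactly.")
print(f"Confirmed across the full sample: {exact_given_data / total_papers:.0%}.")
# The remaining ~76% are not failures; they are simply untestable.
```

Divide by the testable papers and you get “a bit over half”; divide by all 600 and only about one in eight results is positively confirmed, with the rest unknown.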

There are also clear differences between fields. In economics and political science, reproducibility is noticeably higher than in other disciplines. Education research, in this study, sits toward the lower end in that respect. That is exactly the kind of finding that is quickly used online to question the entire field.

But that conclusion is too simplistic. The differences appear to be largely related to how common it is to share data and code. In economics and political science, stricter norms and policies around data sharing have been in place for longer. And it is precisely in those fields that reproducibility is higher. This suggests that the issue is less about “better science” and more about transparency and verifiability.

Another point often misunderstood online is the sample size. Yes, 600 papers sounds impressive, and for this type of research it is. But set against the size of the field, and the fact that these papers span almost a decade, it is quite limited. The effective sample is smaller still, because only studies with available data could be analysed. In other words, this is a large study with a relatively narrow gateway. That makes the findings valuable, but it also limits what we can conclude.

Another nuance often gets lost. A failed reproduction does not mean the original result is wrong. It can reflect small differences in data, missing steps in the description, different analytical choices, or practical issues. The reverse is also true. A perfectly reproduced result can still rely on a flawed design, bias, or spurious correlations. Reproducibility provides a baseline for credibility, not a guarantee of truth.

Yet on social media, people often use this study as a stick to beat education research, or social science more broadly. They treat it as proof that “it does not work”. That conclusion is too easy.

What this study mainly shows is something we have known for some time. Transparency is the real issue. The problem is not necessarily that researchers make many mistakes, but that we often cannot verify what they have done.

There is also some good news. In fields where researchers more commonly share data and code, such as economics and political science, reproducibility is clearly higher. That suggests something hopeful: policies and practices do make a difference.

In a parallel study published in Nature, Andrew Tyner and colleagues examined replicability: they tested whether results hold up when the studies are run again on new data. Here we see a different picture. In this large SCORE project, replication success hovers around 50%, and that also holds for education research. Not exceptionally good, but not exceptionally bad either; it is what you would expect in social science, where effects are often small and strongly dependent on context. Does this mean there is no problem? Certainly not. But the problem is not that education research fails unusually often; it is that replication attempts are still relatively rare, according to other research.

And then there is a third layer that often remains completely under the radar. A third study by Aczel and colleagues asked different research teams to analyse the same data. What did they find? In most cases, researchers chose different analytical approaches and sometimes arrived at different conclusions. So the issue is not only whether results can be reproduced or replicated, but also how they are produced in the first place. One dataset does not yield a single answer, but a range of possible answers, depending on the choices made.
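
To see how this can happen, here is a toy illustration, not the study’s own method: two teams analyse the same simulated dataset, one comparing raw group means and one adjusting for a background variable. Both choices are defensible, yet they produce different numbers. The data and both specifications are invented for this example.

```python
# Two defensible analyses of the same dataset, two different answers.
import numpy as np

rng = np.random.default_rng(42)
n = 200
prior_ability = rng.normal(0, 1, n)              # background covariate
propensity = 1 / (1 + np.exp(-prior_ability))    # stronger students opt in more
treated = rng.binomial(1, propensity)            # 0/1 programme participation
outcome = 0.3 * treated + 0.8 * prior_ability + rng.normal(0, 1, n)

# Team A: simple difference in means between participants and non-participants.
effect_a = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Team B: regression that adjusts for prior ability.
X = np.column_stack([np.ones(n), treated, prior_ability])
coefs, *_ = np.linalg.lstsq(X, outcome, rcond=None)
effect_b = coefs[1]

print(f"Team A (raw difference):     {effect_a:.2f}")
print(f"Team B (covariate-adjusted): {effect_b:.2f}")
# Same data, two reasonable specifications, two different effect sizes.
```

Neither team has made an error; they have simply made different, defensible choices. Multiply this by dozens of such decisions and the range of possible answers becomes wide.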

That makes the discussion less comfortable, but more honest: the problem is not that research does not work, but that it is more complex than the slogans suggest.
