When you say in education debates that something “works according to research”, you usually get approving nods from one corner and sceptical grumbles from another. Especially when a meta-analysis is involved; after all, that is supposed to sit at the top of the evidence pyramid. Combine many studies, average their findings, weight them carefully and report an effect size: it sounds reassuringly solid. Yet a new meta-review of 247 meta-analyses of interventions in compulsory education shows that the reality does not always match the ideal.
The meta-review by Marta Pellegrini and colleagues is careful, rigorous and impressively thorough work. That deserves to be said upfront. Yet the findings confront us with several uncomfortable truths. Many reviews fall short on transparency. Researchers struggle to adopt methodological innovations. Interpretations of results sometimes lack the depth we need. None of this undermines science. In fact, research becomes stronger when we choose to look directly at these awkward parts.
One finding struck me most: only 4% of the meta-analyses preregistered a protocol. Today we expect researchers to do this to avoid quietly adjusting their approach once results start to emerge, and we teach the same principle to our students in Utrecht. Only 6% shared their full dataset or statistical code. Practical obstacles can make this hard, but the number remains remarkably low in a field where privacy rarely blocks data sharing. Researchers also tend to describe their search strategy only in part: just 9% reported their full search strings; the rest offered examples. That lack of detail makes a review hard to reproduce. Yet policy documents and school improvement programmes rely on this type of synthesis. It feels a bit like following a recipe that says “add some spices” without telling you which ones.
Another observation: effect sizes are often statistically dependent on one another. Think of multiple outcomes measured in the same study. Only 30% of meta-analyses used modern models, such as multilevel or robust-variance approaches, that handle this dependency appropriately. In many cases, effect sizes are simply averaged or treated as if they were independent. That sounds harmless, but it can distort the precision of the findings, often making them look more certain than they are. And honestly, when reading meta-analyses, one rarely sees this issue flagged explicitly. Yet it is exactly the kind of nuance we need to understand results properly.
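To see why this matters for precision, here is a small numerical sketch of my own (a toy example, not taken from the review): two effect sizes from the same study are usually positively correlated, and treating them as independent makes their combined result look more precise than it really is.

```python
import numpy as np

# Toy example: two effect sizes from the same study, each with sampling
# variance 0.04, correlated at r = 0.8 because they come from the same pupils.
v1, v2, r = 0.04, 0.04, 0.8
cov = r * np.sqrt(v1 * v2)

# Standard error of their average if we (wrongly) treat them as independent
se_naive = np.sqrt((v1 + v2) / 4)

# Standard error of the same average when the dependency is accounted for
se_dependent = np.sqrt((v1 + v2 + 2 * cov) / 4)

print(f"assumed independent: SE = {se_naive:.3f}")      # about 0.14
print(f"dependency handled:  SE = {se_dependent:.3f}")  # about 0.19
```

The difference looks small here, but across dozens of studies it can add up to confidence intervals that are noticeably too narrow.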
The question of why studies find different results also receives inconsistent attention. Many meta-analyses report a few standard heterogeneity statistics, but only a third make clear how much studies truly differ from one another, and only 8% provide a prediction interval, one of the most practically relevant pieces of information. You can easily have an attractive “average effect” while the spread across studies tells you that something works under some conditions and not under others. That is not a failure of science. It is the reality of education. It is also why the Leerpunt and EEF toolkits now include clear “side notes” to show what the variation looks like.
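To make the prediction interval less abstract, here is a sketch with invented numbers (the formula is the commonly used approximation from Higgins and colleagues; the values are mine): even with a confident-looking average, the interval for a new school can easily span “clearly negative” to “clearly positive”.

```python
import numpy as np
from scipy import stats

# Invented numbers, chosen only to illustrate the idea
mu_hat = 0.25   # average effect across studies
se_mu = 0.05    # standard error of that average
tau2 = 0.04     # estimated between-study variance (how much studies differ)
k = 30          # number of studies

# Approximate 95% prediction interval: where the effect in a *new* setting
# is expected to fall, given the spread between existing studies.
t_crit = stats.t.ppf(0.975, df=k - 2)
half_width = t_crit * np.sqrt(tau2 + se_mu**2)

print(f"95% CI for the average effect: {mu_hat - 1.96 * se_mu:.2f} to {mu_hat + 1.96 * se_mu:.2f}")
print(f"95% prediction interval:       {mu_hat - half_width:.2f} to {mu_hat + half_width:.2f}")
```

In this made-up example the confidence interval runs roughly from 0.15 to 0.35, while the prediction interval runs from about −0.17 to 0.67: the “average effect” is estimated precisely, yet in a new context the intervention might well do nothing at all.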
Then there is the hunt for moderators. Why do certain interventions work better for some pupils or in some contexts? Almost all meta-analyses attempt to examine moderators, but only a quarter use multiple meta-regression, the approach that allows several factors to be considered simultaneously. The rest look at one factor at a time. That feels intuitive, but anyone who has ever worked in a school knows that contexts never come down to a single variable. The main reasons for sticking to one factor at a time are usually simple: too few studies, too much missing data or incomplete reporting in the original studies included in the review. This is not evidence that meta-analyses are “worthless”.
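For readers who like to see what “several factors at once” means in practice, here is a minimal, hypothetical sketch of a multiple meta-regression: the data and moderator names are invented, and the model is a simplified fixed-effect version (dedicated tools such as the metafor package in R handle the full random-effects machinery).

```python
import pandas as pd
import statsmodels.api as sm

# Invented toy data: one effect size per study, its sampling variance,
# and two study-level moderators (names made up for this sketch).
df = pd.DataFrame({
    "effect":      [0.10, 0.35, 0.22, 0.50, 0.05, 0.40],
    "variance":    [0.02, 0.03, 0.02, 0.04, 0.03, 0.02],
    "hours_week":  [1, 3, 2, 4, 1, 3],    # dosage of the intervention
    "small_group": [0, 1, 0, 1, 0, 1],    # delivered in small groups?
})

# Multiple meta-regression in spirit: both moderators entered at the same
# time, with studies weighted by the inverse of their sampling variance.
X = sm.add_constant(df[["hours_week", "small_group"]])
model = sm.WLS(df["effect"], X, weights=1 / df["variance"]).fit()
print(model.params)   # intercept and one coefficient per moderator
```

With only a handful of studies, as here, such a model quickly runs out of degrees of freedom, which is exactly the practical obstacle described above.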
A striking point is that this pattern is not limited to small reviews. Meta-analyses with fifty or more studies also often avoid multiple meta-regression. Size does not automatically mean strength. The pattern mainly shows how vulnerable even well-intentioned syntheses are when the underlying research is thin or inconsistently reported.
It would be easy to wave these findings around as proof that we should not trust meta-analyses. But that would send exactly the wrong message. You would not throw away a thermometer because someone misread it. And we should not dismiss science because some aspects lag behind methodological developments. What this meta-review really shows is that education research is moving forward, but not always evenly. And that some of our strongest tools become even stronger when we are honest about their limitations. That is also the reason I am writing about this review.
For schools and policymakers, the message is straightforward: use meta-analyses, but use them wisely. Do not stop at the conclusion; examine how researchers arrived there. Did they search the literature thoroughly? Did they address the dependency between effect sizes? How much did the results vary? And how transparent was the entire process? If you ask these questions, you extract far more value from research without treating it as something to follow blindly.
You do not weaken science by looking at it critically. You strengthen it. Science becomes more trustworthy when we invite this kind of reflection. That may be the central lesson of this meta-review: good research does not begin with certainty, but with curiosity and a willingness to see where things can improve. Every day.
Image created with ChatGPT.