The only video you’ll ever need to watch about Gluten

This is not really the kind of video I normally share on this blog, but I really like what the American Chemical Society and PBS Digital Studios do in their Reactions videos. A very nice example of science communication, imho.


What people are saying about my new book… The Ingredients for Great Teaching.

This new meta-analysis is making the news: direct instruction

Direct instruction is nothing new. There are over 50 years of research behind it. But lately there is a new fever surrounding the approach originally constructed by Engelmann and Becker. If you examine the latest PISA results, you can see that they are not that far off from the results of the biggest experiment in education ever: Project Follow Through.

But between those two datasets a lot more research on Direct Instruction has been done. This research has now been brought together in a new meta-analysis that has gained a lot of attention in my Twitter timeline.

And the results are pretty clear:

Our results support earlier reviews of the DI effectiveness literature. The estimated effects were consistently positive. Most estimates would be considered medium to large using the criteria generally used in the psychological literature and substantially larger than the criterion of .25 typically used in education research (Tallmadge, 1977). Using the criteria recently suggested by Lipsey et al. (2012), 6 of the 10 baseline estimates and 8 of the 10 adjusted estimates in the reduced models would be considered huge. All but one of the remaining six estimates would be considered large. Only 1 of the 20 estimates, although positive, might be seen as educationally insignificant.

What does this mean? Well, that Direct Instruction seems to be working quite well for reading, math, spelling, language,…

But there is more:

Earlier literature had led us to expect that effect sizes would be larger when students had greater exposure to the programs, and this hypothesis was supported for most of the analyses involving academic subjects. Significantly stronger results appeared for the total group, reading, math, and spelling for students who began the programs in kindergarten; for the total group and reading for students who had more years of intervention; and for math students with more daily exposure. Although we had expected that effects could be lower at maintenance than immediately postintervention, the decline was significant in only two of the analyses (math and language) and not substantial in either. Similarly, while literature across the field of education has suggested that reported effects would be stronger in published than in unpublished sources (Polanin et al., 2016), we found no indication of this pattern.

Contrary to expectations, training and coaching of teachers significantly increased effects in only one analysis (language). We suggest that readers interpret this finding cautiously for we suspect that it reflects the crude nature of our measure—a simple dummy variable noting if teachers were reported as receiving any training or coaching.

Are there no nuances to be made? Well, yes, of course, as with all analyses. The researchers went to great lengths to examine the quality of the studies, but didn’t include these insights in their analysis. And the researchers also note the size and heterogeneity of the samples used in their research.

For instance, we did not attempt to compare the results of each of the DI programs with specific other approaches. Nor did we examine outcomes in subdimensions within the various subject areas, such as differentiating reading fluency and comprehension. In addition, many of our measures were less precise than could be considered optimal. The studies differed, often substantially, in the nature and amount of information given.


Abstract of the meta-analysis by Stockard et al.:

Quantitative mixed models were used to examine literature published from 1966 through 2016 on the effectiveness of Direct Instruction. Analyses were based on 328 studies involving 413 study designs and almost 4,000 effects. Results are reported for the total set and subareas regarding reading, math, language, spelling, and multiple or other academic subjects; ability measures; affective outcomes; teacher and parent views; and single-subject designs. All of the estimated effects were positive and all were statistically significant except results from metaregressions involving affective outcomes. Characteristics of the publications, methodology, and sample were not systematically related to effect estimates. Effects showed little decline during maintenance, and effects for academic subjects were greater when students had more exposure to the programs. Estimated effects were educationally significant, moderate to large when using the traditional psychological benchmarks, and similar in magnitude to effect sizes that reflect performance gaps between more and less advantaged students.


Interesting interview with John Hattie: “I’m a statistician, I’m not a theoretician”

This interview with John Hattie by Hanne Knudsen is very interesting, although I think some statisticians will frown when they read the quote used as the title: “I’m a statistician, I’m not a theoretician”.

I advise reading the full article, but this excerpt explains the quote in more depth:

…if I had known that it would go to an audience larger than just researchers, I probably would have had a whole lot more of theory in it. I had never dreamed it would catch on like this, so you are right, it is quite devoid of theory in terms of how it is written. I am a measurement researcher, I am a statistician, I am not a theoretician, so I haven’t written a lot of theory. But of course I have a very strong model of teaching. I have worked for many years with some of the more well-known people on the theoretical side of teaching around the world. But you are right, it did not come through, and sometimes I think I should write a book about teaching as a profession – which indeed I am doing right now. I get accused more of not taking into account sociology, and the background of kids. Of course I think that is very critical, but the book was never written for that kind of general notion. You are right, it is a criticism, and maybe I should write something more theoretical, but it is not really my strength. I do have some very strong theories on how theory works and the concept of teaching. It just didn’t come through in the book

An antidote to a paper warning you about wifi (and other examples of junk science)

I really like science, and I like the self-correcting part of science even more.

Check this paper that was published by Sage and Burgio earlier this year:

Mobile phones and other wireless devices that produce electromagnetic fields (EMF) and pulsed radiofrequency radiation (RFR) are widely documented to cause potentially harmful health impacts that can be detrimental to young people. New epigenetic studies are profiled in this review to account for some neurodevelopmental and neurobehavioral changes due to exposure to wireless technologies. Symptoms of retarded memory, learning, cognition, attention, and behavioral problems have been reported in numerous studies and are similarly manifested in autism and attention deficit hyperactivity disorders, as a result of EMF and RFR exposures where both epigenetic drivers and genetic (DNA) damage are likely contributors. Technology benefits can be realized by adopting wired devices for education to avoid health risk and promote academic achievement.

Sounds pretty alarming, no? Should we worry? Well, no.

The respected journal Child Development recently published a commentary that attributed a number of negative health consequences to RF radiation, from cancer to infertility and even autism (Sage & Burgio, 2017). It is our view that this piece has potential to cause serious harm and should never have been published. But how do we justify such damning verdict? In considering our responses, we realized that this case raised more general issues about distinguishing scientifically valid from invalid views when evaluating environmental impacts on physical and psychological health, and we offer here some more general guidelines for editors and reviewers who may be confronted with similar issues. As shown in Table 1, we identify seven questions that can be asked about causal claims, using the Sage and Burgio (2017) article to illustrate these.

That’s right: David Grimes and Dorothy Bishop took a closer look at the alarming article, and well…

Abstract of the paper by David Grimes and Dorothy Bishop that can be downloaded here:

Exposure to nonionizing radiation used in wireless communication remains a contentious topic in the public mind. While the overwhelming scientific evidence to date suggests that microwave and radio frequencies used in modern communications are safe, public apprehension remains considerable. A recent article in Child Development has caused concern by alleging a causative connection between nonionizing radiation and a host of conditions, including autism and cancer. This commentary outlines why these claims are devoid of merit, and why they should not have been given a scientific veneer of legitimacy. The commentary also outlines some hallmarks of potentially dubious science, with the hope that authors, reviewers, and editors might be better able to avoid suspect scientific claims.


How useful are twin studies for measuring heritability of educational achievement? More than one might think!

I have already posted several studies that linked heritability to study success (see e.g. here and here). This kind of research is typically conducted with twin studies, comparing monozygotic and dizygotic twins. But how useful is this approach? This new study wanted to examine precisely this by (re)examining Dutch data, and, maybe surprisingly, the answer is: very. Check this quote from the conclusion:

In the Netherlands, the results of a national educational achievement test, the Eindtoets basisonderwijs, partly determine the level of secondary education suitable for a child. Several twin studies have looked at the heritability of individual differences on this test, but due to self-selection bias and possible differences in singletons and twins, these results might not generalize to the general population of Dutch pupils. Here we determined the heritability of test scores using population-wide census data. For the estimation, pedigree-based mixed models were used, a method borrowed from the field of animal genetics. We found a heritability of 0.94. When corrected for several school-related covariates, this estimate dropped to 0.85. How does this fairly high heritability estimate compare to that based on twin studies?

A few studies have been done on the same phenotype in the same birth cohort of Dutch children. For example, Bartels et al. (2002) conducted a twin study of 1,495 Dutch twins from the NTR from the birth cohorts 1998–2001 on the sum scores of the same test investigated here at age 12 (Eindtoets basisonderwijs). They found that genetic influences explained 57% of the variance in test scores and environmental influences 43%. Twenty-seven percent of the environmental variance could be explained by common-environmental influences and 16% by unique-environmental influences. Schwabe et al. (2016) analyzed the sum scores of 990 Dutch twin pairs from a similar birth cohort (1997–2000) from the NTR but also investigated the effect of the sex of a twin and specific covariates (i.e., school denomination, pedagogical philosophy, school size). Similar to the findings of Bartels et al. (2002), the results suggested that differences in test scores can be explained mainly by genetic influences (66%). Interestingly, while the heritability estimate dropped from 0.94 to 0.85 in the census-based analysis, including covariates did not change the heritability estimate in the Schwabe et al. (2016) study. This might be explained by the lower statistical power of the Schwabe et al. (2016) study, leading also to a lower variance of the covariate distribution: For example, 74% of the twins followed regular education and the school’s denomination was Roman-Catholic for 31% of the twins.

Overall, the results of twin studies imply that individual differences in the scores on the Eindtoets Basisonderwijs test can be largely explained by genetic differences: Estimated heritability ranges from 60% (Bartels et al., 2002) up to 74% (de Zeeuw et al., 2016). Earlier research furthermore suggests that the finding of a high heritability can be generalized not only to the total score of the Eindtoets Basisonderwijs, but also to its subscales (see e.g., de Zeeuw et al., 2016; Schwabe et al., 2017). When we compare these heritability estimates to the estimate of 85% in this study, we can conclude that the high estimates resulting from the twin method are not simply an artifact of self-selection or because of any important difference between twins and singletons. Twin-based heritability estimates are not inflated, since an estimate based on a sample from the entire population (including twins and singletons) is even higher.

Of course there are limitations to this study, as always, and also as always more research is needed, but this seems an important element to the discussion. (H/T @SteveStuWill)
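
For readers who wonder how comparing monozygotic and dizygotic twins can yield a heritability figure at all, here is a minimal sketch of the textbook approximation (Falconer's formulas). It is not the method used in the studies quoted above, which rely on more elaborate models. The two twin correlations are hypothetical values, back-calculated by me so that the output lands on roughly the 57% / 27% / 16% split quoted from Bartels et al. (2002); they are not reported figures.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Split trait variance into A, C and E shares from twin correlations.

    r_mz: test-score correlation within monozygotic pairs
    r_dz: test-score correlation within dizygotic pairs
    """
    a = 2 * (r_mz - r_dz)   # additive genetic share (heritability)
    c = 2 * r_dz - r_mz     # shared (common) environment share
    e = 1 - r_mz            # unique environment share, incl. measurement error
    return {"A": a, "C": c, "E": e}


# Hypothetical correlations, chosen only for illustration (see text above).
shares = falconer_ace(r_mz=0.84, r_dz=0.555)

for component, share in shares.items():
    print(f"{component}: {share:.2f}")
```

The intuition: identical twins share all their genes and fraternal twins on average half, so doubling the gap between the two correlations gives a rough estimate of the genetic share.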

Abstract of the study:

As for most phenotypes, the amount of variance in educational achievement explained by SNPs is lower than the amount of additive genetic variance estimated in twin studies. Twin-based estimates may however be biased because of self-selection and differences in cognitive ability between twins and the rest of the population. Here we compare twin registry based estimates with a census-based heritability estimate, sampling from the same Dutch birth cohort population and using the same standardized measure for educational achievement. Including important covariates (i.e., sex, migration status, school denomination, SES, and group size), we analyzed 893,127 scores from primary school children from the years 2008–2014. For genetic inference, we used pedigree information to construct an additive genetic relationship matrix. Corrected for the covariates, this resulted in an estimate of 85%, which is even higher than based on twin studies using the same cohort and same measure. We therefore conclude that the genetic variance not tagged by SNPs is not an artifact of the twin method itself.

Help me get this message to Mark Zuckerberg…

Dear Mark,

You wrote a new letter on your own network about philanthropy, and you want technology to save American education. First of all, it's great that you want to help the world, but as an educational scientist I am a bit worried.

I want to discuss this excerpt of your text:

To imagine what a future education system might look like, consider the research of Benjamin Bloom showing that if you take an average student and give them one-to-one tutoring, they will perform two standard deviations better than other students learning by conventional techniques. In other words, if a student is at the 50th percentile in their class and they receive effective one-on-one tutoring, they jump on average to the 98th percentile.

That suggests we need an education system where all students receive the equivalent of an expert one-on-one tutor. That is what we mean when we refer to “personalized learning”. Rather than having every student sit in a classroom and listen to a teacher explain the same material at the same pace in the same way regardless of a student’s strengths, learning style and interests, research shows students will perform better if they can learn at their own pace, based on their own interests, and in a style that fits them.
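
A side note on the numbers in the quote: the jump from the 50th to the 98th percentile is simply what a shift of two standard deviations looks like if scores follow a normal distribution. The sketch below only checks that arithmetic; the normality assumption and the function names are mine, and it says nothing about whether the two-standard-deviation effect itself holds up.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Share of a standard normal distribution that lies below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def percentile_after_gain(start_z: float, gain_sd: float) -> float:
    """Percentile reached after moving gain_sd standard deviations up from start_z."""
    return 100.0 * normal_cdf(start_z + gain_sd)


# An average student (z = 0, the 50th percentile) who gains two standard deviations.
print(round(percentile_after_gain(0.0, 2.0), 1))
```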
There are several things to say here. I could start talking about solutionism, but I’m sure you know Morozov. I could tell you how this has failed in the past. But the most important thing is this: you state that what you say is research-based, while mentioning someone who died in 1999, as if educational research hasn’t developed since.
Oh, and there is also this:

Benjamin Riley is very correct, dear Mark, and I would suggest inviting the man too. What you write has been one of the most enduring myths in education. I don’t think you need the money, but if you want to earn 5000 dollars you could try this challenge to prove me and the scientific world incorrect. Until then, please don’t say it is backed by research. If you want to know more about myths about learning and education, I would gladly send you our book.

Have a great 2018,



Decades of evidence supports early childhood education (Best Evidence in Brief)

There is a new Best Evidence in Brief (they have a blog now too) and they share a new meta-analysis on a topic I once wrote a report about in Belgium: early childhood education. I remember finding mixed results in the different studies. And now:

A recent meta-analysis of almost 60 years’ worth of high-quality early childhood education (ECE) studies found that participating in ECE programs significantly reduced special education placement and grade retention, and led to increased graduation rates.
Dana Charles McCoy and colleagues examined data from studies spanning 1960-2016. All had to meet strict inclusion criteria and address ECE’s effects on special education placement, grade retention, or dropout rates, yielding 22 studies. Seven were randomized controlled studies, four were quasi-experimental, and eleven used non-randomized assignment and compared groups who were equivalent at baseline.
Results showed statistically significant effects of ECE. Compared to students who did not attend ECE, participants were 8.1 percentage points less likely to be placed in special education, 8.3 percentage points less likely to be held back a grade, and 11.4 percentage points more likely to graduate high school.
Authors discuss how these results support the idea of expanding ECE programming in the U.S.


What are the effects of a method such as Montessori education?

I often get asked which is better: traditional schools or method schools – I would prefer to call them modern schools – such as Montessori, Jenaplan, Freinet, … Last week I explained to my students that this is quite difficult to examine, as many parents who opt for such a school are often already very much involved in the education of their children, an element that can already predict a lot of the gains at school. A recent review study on Montessori education showed that a lot of the research suffers from this.
I now have two different studies that use the same trick to bypass this hurdle. They look at a lottery situation and compare the children who won the lottery, and who were admitted to a Montessori school, with the children who weren’t admitted but whose parents equally thought this would be important.

The first study has some great news: the children seem to benefit from the approach, as Best Evidence in Brief reported:

A longitudinal study published in Frontiers in Psychology examined how children in Montessori schools changed over three years compared with children in other preschool settings. The Montessori model involves both child-directed, freely-chosen activity and academic content. Angeline Lillard and colleagues compared educational outcomes for children allocated places by a random lottery to either Montessori preschools (n=70) or non-Montessori preschool settings (n=71) in Connecticut. The research team carried out a variety of assessments with the children over a three-year period, from when the children were three until they were six.

The researchers found that over time children in Montessori preschools performed better on measures of academic achievement (Woodcock-Johnson IIIR Tests of Achievement effect size = +0.41) and social understanding while enjoying their school work more, than those in conventional preschool settings. They also found that in Montessori classrooms, children from low-income families, who typically don’t perform as well in school, showed similar academic performance as children from higher-income families. Children with low executive function similarly performed as well as those with high executive function.
The findings, they suggest, indicate that well-implemented Montessori education could be a way to help disadvantaged children to achieve their academic potential.

The second study, by Nienke Ruijs, uses the same research approach, but… didn’t find any influence:


  • I use school admission lotteries to investigate the effects of Montessori education.
  • I find little evidence that Montessori education affects academic achievement.
  • Montessori students show similar levels of motivation.
  • Montessori students do not score better on measures of independence.

This is the abstract of the study:

This study investigates the causal effects of Montessori secondary education by exploiting admission lotteries in Dutch Montessori schools. Results from 308 to 625 students indicate that Montessori education provides an alternative way to attain similar outcomes. Montessori students obtain their secondary school degree without delay at the same rate and with similar grades as non-Montessori students, although the route towards the exams is somewhat different. Further, Montessori students show similar levels of motivation and do not score better on various measures of independence, even though these are the main characteristics Montessori education claims to foster.

So, now what? I do think that it is important to notice that while the first study looked at preschool education, the second looked at secondary education, which could explain the difference in part. But maybe there is another element involved: many schools label themselves Montessori, but that doesn’t necessarily mean they fully follow her vision, even if they do use her materials or mention her vision. This seems to be more the case for secondary education, as Nienke Ruijs notes:

Montessori secondary education programs are less standardized than Montessori primary education programs, and the results of this study should not be generalized to the small number of rather radical Montessori boarding schools with ‘Erdkinder’ programs. The schools considered in this study are, however, part of a long tradition of Montessori secondary education. Montessori secondary schools in other countries are based on the same philosophy and have similar approaches with respect to choice of activities and field projects.

Both studies have the benefit of having a randomized group, but at the same time both still risk being underpowered for strong conclusions.
(H/t to Daniel Willingham for tweeting the Dutch study)



Very interesting paper: Are the smartest people losing their intelligence?

This is a fascinating new paper by James Flynn (whose name gave us the Flynn effect) and Michael Shayer. The paper doesn’t present a smoking gun, but looks for trends in countries that see a decline in average IQ and in countries that don’t. The authors also look deeper than averages on parts of the test, by examining which groups in the population are doing worse.

I can share with you the summarizing bullet points:

  • Important national differences, particularly the contrast between Scandinavia and elsewhere.
  • Dutch trends show that IQ gains vary by age which is indicative of the strength of various causal factors.
  • Piagetian trends provide information conventional tests do not: that the largest losses may be at the top of the curve.

Or even the abstract:

The IQ gains of the 20th century have faltered. Losses in Nordic nations after 1995 average at 6.85 IQ points when projected over thirty years. On Piagetian tests, Britain shows decimation among high scorers on three tests and overall losses on one. The US sustained its historic gain (0.3 points per year) through 2014. The Netherlands shows no change in preschoolers, mild losses at high school, and possible gains by adults. Australia and France offer weak evidence of losses at school and by adults respectively. German speakers show verbal gains and spatial losses among adults. South Korea, a latecomer to industrialization, is gaining at twice the historic US rate.

When a later cohort is compared to an earlier cohort, IQ trends vary dramatically by age. Piagetian trends indicate that a decimation of top scores may be accompanied by gains in cognitive ability below the median. They also reveal the existence of factors that have an atypical impact at high levels of cognitive competence. Scandinavian data from conventional tests confirm the decimation of top scorers but not factors of atypical impact. Piagetian tests may be more sensitive to detecting this phenomenon.
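
To put the figures from the abstract on one scale, the small sketch below converts them to points per year and to points over thirty years. It assumes simple linear trends, which is my simplification, and it reads "twice the historic US rate" as 0.6 points per year.

```python
YEARS = 30

# Figures taken from the abstract above.
nordic_change_over_30_years = -6.85     # IQ points, projected over thirty years
us_gain_per_year = 0.3                  # historic US gain, IQ points per year
korea_gain_per_year = 2 * us_gain_per_year  # assumption: "twice the historic US rate"

trends_per_year = {
    "Nordic nations": nordic_change_over_30_years / YEARS,
    "US": us_gain_per_year,
    "South Korea": korea_gain_per_year,
}

for name, rate in trends_per_year.items():
    # Linear extrapolation only; real trends need not be linear.
    print(f"{name}: {rate:+.2f} points/year, {rate * YEARS:+.1f} points over {YEARS} years")
```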

But honestly, there are so many nuances in the article that neither of these summaries does it justice. I think these two parts of the conclusion are the most important elements of the paper.

Excerpt 1:

Nations that are candidates for IQ decline give evidence that ranges from beyond dispute to fragmentary. The strongest comes from Scandinavia as a whole, Britain (Volume and Heaviness), and Germany (spatial). Nordic data cover vocabulary, similarities, analogies, shapes, metal folding, number series, geometrical figures, and letter matrices. The pivotal year for all of these nations seems to be about 1995. Data from France and Australia await further studies but cannot be dismissed. The US is strange. All ages plow on unaffected as yet. Black gains are larger than white but white gains are still robust (Flynn, 2012a). The quick developing world (Korea) and the slow developing world will of course continue to gain for some time.

These trends come as no surprise. Flynn has argued that industrialization may eventually pay diminishing returns in developed nations. Until very recently, we have enjoyed a more favorable ratio of adults to children in the home, more and better schooling, more cognitively demanding jobs, and better health and conditions of the aged. These caused large IQ gains for several generations. However, the same factors can turn from positive to mixed or even negative. The number of children in the home has reached a minimum (and indeed there are more solo-parent homes), middle class parents have used up the tricks that make the pre-school environment cognitively enriching, we appear to have reached a limit in terms of enhanced schooling and the number we keep in school into adulthood, the economy may be producing fewer cognitively demanding jobs in favor of more service work; however, we may continue to improve the health of the aged.

Excerpt 2 (bold by me):

The Piagetian results are particularly ominous. Looming over all is their message that the pool of those who reach the top level of cognitive performance is being decimated: fewer and fewer people attain the formal level at which they can think in terms of abstractions and develop their capacity for deductive logic and systematic planning. They also reveal that something is actually targeting that level with special effect, rather than simply reducing its numbers in accord with losses over the curve as a whole. We have given our reason as to why the Piagetian tests are sensitive to this phenomenon in a way that conventional tests are not.

Massive IQ gains over time were never written in the sky as something eternal like the law of gravity. They are subject to every twist and turn of social evolution. If there is a decline, should we be too upset? During the 20th century, society escalated its skill demands and IQ rose. During the 21st century, if society reduces its skill demands, IQ will fall. Nonetheless, no one would welcome decay in the body politic, or among the elite who at present represent our best thinkers. Although it might be argued that the character of the electorate will be enhanced if it contained fewer lawyers and more plumbers and service workers.

It is always possible that our schools and universities will graduate more young people who read and become more critically astute. This in itself would put a limit on IQ losses on Vocabulary, Information, and most Verbal tests, and on accepting the stereotypes that cloud moral reasoning and political prudence.




