Important new meta-analysis on the testing effect – with some surprises…

There is a new issue of Best Evidence in Brief, and one of the studies the newsletter discusses is all about the testing effect:

Olusola O. Adesope and colleagues conducted a meta-analysis to summarize the learning benefits of taking a practice test versus other forms of non-testing learning conditions, such as re-studying, practice, filler activities, or no presentation of the material.

Results from 272 independent effects from 188 separate experiments demonstrated that the use of practice tests is associated with a moderate, statistically significant weighted mean effect size compared to re-studying (+0.51) and a much larger weighted mean effect size (+0.93) when compared to filler or no activities.

In addition, the format, number, and frequency of practice tests make a difference for the learning benefits on a final test. Practice tests with a multiple-choice option have a larger weighted mean effect size (+0.70) than short-answer tests (+0.48). A single practice test prior to the final test is more effective than when students take several practice tests. However, the timing should be carefully considered. A gap of less than a day between the practice and final tests showed a smaller weighted effect size than when there is a gap of one to six days (+0.56 and +0.82, respectively).
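(A quick aside for readers new to the term: a “weighted mean effect size” is simply a pooled standardised mean difference in which more precise studies count for more. The toy Python sketch below shows the idea with made-up numbers and a simple fixed-effect, inverse-variance model; it is only an illustration, not necessarily the model Adesope et al. actually fitted.)

# Hypothetical illustration only: the g values and variances are invented,
# and Adesope et al. will have used more elaborate machinery than this.
effects = [0.45, 0.60, 0.35, 0.72]        # Hedges' g from four imaginary experiments
variances = [0.020, 0.045, 0.030, 0.055]  # sampling variance of each g

# Inverse-variance weighting: more precise estimates get more weight.
weights = [1.0 / v for v in variances]
pooled_g = sum(w * g for w, g in zip(weights, effects)) / sum(weights)

# Standard error of the pooled estimate and a rough 95% confidence interval.
se = (1.0 / sum(weights)) ** 0.5
print(f"pooled g = {pooled_g:.2f}, 95% CI [{pooled_g - 1.96*se:.2f}, {pooled_g + 1.96*se:.2f}]")

The headline numbers in the quote (+0.51 and +0.93) are pooled values of exactly this kind, computed over hundreds of effects and then split by moderators such as test format and timing.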

Are you as baffled by these results as I am? I checked the original article to find out more.

For example: multiple-choice questions having a bigger effect size, while recalling an answer is usually considered harder than merely recognizing it? Well, there are some studies that actually support the intuition that recall-based (short-answer) practice should work better:

For instance, Kang et al. (2007) revealed that students who took a short-answer practice test outperformed students who took a multiple-choice practice test on the final test, regardless of whether the final test was short-answer or multiple-choice.

But:

On the other hand, C. D. Morris, Bransford, and Franks’s (1977) research on levels of processing suggests that retention is strongest when processing demands are less demanding. They reason that this is because less demanding retrieval practice activities allow participants to focus all of their cognitive energy on a simple task at hand, whereas deeper levels of processing require more cognitive energy and can distract participants from relevant aspects (C. D. Morris et al., 1977).

And looking at the meta-analysis, the second theory seems to be winning, as “the differences between multiple-choice and short-answer practice test formats did emerge: g = 0.70 and g = 0.48, respectively.” But it’s worth noting that the researchers do warn it’s not that simple:

…we found that multiple-choice testing was the most effective format; however, this should be interpreted with caution, since an educator’s decision to use any given format should be based on the content of the learning material and the expected learning outcomes. For example, multiple-choice tests may be especially useful for memorization and fact retention, while short-answer testing may require more higher order thinking skills that are useful for more conceptual and abstract learning content.

And what about a single test being more effective than taking several practice tests? The meta-analysis does support this, but Adesope et al. can only guess why:

Thus, our findings suggest that although a single test prior to a final test may result in better performance, the timing of the test should be carefully considered. One plausible explanation is more time between the practice and final tests allows students to mentally recall and process information, leading to deeper learning. An alternative hypothesis is that multiple tests within a short time may result in test fatigue that affects performance, while retrieval practice over a distributed time period enables long-term storage.

I do think that this meta-analysis will invite other researchers to join the debate…

Abstract of the study:

The testing effect is a well-known concept referring to gains in learning and retention that can occur when students take a practice test on studied material before taking a final test on the same material. Research demonstrates that students who take practice tests often outperform students in nontesting learning conditions such as restudying, practice, filler activities, or no presentation of the material. However, evidence-based meta-analysis is needed to develop a comprehensive understanding of the conditions under which practice tests enhance or inhibit learning. This meta-analysis fills this gap by examining the effects of practice tests versus nontesting learning conditions. Results reveal that practice tests are more beneficial for learning than restudying and all other comparison conditions. Mean effect sizes were moderated by the features of practice tests, participant and study characteristics, outcome constructs, and methodological features of the studies. Findings may guide the use of practice tests to advance student learning, and inform students, teachers, researchers, and policymakers. This article concludes with the theoretical and practical implications of the meta-analysis.

Filed under Education, Research, Review

2 responses to “Important new meta-analysis on the testing effect – with some surprises…”

  1. Fred Flint

    Wow, this continues to promote a really poor understanding of effect size. A recent paper by Simpson (https://tinyurl.com/zhzbnwx) shows that effect size is really a measure of experimental clarity, not educational importance. Saying “Practice tests with a multiple-choice option have a larger weighted mean effect size (+0.70) than short-answer tests (+0.48)” may simply be saying “Experiments which used tests with a multiple-choice option tended to be more precise than experiments which used short answer tests”. A very neat little example of this is given by Jan Vanhove on his blog (http://janhove.github.io/design/2015/03/16/standardised-es-revisited).

    It should also be noted that the Adesope paper is a particularly bad example of a meta-analysis: it continues to use Fail-safe N as a test for the potential impact of publication bias despite it long being known to be a huge overestimate (see Ferguson & Heene, 2012).

    Educators really shouldn’t be relying on such poor ‘evidence’.
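To make the commenter’s first point a bit more concrete, here is a minimal simulation of Simpson’s argument, with entirely made-up numbers: in both scenarios the raw benefit of practice testing is an identical five points on the final test, yet the standardised effect size (Cohen’s d) is roughly three times larger when the outcome measure is less noisy. A larger g can therefore reflect cleaner measurement rather than a bigger learning gain.

import random
import statistics

random.seed(1)

def cohens_d(treatment, control):
    # Standardised mean difference using the pooled within-group SD.
    n1, n2 = len(treatment), len(control)
    s1, s2 = statistics.stdev(treatment), statistics.stdev(control)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

def simulate(noise_sd, true_gain=5.0, n=200):
    # Final-test scores: the practice-test group scores 5 points higher on
    # average in both runs; only the noisiness of the outcome measure changes.
    control = [random.gauss(60.0, noise_sd) for _ in range(n)]
    tested = [random.gauss(60.0 + true_gain, noise_sd) for _ in range(n)]
    return cohens_d(tested, control)

print("precise outcome measure (SD = 5): d =", round(simulate(noise_sd=5.0), 2))
print("noisy outcome measure (SD = 15):  d =", round(simulate(noise_sd=15.0), 2))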
