While many countries are now discussing the results of PISA 2015 and educators debate graphs such as these, there is another discussion worth your attention: the OECD made a big change, as most countries went from paper-based to computer-based assessment. We’ve seen before that paper versus computer can make a difference, and the OECD researchers did run pilots to measure the impact. But are the results clear-cut? Not really; let’s check this long excerpt (bold by me):
Despite the attention given to ensuring comparability of test results across modes, it was not possible – nor desired – to adjust the scaling of results to take country differences in familiarity with computer tools, or in student motivation to take the PISA test on computer, into account. Indeed, PISA aims to measure student performance in different countries against a common, but evolving, benchmark – one that includes the ability to use today’s tools for solving problems in the different subjects assessed.
But is there any evidence that changes in a country’s/economy’s mean score reflect differences across countries/ economies in students’ familiarity with ICT?
The field trial for PISA 2015 provides a partial, negative answer to this question: in no country/economy that participated in the mode-effect study did the difference between students’ results on the computer- and paper-based tests deviate significantly from the average between-country difference, which was set to zero in the scaled results (see Annex A6).
However, because the national field-trial samples were small, only large differences in performance between students who were given the computer-based version of the test and an equivalent group of students, selected through random assignment, who were given the paper-based version of the test could be detected. It was not possible to rule out small and moderate effects of the mode of delivery on the mean performance of countries/ economies.
Correlational analyses corroborate the conclusion that changes in the mode of delivery are, at best, only a partial explanation for changes in performance between PISA 2012 and PISA 2015 that are observed in countries that conducted the 2012 test on paper and the 2015 test on computer. Figure I.5.6 shows the relationship between a simple indicator of familiarity with ICT that is available for all countries participating in PISA 2012 (the share of students who reported, in PISA 2012, having “three or more” computers in their homes; on average across OECD countries, 43% of students so reported) and the difference in mathematics performance between the PISA 2012 and the PISA 2015 assessments, for countries that conducted PISA 2015 on computer. Across all countries and economies, greater exposure to ICT devices in the home explains, at best, only 4% of the variation in the difference between PISA 2012 and 2015 scores (correlation: 0.21).1 After excluding two countries that show both greater exposure and significant and positive trends (Denmark and Norway), the correlation between these two measures is only 0.10 across the remaining countries/economies. This means that in Denmark and Norway, students’ greater familiarity with ICT (or, perhaps, greater motivation to take a test delivered on computer rather than one delivered on paper) could be part of the observed improvement in performance.
But in general, countries where students have greater familiarity with ICT tools are almost equally likely to observe positive and negative trends, as are countries where students have less familiarity with ICT.
For 38 countries and economies, a more specific indicator of familiarity with ICT tools for mathematics is also available, through the optional ICT questionnaire for students that was distributed in PISA 2012. Students were asked to report whether they use computers during mathematics lessons for specific tasks, such as drawing the graph of a function or calculating with numbers. The share of students who reported doing at least one of these tasks on computer during mathematics lessons in the month prior to the PISA 2012 test correlates positively with the difference in mathematics performance between PISA 2012 and PISA 2015 in these 38 countries and economies (correlation 0.48). But clearly, not all changes in performance can be explained by the use of ICT tools in mathematics lessons. An improvement in mathematics performance was observed in Slovenia, for instance, despite the fact that students reported only average levels of familiarity with ICT in the PISA 2012 survey. In Australia, a negative trend in performance between PISA 2012 and PISA 2015 was observed despite the fact that students in 2012 reported frequent use of ICT tools in mathematics lessons.
Another 30 countries and economies can also compare changes in performance between 2012 and 2015 with the difference in mean performance between the main, paper-based assessment of mathematics conducted in 2012, and an optional, computer-based assessment of mathematics. This second test was conducted among some of the same students who also sat the paper-based PISA test, often in the afternoon of the main testing day. Results were reported on the same mathematics scale as the results of the paper-based test (OECD, 2015b). The PISA 2015 mathematics test (both in its computer-based and in its paper-based versions) used only items that were developed originally for the paper-based test; it is therefore closer, in terms of the questions asked and in timing (as part of the main, two-hour test session) to the PISA 2012 paper-based test, even though it was conducted on computer.
The correlation of changes in mean mathematics performance between 2012 and 2015 with differences between the computer-based and the paper-based mathematics performance in 2012 is only 0.18 – signalling a weak association. This may imply that the aspects that are unique to the PISA 2012 computer-based assessment (the inclusion of items that explicitly measure students’ ability to use ICT tools for solving mathematics problems, and when the test was conducted) explain a bigger part of the performance differences in 2012 than how the test was delivered. It may also imply that changes in performance between 2012 and 2015 largely reflect other factors than the mode of delivery, such as changes in student proficiency, or the sampling variability and scaling changes that contribute to the uncertainty associated with trend estimates (the sampling error and link error; see Annex A5).
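As a quick sanity check on the “explains, at best, only 4% of the variation” figure in the excerpt above: for a simple linear relationship, the share of variance explained is the squared Pearson correlation. A minimal sketch (the helper function is my own, not from the report):

```python
# Sanity check: variance explained by a linear relationship is the
# squared Pearson correlation (r squared).

def variance_explained(r: float) -> float:
    """Return the fraction of variance explained by correlation r."""
    return r ** 2

# ICT exposure at home vs. 2012->2015 score change, all countries: r = 0.21
print(f"r = 0.21 -> {variance_explained(0.21):.1%}")  # ~4% of variance

# After excluding Denmark and Norway: r = 0.10
print(f"r = 0.10 -> {variance_explained(0.10):.1%}")  # ~1% of variance

# Computer use in mathematics lessons, 38 countries: r = 0.48
print(f"r = 0.48 -> {variance_explained(0.48):.1%}")  # ~23% of variance
```

So even the strongest of the correlations reported here leaves more than three quarters of the variation in score changes unexplained, which is the OECD’s point.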
So, there can be both positive and negative effects, but there is also an earlier report by Jerrim, who warned of a possibly large impact:
The Programme for International Student Assessment (PISA) is an important cross-national study of 15-year-olds’ academic achievement. Although it has traditionally been conducted using paper-and-pencil tests, the vast majority of countries will use computer-based assessment from 2015. In this paper we consider how cross-country comparisons of children’s skills differ between paper and computer versions of the PISA mathematics test. Using data from PISA 2012, where more than 200,000 children from 32 economies completed both paper and computer versions of the mathematics assessment, we find important and interesting differences between the two sets of results. This includes a substantial drop of more than 50 PISA test points (half a standard deviation) in the average performance of children from Shanghai-China. Moreover, by considering children’s responses to particular test items, we show how differences are unlikely to be solely due to the interactive nature of certain computer test questions. The paper concludes with a discussion of what the findings imply for interpretation of PISA results in 2015 and beyond.
I do think this is something that should be taken into account when discussing both declines and increases. I’m certainly not saying that this is the sole explanation (that would be crazy), but it’s difficult to rule out that this change in assessment had some impact.