More and more countries treat PISA as the holy grail of education policy, and now the OECD is introducing the opportunity to use the tests for your own schools. But there are also other voices who have second thoughts about PISA and the methodology being used.
I found this article through @MartinDogger, with the provocative title claiming that PISA rankings are utterly wrong. From the article:
Dr Hugh Morrison, from Queens University Belfast in Northern Ireland, goes further, saying that the model Pisa uses to calculate the rankings is, on its own terms, “utterly wrong” because it contains a “profound” conceptual error. For this reason, the mathematician claims, “Pisa will never work”.
The academics’ papers have serious implications for politicians, including England’s education secretary Michael Gove, who justified his sweeping reforms by stating that England “plummeted” down the Pisa rankings between 2000 and 2009.
The questions used for Pisa vary between countries and between students participating in the same assessment. In Pisa 2006, for example, half the students were not asked any reading questions but were allocated “plausible” reading scores to help calculate their countries’ rankings.
Another academic mentioned in the article is Svend Kreiner, who wrote a critique back in 2011 and has just published a new research paper aiming to expose an important flaw in the PISA methodology. PISA uses the Rasch model, a statistical way of “scaling” up the results it does have. But Professor Kreiner says this model can only work if the questions PISA uses are of the same level of difficulty in each of the participating countries. He believes his research proves that this is not the case, and that the comparisons PISA makes between countries are therefore “useless”.
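To make Kreiner's point concrete, here is a minimal sketch of the Rasch model in Python. All numbers are hypothetical and chosen purely for illustration; the model says the chance of answering an item correctly depends only on the gap between a student's ability and the item's single, country-independent difficulty. Differential item functioning (DIF) means that assumption fails: the same item is effectively harder in one country, so equally able students score differently.

```python
import math

def rasch_prob(ability, difficulty):
    """Rasch model: probability that a student of the given ability
    answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical numbers for illustration only.
student_ability = 0.5
item_difficulty = 0.0
print(round(rasch_prob(student_ability, item_difficulty), 3))  # 0.622

# DIF: the same item turns out harder in country B (e.g. a translation
# artefact). The single-difficulty assumption then misestimates
# country B's ability, which is the flaw Kreiner points to.
difficulty_country_a = 0.0
difficulty_country_b = 1.0  # same item, effectively harder in country B
p_a = rasch_prob(student_ability, difficulty_country_a)
p_b = rasch_prob(student_ability, difficulty_country_b)
print(round(p_a - p_b, 3))  # 0.245 gap between equally able students
```

If the rankings are built on a model that assumes this gap is zero for every item and country, the resulting country comparisons inherit the error.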
The abstract of the new research paper:
This paper addresses methodological issues that concern the scaling model used in the international comparison of student attainment in the Programme for International Student Attainment (PISA), specifically with reference to whether PISA’s ranking of countries is confounded by model misfit and differential item functioning (DIF). To determine this, we reanalyzed the publicly accessible data on reading skills from the 2006 PISA survey. We also examined whether the ranking of countries is robust in relation to the errors of the scaling model. This was done by studying invariance across subscales, and by comparing ranks based on the scaling model and ranks based on models where some of the flaws of PISA’s scaling model are taken into account. Our analyses provide strong evidence of misfit of the PISA scaling model and very strong evidence of DIF. These findings do not support the claims that the country rankings reported by PISA are robust.
The TES article also includes a response from PISA officials:
“Large variation in single ranking positions is likely, particularly among the group of countries that are clustered in the middle of the distribution, as the scores are similar,” the organisation said. It claimed that country rankings take account of “the uncertainty that results from sample data” and are “therefore shown in the form of ranges rather than single ranking positions”.
These ranges are given in Pisa results but not in the main tables. And although separate rankings are produced for England, no ranges are published for the country.
The OECD said Pisa questions were “tested to ensure they have the same relative difficulty across countries”, but it has admitted to TES that some variation remains after this testing. On the suitability of Rasch, it said “no model exactly fits the data”.
“Examination of alternative models has shown that the outcomes are identical,” it added. “However, Pisa will always seek to improve and learn from the latest research methods.”
Comparative education is a very interesting but very difficult field of research. Comparing education across countries is by definition almost impossible. Just a small example: one of the best-performing countries in the typical rankings is Korea. It was even used in a recent TED talk as an example of why it’s better to go for bigger classes so you can invest in teacher development. Sounds good, but… as the OECD acknowledges, there is something quite peculiar about Korea: the role of hagwons and other private tutoring institutions:
“After-school education has been a major factor behind the excellent performance of Korean students in international tests, such as PISA (Koh et al., 2010). In 2010, around three-quarters of students participated in such courses.”
So it’s not only a question of whether you can compare a Chinese reading test with a Danish reading test; it’s always even more complicated than that.
So, should one trust PISA rankings? Well, I for my part don’t think the people working on PISA are untrustworthy. The tests are, and can still be, a very interesting means of comparison, but PISA isn’t a holy grail. It’s merely a starting point for further research. It’s one of the biggest data collections around, and this in itself makes it relevant, but as even Pasi Sahlberg has often mentioned, it can never deliver a simple recipe for copy-paste policies from one region to another.