One of the most visited posts on this blog deals with comparing educational systems. One of the most common ways to compare education in different countries and regions is to look at international assessment programs such as PISA, PIRLS and TIMSS. But there have also been critiques, even open letters.
I found this report via @dylanwilliam. In it, Martin Carnoy examines four main critiques of how international test results are used in policymaking. Of particular interest are these four critiques of the policy analyses published by the Program for International Student Assessment (PISA):
- Critique #1: Whereas the explicit purpose of ranking countries by average test score is to allow for inferences about the quality of national educational systems, the ranking is misleading because the samples of students in different countries have different levels of family academic resources (FAR).
- Critique #2: Students in a number of countries, including the United States, have made large adjusted gains on the Trends in International Mathematics and Science Study (TIMSS) test 1999 – 2011, administered by the International Association for the Evaluation of Educational Achievement (IEA). However, they have shown much smaller, or no, gains on the FAR-adjusted PISA test. This raises issues about whether one test or the other is a more valid measure of student knowledge.
- Critique #3: The error terms of the test scores are considerably larger than the testing agencies care to admit. As a result, the international country rankings are much more in “flux” than they appear.
- Critique #4: The OECD has repeatedly held up Shanghai students and the Shanghai educational system as a model for the rest of the world and as representative of China, yet the sample is not representative even of the Shanghai 15-year-old population and certainly not of China. In addition, Shanghai schools systematically exclude migrant youth. These issues should have kept Shanghai scores out of any OECD comparison group and raise serious questions about the OECD’s brand as an international testing agency.
In his report Carnoy actually agrees with much of this criticism, and he is quite clear about it, e.g.:
In sum, cross-sectional surveys such as the TIMSS and PISA are not amenable to estimating the causal effects of school inputs on student achievement gains. Neither follows individual students over time as they proceed through the school system, so we cannot tell how the gains in each grade are related to the school resources the student faced in that grade. The PISA survey is an even worse candidate than the TIMSS for drawing conclusions about which educational factors contribute to higher student scores. PISA is not a classroom-based survey such as the TIMSS, so no connection can be made between the student taking the PISA test and his or her teacher. Further, PISA does not survey teachers, so no data are available on their characteristics or their teaching methods. PISA asks students about the kind of teaching and curriculum they experienced, but these data are not related to a particular year of school.
One could argue that there is such a thing as TALIS, the OECD Teaching and Learning International Survey, but that doesn’t really answer this critique imho, and I too had my doubts about some of the conclusions drawn by e.g. Andreas Schleicher in this TED talk. In his talk he compares Luxembourg and Korea:
“One way you can spend money is by paying teachers well, and you can see Korea investing a lot in attracting the best people into the teaching profession. And Korea also invests into long school days, which drives up costs further. Last but not least, Koreans want their teachers not only to teach but also to develop. They invest in professional development and collaboration and many other things. All that costs money. How can Korea afford all of this? The answer is, students in Korea learn in large classes. This is the blue bar which is driving costs down. You go to the next country on the list, Luxembourg, and you can see the red dot is exactly where it is for Korea, so Luxembourg spends the same per student as Korea does. But, you know, parents and teachers and policymakers in Luxembourg all like small classes. You know, it’s very pleasant to walk into a small class. So they have invested all their money into there, and the blue bar, class size, is driving costs up. But even Luxembourg can spend its money only once, and the price for this is that teachers are not paid particularly well. Students don’t have long hours of learning. And basically, teachers have little time to do anything else than teaching. So you can see two countries spent their money very differently, and actually how they spent their money matters a lot more than how much they invest in education.”
Ehm, I do think Korea has some issues with its private tutoring industry, no really. Not taking this into account makes the comparison – at least – incomplete. An incompleteness that will always be present when comparing educational systems, and that shouldn’t be a problem as long as you acknowledge it and aren’t too firm in your conclusions.
I do think Carnoy is too negative in this report when he writes:
Our review of the critiques of the claims surrounding international tests and the future prospects for countries that do well or poorly on these tests suggests that much of the international testing enterprise and its ideological influence has the substance of a house of cards.
I want to repeat myself on international tests:
Comparative educational research such as PISA, but also lesser known research such as TIMSS and PIRLS, delivers very important scientific resources to inspire policy, not dictate it. It’s like throwing away the thermometer instead of fixing the fever.
The fever is not PISA itself, but the outsized influence it has gained, playing way above its real merit. Policy makers should look at PISA, but also at all the other resources, such as the ones I collected here.
I do agree with Carnoy when he concludes:
Unfortunately, all the valid critiques of international testing…are not going to make those tests go away.
So it’s a better option to make them a correct part of the information, nothing more, nothing less.
From the press release:
Using average PISA scores as a comparative measure of student achievement is misleading for a number of reasons, Carnoy maintains:
- Students in different countries have different levels of family academic resources;
- The larger gains reported on the TIMSS, which is adjusted for different levels of family academic resources, raise questions about the validity of the PISA results when used for international comparisons;
- PISA test score error terms are “considerably larger” than the testing agencies acknowledge, making the country rankings unstable;
- The Shanghai educational system is held up as a model for the rest of the world on the basis of non-representative data.
Of further concern is the conflict of interest arising from the Organization for Economic Cooperation and Development (which administers the PISA) and its member governments acting as a testing agency while simultaneously serving as data analyst and interpreter of results for policy purposes.
Carnoy considers the critiques within a discussion of the underlying social meaning and education policy value of international comparisons in general. He describes why using average national math scores as predictors of future economic growth is problematic, and points out that using scoring data in this manner has limited use for establishing education policy because causal inferences cannot be meaningfully drawn.
Finally, Carnoy explores the relevance of nation-level test score comparisons among countries such as the United States with diverse and complex education systems. The differences between states in the U.S. are, for example, so large that employing U.S. state-level test results over time to examine the impact of education policies would be more useful and interesting than using combined U.S. data.
Despite valid critiques of international test result comparisons, Carnoy argues that the comparisons will neither go away nor stop being inappropriately used to shape educational policy. He concludes with five policy recommendations to reduce the misuse of testing data.