I found this article by Colleen Flaherty, and the study it mentions, via Linda Duits. There is new evidence that (I would say again) suggests that student evaluations of teaching (SET) are not that reliable. More specifically, SET are biased against female instructors “in so many ways that adjusting them for that bias is impossible”.
From the article:
Moreover, the paper says, gender biases about instructors — which vary by discipline, student gender and other factors — affect how students rate even supposedly objective practices, such as how quickly assignments are graded. And these biases can be large enough to cause more effective instructors to get lower teaching ratings than instructors who prove less effective by other measures, according to the study based on analyses of data sets from one French and one U.S. institution.
“In two very different universities and in a broad range of course topics, SET measure students’ gender biases better than they measure the instructor’s teaching effectiveness,” the paper says. “Overall, SET disadvantage female instructors. There is no evidence that this is the exception rather than the rule.”
Accordingly, the “onus should be on universities that rely on SET for employment decisions to provide convincing affirmative evidence that such reliance does not have disparate impact on women, underrepresented minorities, or other protected groups,” the paper says. Absent such specific evidence, “SET should not be used for personnel decisions.”
Abstract of the study, “Student Evaluations of Teaching (Mostly) Do Not Measure Teaching Effectiveness”:
Student evaluations of teaching (SET) are widely used in academic personnel decisions as a measure of teaching effectiveness. We show:
- SET are biased against female instructors by an amount that is large and statistically significant
- the bias affects how students rate even putatively objective aspects of teaching, such as how promptly assignments are graded
- the bias varies by discipline and by student gender, among other things
- it is not possible to adjust for the bias, because it depends on so many factors
- SET are more sensitive to students’ gender bias and grade expectations than they are to teaching effectiveness
- gender biases can be large enough to cause more effective instructors to get lower SET than less effective instructors.
These findings are based on nonparametric statistical tests applied to two datasets: 23,001 SET of 379 instructors by 4,423 students in six mandatory first-year courses in a five-year natural experiment at a French university, and 43 SET for four sections of an online course in a randomized, controlled, blind experiment at a US university.
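For readers wondering what a "nonparametric statistical test" can look like in practice, here is a minimal sketch of one common kind, a two-sided permutation test on the difference in mean ratings between two groups. The abstract does not say which tests the authors ran, so this is not their procedure. The function name `permutation_test`, its parameters, and the ratings in the usage example are all made up for illustration and are not data from the study.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean ratings.

    Hypothetical helper for illustration only; not the study's code.
    Returns (observed difference in means, estimated p-value).
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so we reshuffle the ratings and split them into two new groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up ratings on a 1-5 scale, NOT taken from the study.
    ratings_perceived_male = [4, 5, 4, 4, 5, 3, 4, 5]
    ratings_perceived_female = [3, 4, 3, 4, 3, 3, 4, 4]
    diff, p = permutation_test(ratings_perceived_male, ratings_perceived_female)
    print(f"difference in means: {diff:.2f}, p-value: {p:.4f}")
```

The appeal of this kind of test is that it makes no assumption about how the ratings are distributed. It only asks how often a random relabelling of the same ratings produces a gap at least as large as the one observed.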