Research Article: Mitigating gender bias in student evaluations of teaching

Date Published: May 15, 2019

Publisher: Public Library of Science

Author(s): David A. M. Peterson, Lori A. Biederman, David Andersen, Tessa M. Ditonto, Kevin Roe, Rick K. Wilson.


Student evaluations of teaching are widely believed to contain gender bias. In this study, we conduct a randomized experiment with the student evaluations of teaching in four classes with large enrollments, two taught by male instructors and two taught by female instructors. In each of the courses, students were randomly assigned to either receive the standard evaluation instrument or the same instrument with language intended to reduce gender bias. Students in the anti-bias language condition had significantly higher rankings of female instructors than students in the standard treatment. There were no differences between treatment groups for male instructors. These results indicate that a relatively simple intervention in language can potentially mitigate gender bias in student evaluation of teaching.

Partial Text

Student evaluations of teaching (SET) are a ubiquitous form of evaluation. At many colleges and universities, student evaluations are the primary data used to evaluate teaching effectiveness, and they contribute to tenure and promotion packages [1]. However, a growing literature in multiple disciplines documents bias in these evaluations, including gender bias [2–7]. In particular, female instructors tend to be evaluated more critically than their male peers, even when there are no differences in the quality of instruction or when the gender of the instructor is experimentally and randomly assigned [5]. These biases extend even to objective assessments of instructors (e.g. speed in the return of grades) [5]. Across a range of indicators, the experimental research shows that gender bias is approximately 0.50 points on a five-point scale [5]. The potential for gender biases in SET has led many academics to question their use and has prompted a growing conversation about the need for alternative mechanisms for evaluating instructors [8].

The belief that SETs contain biases against female instructors is widespread, even if the empirical evidence is decidedly mixed [9, 10]. Both observational and experimental studies provide some evidence that students harbor gender biases that filter into their SET [2–7, 9, 10]. These negative biases are likely to be implicit, meaning they are automatically activated, unintentional, and occur below the conscious awareness of the individual [11]. The difficulty is that many confounding factors tend to correlate with the gender of the instructor. For example, female instructors may anticipate the bias and compensate by working harder and being better instructors than their male colleagues.

To test these hypotheses, we conducted an experiment in four large introductory courses in Spring of 2018: two introduction to biology courses and two introduction to American politics courses. Replication data are available at the lead author’s Dataverse page. The experiment received approval from the Institutional Review Board at Iowa State University (#18–183). The Board ruled that the study was exempt from full review because it was conducted in an established or commonly accepted educational setting. Additionally, the research did not need to acquire informed consent and did not require parental consent if any of the students were under 18. Within each pair of courses, one section was taught by a female instructor and one section was taught by a male instructor. All four instructors are White. At this university, SETs are conducted online and students receive an email with the link to the evaluation of their instructor for each course. For each course, we randomized the students into one of two conditions. In the control condition, students received the standard SET survey for their department. In the treatment condition, the solicitation and the evaluation instrument used language that we expected to mitigate gender biases. The added language was:

Before testing the hypotheses, we evaluated the balance of the conditions for the other items included in the survey and there was no evidence of imbalance (see S1 Table). The proportion of male students and the mean levels of the students’ class in school, the students’ expected grade in the class, and their GPA were the same between the treatment and the control conditions. Furthermore, if we separate the data based on the gender of the instructor, these results do not change. The randomization appears to be successful.
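As an illustration of the randomization and balance check described above, the following is a minimal sketch and not the authors' procedure. The function names, the fixed seeds, the synthetic GPA values, and the use of a standardized mean difference as the balance statistic are all assumptions made for this example; the article does not state which balance test was used.

```python
import random
from statistics import mean, pstdev


def assign_conditions(student_ids, seed=0):
    """Randomly split students into treatment and control groups of near-equal size."""
    ids = list(student_ids)
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    treatment = set(shuffled[: len(shuffled) // 2])
    return {sid: ("treatment" if sid in treatment else "control") for sid in ids}


def standardized_mean_difference(treated, control):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_var = (pstdev(treated) ** 2 + pstdev(control) ** 2) / 2
    if pooled_var == 0:
        return 0.0
    return (mean(treated) - mean(control)) / pooled_var ** 0.5


# Synthetic covariate (hypothetical GPA values), NOT data from the study.
rng = random.Random(1)
gpa = {sid: round(rng.uniform(2.0, 4.0), 2) for sid in range(200)}

assignment = assign_conditions(gpa.keys(), seed=42)
treated = [gpa[s] for s, cond in assignment.items() if cond == "treatment"]
control = [gpa[s] for s, cond in assignment.items() if cond == "control"]

smd = standardized_mean_difference(treated, control)
print(f"n_treated={len(treated)} n_control={len(control)} SMD={smd:.3f}")
```

Under successful randomization, a statistic like this should sit close to zero for each pre-treatment covariate (student gender, class in school, expected grade, GPA), both overall and within each instructor-gender subgroup.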

The evidence from our experiment with SET suggests that a simple intervention informing students of the potential for gender biases can have significant effects on the evaluation of female instructors. These effects were consistent across two different introductory courses (one biology and one political science). They were also substantial in magnitude: as much as half a point on a five-point scale. This effect is comparable to the effect size due to gender bias found in the literature [5]. There is no evidence of a similar effect on the evaluation of male instructors. Given the outsized role SET play in the evaluation, hiring, and promotion of faculty, the possibility of mitigating this amount of possible bias in evaluations is striking.
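To make the notion of an effect measured in points on a five-point scale concrete, here is a hedged sketch of a difference-in-means estimate with a permutation test. The ratings are made up for illustration, the function names are hypothetical, and the permutation test is one possible choice of significance test, not necessarily the analysis the authors ran.

```python
import random
from statistics import mean


def difference_in_means(treated, control):
    """Estimated treatment effect: mean treated rating minus mean control rating."""
    return mean(treated) - mean(control)


def permutation_p_value(treated, control, n_permutations=2000, seed=0):
    """Two-sided p-value: share of random relabelings whose absolute
    difference in means is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(difference_in_means(treated, control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up ratings on a 1-5 scale, for illustration only (not study data).
control_ratings = [3, 4, 3, 4, 3, 4, 3, 4, 4, 3]
treatment_ratings = [4, 4, 4, 5, 4, 4, 3, 5, 4, 4]

effect = difference_in_means(treatment_ratings, control_ratings)
p_value = permutation_p_value(treatment_ratings, control_ratings)
print(f"difference={effect:.2f} p={p_value:.3f}")
```

In a design like the one described, such a comparison would be run separately for female and male instructors, since the reported effect appears only for female instructors.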



