Commentary

Should Educator Evaluations Trust Data More?

Study raises questions about bias in judging teacher performance

Michigan lawmakers recently punted changes in educator evaluations to next year, with hopes that the state's education officials can iron out important details in the meantime. A new analysis shines a light on the challenges of adopting a truly effective policy.

Last week, the online news organization Chalkbeat reported on a new Michigan State University study that raises concerns about how fairly Michigan teachers have been rated in their professional evaluations. African Americans and men, who respectively make up 6% and 23% of the state's public school teacher workforce, were more likely to receive low performance ratings than their white and female peers.

Under current law, student growth as measured on state tests and other indicators determines 25% of a teacher's or administrator's evaluation. MSU researchers found no discernible differences among teachers by race on the more objective component of these evaluations, which suggests the discrepancies within schools were based on the judgments of evaluators in classroom observations.

The study covered the 2011-12 to 2015-16 school years, the first five years all educators were required to receive annual evaluations that resulted in one of four effectiveness ratings. Michigan lawmakers adopted the changes as part of a national push to recognize and develop great teaching through more consistent, credible and meaningful evaluations of classroom performance. An influential 2009 report found that the pass/fail evaluation systems then in use rated more than 99% of teachers' performance as satisfactory.

Evaluation reform moved the needle a little in Michigan, giving districts additional latitude to remove the worst teachers. The likelihood of receiving one of the two lowest ratings today nonetheless stands at less than 2%, after topping out near 3% during the time period studied.

Even so, MSU researchers recently found that male teachers were 40% more likely to get a low rating than their female counterparts. And the share of African American teachers with low ratings was more than three times that of white teachers. Male teachers, meanwhile, were more likely to be marked down by female administrators, and African Americans were more likely to be marked down when working alongside fewer colleagues of the same race.

While the findings do not definitively prove bias, they do raise concerns about both the fairness and the validity of classroom observations. According to state law, a district or charter school must dismiss teachers who receive the lowest rating on three consecutive annual evaluations. These cases have become vanishingly rare, however: For each teacher labeled ineffective in 2017-18, evaluators designated 141 to be highly effective. Only 282 out of nearly 100,000 teachers received the lowest mark. It's unclear how many may have earned automatic dismissal with three straight ineffective ratings.

Most of the unease surrounding educator evaluations has fixated on the use of data, not on human decision-making. With backing from a broad range of education groups, state lawmakers last month agreed to delay the scheduled increase in the weight given to student growth in an educator's evaluation. It was set to rise from 25% to 40%, but lawmakers postponed the change for one year. The net result of supporting Senate Bill 122 was to keep evaluators' judgment as the dominant factor.

Adjusting the weight of student growth data may prove even more contentious next year. House Education Committee chair Pamela Hornberger, R-Chesterfield Township, forcefully called on Department of Education officials to ensure that districts can make meaningful use of student growth data. Yet even if more reliable data were to lessen anxiety about possible bias in classroom observations, it wouldn't soften union opposition to evaluation reform.

Over the last decade, it has become clearer that the work of improving educator evaluations is as difficult as it is necessary. Michigan has yet to achieve a workable system, even with a 2015 law that gives local districts some flexibility in using specific evaluation tools beyond what the state approves.

In the absence of a perfect system, policymakers must work to emphasize the importance of fostering student achievement while balancing concerns about local decision-making and flaws in human judgment. The MSU study adds another important dimension to that debate.