May 27, 2014 by Patrick O'Mahen

New research questions use of value-added measures to rate teachers

A new research paper has found little connection between teaching quality and improvement of standardized test scores.

At issue is the idea of value-added measures (VAM): evaluating teachers by seeing how much their students improve on standardized tests and using deviations from expected improvement to rate the teachers. Districts rate teachers highly when their students outperform expected gains, while teachers whose students do not improve as much as projected receive lower ratings.
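To make the arithmetic concrete, here is a minimal sketch of the idea in Python. It is an illustration only, not the model used by any district or by the study. The function name, the sample scores, and the shortcut of treating the average gain across all students as the "expected" gain are assumptions made for this example; VAM models used in practice are more elaborate statistical models.

```python
from collections import defaultdict
from statistics import mean


def value_added_scores(records):
    """Return a toy value-added score per teacher.

    records: list of (teacher, pre_score, post_score) tuples, one per student.
    """
    # ASSUMPTION: the "expected" gain is simply the average gain of all students.
    expected_gain = mean(post - pre for _, pre, post in records)

    # Collect each student's deviation from the expected gain, grouped by teacher.
    deviations = defaultdict(list)
    for teacher, pre, post in records:
        deviations[teacher].append((post - pre) - expected_gain)

    # A teacher's score is the average deviation across his or her students.
    return {teacher: mean(devs) for teacher, devs in deviations.items()}


# Hypothetical data: two teachers, two students each.
records = [
    ("Teacher A", 50, 62),
    ("Teacher A", 60, 70),
    ("Teacher B", 55, 59),
    ("Teacher B", 65, 71),
]
scores = value_added_scores(records)
print(scores)
```

In this made-up example, Teacher A's students gain more than the average and Teacher B's gain less, so Teacher A receives a positive score and Teacher B a negative one.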

The idea of VAM is popular with some education reformers, like Michelle Rhee of Students First, who want to bring more stringent evaluation to classroom teachers, using tools developed in the private sector to rate employee performance.

However, Morgan Polikoff of the University of Southern California and Andrew C. Porter of the University of Pennsylvania have found very weak or zero connection between the value-added measures on tests and other measures of teacher quality.

“This study contributes to a growing literature suggesting state tests may not be up to the tasks of differentiating effective from ineffective (or aligned from misaligned) teaching,” the authors wrote.

The research appeared online in the journal Educational Evaluation and Policy Analysis on May 13 under the title “Instructional Alignment as a Measure of Teaching Quality.”

Polikoff and Porter analyzed 327 fourth- and eighth-grade reading and math teachers from six large urban school districts (Dallas, Memphis, Hillsborough County, Florida, Charlotte-Mecklenburg, Denver and New York City) in six different states participating in the Measures of Effective Teaching (MET) study. The researchers invited the teachers to join the study, and the sample of teachers roughly matched each district’s demographic characteristics, as well as the proportion of special education students taught.

Teachers participating in the study filled out forms detailing the material they taught in their classes throughout the year. Coders then compared the teachers’ course materials to state standards and rated how closely the two aligned. To gauge teacher quality, Polikoff and Porter used two measures. First, they had the teachers videotape several lessons on the same topic and had teaching experts rate each teacher’s competence after viewing the videotape. Second, each teacher’s students submitted general evaluations of their teacher for the year. Researchers also gained access to the VAM scores for each teacher’s students.

Polikoff and Porter found no relationship between expert and student evaluations of teachers on the one hand and VAM scores on the other.

“State tests aren’t picking up what we think of as good teaching,” Polikoff said in a video discussing the research results.

Additionally, in none of the districts did aligning instructional materials more closely with state standards show more than a very weak positive relationship with how well a teacher’s students scored on tests as measured by VAM criteria.

The study was funded in part with a grant from the Bill and Melinda Gates Foundation, an organization which has invested heavily in education reform and has supported value-added measures for teacher evaluations.

The study comes at a time when many large school districts across the United States, encouraged by federal policy, have started using this methodology to evaluate their teachers. Locally, VAM plays a large role in the Houston Independent School District’s decisions about whether to renew teacher contracts and how much to pay teachers in bonuses. It is also the subject of a recent lawsuit filed by the Houston Federation of Teachers against the district in federal court.

Polikoff and Porter cautioned that policy makers and school district administrators should move extremely carefully if they choose to use VAM in teacher evaluation.

“Before moving forward with new high-stakes teacher evaluation policies based on multiple measures teacher evaluation systems, it is essential that the research community develops a better understanding of how state tests reflect differences in instructional content and quality,” they wrote in the study’s discussion section.

For more commentary on the study, see the Washington Monthly’s review of the article here.