Grace Snodgrass at Huffington Post:
One day soon, my name and performance evaluation could be printed in your morning newspaper. It will tell you that I’m a teacher who has clear strengths and weaknesses in helping my students advance academically.
But as valuable as my so-called “Teacher Data Report” is in helping me identify these areas, it really doesn’t say much about the overall quality of my teaching. And printing the results — as an NYC judge just gave the city the right to do — will do little to make me, or any of my colleagues, better teachers. At least, not right away. What will help is the Department of Education and the teachers’ union putting aside their differences and improving these reports so that teachers like me receive good information about our performance and clear steps towards achieving our classroom goals.
As an educator, I want to be evaluated. I know that my students’ success hinges on the quality of my teaching. The Department of Education is actually on the right track with the “value-added” method it uses to calculate the impact teachers have on their students’ academic growth. Value-added compares a student’s predicted performance on standardized assessments with how he or she actually performs.
Dana Goldstein and Megan McArdle on Bloggingheads
Jim Manzi at The Corner:
Recently, Megan McArdle and Dana Goldstein had a very interesting Bloggingheads discussion that was mostly about teacher evaluations. They referenced some widely discussed attempts to evaluate teacher performance using what is called “value-added.” This is a very hot topic in education right now. Roughly speaking, it refers to evaluating teacher performance by measuring the average change in standardized test scores for the students in a given teacher’s class from the beginning of the year to the end of the year, rather than simply measuring their scores. The rationale is that this is an effective way to adjust for different teachers being confronted with students of differing abilities and environments.
This seems like a broadly sensible idea as far as it goes, but consider that the real formula for calculating such a score in a typical teacher value-added evaluation system is not “Average math + reading score at end of year – average math + reading score at beginning of year,” but rather a very involved regression equation. What this reflects is real complexity, which has a number of sources. First, at the most basic level, teaching is an inherently complex activity. Second, differences between students are not unvarying across time and subject matter. How do we know that Johnny, who was 20 percent better at learning math than Betty in third grade, is not relatively more or less advantaged in learning reading in fourth grade? Third, an individual person-year of classroom education is executed as part of a collective enterprise with shared contributions. Teacher X had special needs assistant 1 work with her class, and teacher Y had special needs assistant 2 working with his class — how do we disentangle the effects of the teacher versus the special needs assistant? Fourth, teaching has effects that continue beyond that school year. For example, how do we know if teacher X got a great gain in scores for students in third grade by using techniques that made them less prepared for fourth grade, or vice versa for teacher Y? The argument behind complicated evaluation scoring systems is that they untangle this complexity sufficiently to measure teacher performance with imperfect but tolerable accuracy.
Any successful company that I have ever seen employs some kind of a serious system for evaluating and rewarding / punishing employee performance. But if we think of teaching in these terms — as a job like many others, rather than some sui generis activity — then I think that the hopes put forward for such a system by its advocates are somewhat overblown.
There are some job categories that have a set of characteristics that lend themselves to these kinds of quantitative “value added” evaluations. Typically, they have hundreds or thousands of employees in a common job classification operating in separated local environments without moment-to-moment supervision; the differences in these environments make simple output comparisons unfair; the job is reasonably complex; and, often the performance of any one person will have some indirect, but material, influence on the performance of others over time. Think of trying to manage an industrial sales force of 2,000 salespeople, or the store managers for a chain of 1,000 retail outlets. There is a natural tendency in such situations for analytical headquarters types to say “Look, we need some way to measure performance in each store / territory / office, so let’s build a model that adjusts for inherent differences, and then do evaluations on these adjusted scores.”
I’ve seen a number of such analytically-driven evaluation efforts up close. They usually fail. By far the most common result that I have seen is that operational managers muscle through use of this tool in the first year of evaluations, and then give up on it by year two in the face of open revolt by the evaluated employees. This revolt is based partially on veiled self-interest (no matter what they say in response to surveys, most people resist being held objectively accountable for results), but is also partially based on the inability of the system designers to meet the legitimate challenges raised by the employees.
Noah Millman at The American Scene:
I do want to add a few additional points of my own:
1. Evaluations establish the principle that there is such a thing as performance in the first place. A great deal of discussion nowadays in education revolves around the idea that what we need to “fix the schools” is great teachers. But if that’s what we need, we’ll never do it. What we need, instead, are mechanisms for getting marginally better performance, year after year, from a teaching pool that remains merely adequate.
One bit of low-hanging fruit for achieving that goal, meanwhile, is the ability to dismiss the bottom 5% of teachers in terms of performance. Not only are these teachers failing comprehensively in their own classrooms, but their mere presence has a corrosive effect on an entire organization – on the teachers, on the students, on the management of the school. But right now, firing these teachers is essentially impossible. For all the difficulty of doing a rigorous evaluation in order to improve teaching performance across the board, I suspect it is a whole lot easier to identify the worst teachers in the school. If that could be done, the pressure to be able to terminate them would be significant, and that could do a lot to improve school performance right there.
2. Value-added metrics wind up punishing perfectly good but not spectacular schools with above-average student bodies. It may be that these schools should suffer reputationally, because the staff is not actually delivering as much value as they should. But high-stakes standardized testing actually pushes these schools to destroy themselves, wiping out the programs that actually do deliver value to these high-aptitude students and instead focusing on teaching to the tests.
That’s not an argument against using value-added metrics as such. It’s an argument that they need to be used intelligently, with some understanding of what “value-added” means at different points on the performance spectrum. But that, in turn, would require admitting that different standards are needed for students with different aptitude, which, in turn, is extremely difficult for our education system to admit. (And, admittedly, it’s a problem in corporate cultures that cross widely different customer bases as well. How well would Wal-Mart manage Tiffany?)
3. Nobody goes into teaching “for the money” – that is to say, teachers in aggregate make significantly less than people with their educational credentials and academic aptitude could make in other professions. So monetary rewards are primarily going to prove useful as signaling devices. There’s a lot of evidence coming in from high-performance charter schools suggesting that a monetary reward system tied too closely to evaluations actually degrades performance, because it gets teachers focused on the evaluations rather than on the performance. The evaluations should primarily be used as a diagnostic, to identify correctable deficiencies in teacher performance so they can be corrected through staff development, and to identify gross deficiencies in teacher performance so the teachers in question can be dismissed.
4. Similarly, across a system, what evaluations are useful for is research and market discipline. Evaluations of a school should be very useful to parents seeking to select a school for their child. Schools that consistently achieve high evaluations (particularly on value-added metrics) should be objects of study by administrators and others looking to replicate that performance in lower-performing but still basically well-run schools. The least-important use of the evaluation is to directly “reward” or “punish” a school bureaucratically – and, indeed, if that becomes the primary use then the school is likely to start focusing overwhelmingly on the evaluation process and lose sight of actual performance. I’ve seen this happen over and over in New York City schools; it’s not a theoretical question.
Conor Friedersdorf at Sullivan’s place:
And it helps explain the inherent tension between teachers unions and the rest of us. Unions exist to protect the interests of their members. Even in the best case scenario, that means lobbying for an evaluation system that maximizes fairness to the people being evaluated. As citizens, our primary goal should be creating the best education system possible, even if doing so sometimes means (for example) that the teacher most deserving of a bonus doesn’t get one. Saying that there is a conflict between the common good and the ends of teachers unions isn’t a condemnation of the latter. It’s just a fact. And everyone seems to understand the basic concept if you talk about prison guard unions.
Part of what makes me nervous is that productivity varies dramatically within industries. It is very common for comparable factories at the 90th percentile to produce four times as much as factories at the 10th percentile. Moreover, the scorecards and shortcuts used by factories at the 90th percentile wouldn’t necessarily work for those at the 10th percentile. Managerial insights are usually embedded in a complex tangle of personalities and practices that can’t easily be replicated. This is natural, and I’d say that I’d much rather see a few firms race ahead than allow all firms to remain mired at the low end of the productivity spectrum. Suffice it to say, this is not the ethic that governs how we generally think about public schools.
In a time when at least half of the political spectrum is deeply troubled by inequality, i.e., by the fact that some firms, individuals, and households are racing far ahead of others, what at least some education reformers are saying is that we want to unleash a few inventive, well-managed schools to start deploying the same per pupil resources to much greater effect. That is, we want to, in the short run at least, make the K-12 educational landscape more unequal, in the hope that leading schools will identify instructional methods, e.g., effective virtual instruction, that will prove scalable.
Much depends on how one interprets the fact that some firms, individuals, and households are racing ahead of the others. I take what I think of as a nuanced view. Generally speaking, some firms, individuals, and households race ahead of others due to a combination of luck, opportunity, and smart investments in organizational capital. In some cases, we see rent-seeking, tax and regulatory arbitrage, etc. But whereas Simon Johnson and many of my friends on the left see this as the dominant narrative, I see it as a significant but nevertheless relatively small part of the wage dispersion story.
Nicholas Bloom and John Van Reenen have written a neat essay in the Journal of Economic Perspectives on how effective management practices spread. I was struck by many of their observations, including some that will be familiar to those of you who see organizational capital as very important (“firms that more intensively use human capital, as measured by more educated workers, tend to have much better management practices”).
The United States has a commanding lead in terms of the quality of management in firms. This is very interesting considering our relative weakness in terms of educational attainment at the median in the prime-age cohorts. And I suspect that this feeds back into wage dispersion as well as assortative mating, family breakdown, and other sources of “stickiness” at the low end of the income distribution. For a variety of reasons, our economy is rewarding people with managerial skills, and, in a crude sense, one might be able to extrapolate the ability to manage a wide range of tasks in the workplace to the ability to maintain constructive relationships in other domains. The obvious objection is that many hard-charging executives neglect their families and personal lives, etc. But it could also be true that neglect of parental responsibilities is somewhat more common among those marginally attached to the labor force, due to the greater prevalence of substance abuse and other risky behaviors.
Jonathan Chait at TNR on Manzi:
That’s an interesting insight into the general problem with quantitative measures. Here are a few points in response:
1. You need some system for deciding how to compensate teachers. Merit pay may not be perfect, but tenure plus single-track longevity-based pay is really, really imperfect. Manzi doesn’t say that better systems for measuring teachers are futile, but he’s a little too fatalistic about their potential to improve upon a very badly designed status quo.
2. Manzi’s description…
evaluating teacher performance by measuring the average change in standardized test scores for the students in a given teacher’s class from the beginning of the year to the end of the year, rather than simply measuring their scores. The rationale is that this is an effective way to adjust for different teachers being confronted with students of differing abilities and environments.
...implies that quantitative measures are being used as the entire system to evaluate teachers. In fact, no state uses such measures for any more than half of the evaluation. The other half involves subjective human evaluations.
3. In general, he’s fitting this issue into his “progressives are too optimistic about the potential to rationalize policy” frame. I think that frame is useful — indeed, of all the conservative perspectives on public policy, it’s probably the one liberals should take most seriously. But when you combine the fact that the status quo system is demonstrably terrible, that nobody is trying to devise a formula to control the entire teacher evaluation process, and that nobody is promising the “silver bullet” he assures us doesn’t exist, his argument has a bit of a straw man quality.
Manzi responds to Chait:
My post wasn’t about whether we should use quantitative measures of improvement in students’ standardized test scores as an element of how we evaluate, compensate, manage and retain teachers, but rather about how to do this.
Two of the key points that I tried to make are that the metrics themselves should likely be much simpler than those currently developed by economics PhDs, and that such an evaluation system is only likely to work if embedded within a program of management reform for schools and school systems. The bulk of the post was trying to explain why I believe these assertions to be true.
An additional point that I mentioned in passing is my skepticism that such management reform will really happen in the absence of market pressures on schools. Continuous management reform, sustained over decades, that gets organizations to take difficult and unpleasant actions with employees is very hard to achieve without them. There’s nothing magic about teachers or schools. The same problems with evaluation and other management issues that plague them arise in big companies all the time. It’s only the ugly reality of market discipline that keeps them in check.