The New York Times released “value-added” data for 18,000 elementary school teachers in New York City recently, joining the Los Angeles Times, which released data for LA teachers last year. But unlike in LA, the New York Times allowed the data itself to be released, letting analysts really dive in, look at the performance assessments, and judge whether or not they make sense. There’s been a raging argument about whether this “value-added” metric successfully measures teacher performance. And judging by this release of data, when you actually break it down, the metric doesn’t do the job.
Gary Rubinstein has been analyzing the data at his site, and he’s found some incredible results. First, he found that there’s almost no correlation between a teacher’s score in one year and the same teacher’s score in the next. A teacher can easily be judged effective one year and ineffective the next. The average change in performance was a relatively large 25 points, and the changes did not fit the commonly accepted belief that teachers improve over time, particularly between the first and second year.
But there’s more. Rubinstein further found that teachers showed a wide degree of variance in performance in different classes in the same year. They also showed the same kind of variance in the same subject in the same year in different grades. The scatterplot above is a representative example.
Rubinstein writes:
Rather than report about these obvious ways to check how invalid these metrics are and how shameful it is that these scores have already been used in tenure decisions, or about how a similarly flawed formula will be used in the future to determine who to fire or who to give a bonus to, newspapers are treating these scores like they are meaningful. The New York Post searched for the teacher with the lowest score and wrote an article about ‘the worst teacher in the city’ with her picture attached. The New York Times must have felt they were taking the high-road when they did a similar thing but, instead, found the ‘best’ teachers based on these ratings.
I hope that these two experiments I ran, particularly the second one where many teachers got drastically different results teaching different grades of the same subject, will bring to life the realities of these horrible formulas. Though error rates have been reported, the absurdity of these results should help everyone understand that we need to spread the word since calculations like these will soon be used in nearly every state.
Since this appeared last week, others have picked up on the data. And Rubinstein had a follow-up yesterday showing that value-added performance data from charter schools shows that teachers are providing about the same value there as in all other public schools. Rubinstein writes that “the high correlation in this plot reveals that the primary factor in predicting the scores for a group of students in one year score is the scores of those same students in the previous year score.”
Lots of decisions in education are being made using this type of data. It’s used to single out “bad teachers” and argue for pay-for-performance models. Yet the data, to use a technical term, is dogshit. All it does is serve to humiliate teachers, and emphasize a flawed theory that “fixing schools” can be accomplished by throwing so-called “bad teachers” out of work.






16 Comments


It is so obvious that parents need to set examples, encourage their children, be involved in the educational process, understand their children’s needs.
The saddest aspect of the conservative POV is the lack of awareness of human failings or inabilities.
Nine years ago I argued that this would be the result of “value-added” data — no value at all. Too many variables. Bad books, varied learning styles, illness, economic factors affecting families…
When you consider dynamics such as a flu virus going through the classroom as kids are being used to gather data or that schools rotate the demand of special needs students from year to year or that one year, a teacher may end up with a few more higher performing students…the variables are endless. My son was given a math test that had a question about creating a fraction with the number of boys vs girls listed in the story problem. Unfortunately, the problem used gender neutral names as well as names whose gender my son could not determine. That test scored him and four other math students, who had been gifted in math up to that point, as failing. A few poorly worded questions on a test required a math aide to come into the classroom. Just crazy.
If you cannot compare data in the context of independent, dependent, and controlled variables, how do you even begin to make a judgement or conclusion?
We need to convert all primary schools to the portfolio method of teaching. Each student judged by their own metrics based on where they are and who they are. It works. Grade inflation, legal requirements, union rules, political meddling with curriculum, tenure without competency requirements, mis or un informed helicopter parents having more input than the Principal have led us to this unrepairable system.
The 1%ers who want to privatize schools are NOT impressed by, ya know, well, um, actual evidence and analysis of it.
The scam of this “accountability” campaign is that school districts instead of paying teachers and improving schools are forced to pay testing firms of the educational-industrial-complex to create this bogus data.
Under the theory that taxpayers really don’t have to spend more money for schools or teacher salaries or effective administration (wait a minute have you seen school superintendent salaries recently?).
The public has been sold snake oil. And now the next step is to devalue education altogether as Santorum has been doing.
The lack of correlation reminds me of something I did in the mid-70s, when I was assigned to teach a required year course in statistics to undergraduates. I taught it two years. The first year I used an elementary textbook that was used in the business school, and I told the students that if they learned what was in it, they would pass the course (this was a big consideration, as it was required for the degree). I lectured at a much higher level, using the notes from my grad school econometrics course as the basis. The class was a huge success. The better students were excited by the more challenging material, and the average students were reassured by the fact that they knew if they learned the basics, they would pass.
I was so taken by the high average performance of the class that I decided to use a more advanced textbook the following year, in which some of the main theorems showed up as problems to be solved. Absolute disaster. The weaker and even the not-so-weak students were discouraged, and the forest was lost for all the trees. Terrible performance all round, including mine.
The point is that even with the best of will and ability, there is going to be a lot of variance at the level of individual teachers. A longer time sample will tend to identify interpersonal differences, but school administrators are like CEOs: they only care about quarterly results.
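To put a rough number on the "longer time sample" point, here is a minimal sketch using the Spearman-Brown formula from test theory. It assumes each year's score is a stable teacher effect plus independent noise of equal size every year, which is a simplification and not something the released data establishes. The single-year correlation of 0.3 is a hypothetical value picked only for illustration.

```python
def reliability_of_average(single_year_r: float, years: int) -> float:
    """Spearman-Brown prediction: the expected correlation between two
    independent averages, each taken over `years` yearly scores, given the
    correlation between two single-year scores.

    Assumes a stable underlying effect plus independent, equal-variance
    noise in every year.
    """
    if years < 1:
        raise ValueError("years must be at least 1")
    return years * single_year_r / (1 + (years - 1) * single_year_r)


# Hypothetical single-year correlation, chosen only for illustration.
ASSUMED_SINGLE_YEAR_R = 0.3

for years in (1, 2, 4, 8):
    value = reliability_of_average(ASSUMED_SINGLE_YEAR_R, years)
    print(f"{years} year(s) averaged: {value:.2f}")
```

Under these assumptions the averaged score gets steadily more stable as years are added, but a single year on its own stays as noisy as it started.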
It’s a profit center, doncha know. Monopoly profit for crap service. It’s the American way.
The scam is beginning to infect the Universities as well. We had McKinsey at our shop to see how we could do our job better (well not me, I’m out of it, but I keep up via friends in high places). As administration gets more and more ‘professionalized’ we get less and less understanding of the educational enterprise, and more on purely quantitative measures of god knows what. The other day I was checking out an obscure book at the library when a man came by to chat, and asked who I was. I said I was emeritus prof, and still working. We got into a conversation about the improvements, and I asked where all the reference shelves had gone now that the space is occupied by student lounge, computers, etc. And he said, isn’t that great? He turned out to be the library’s fund-raiser, and was so so pleased that students had a place to hang out. Who needs reference books in the Computer Age?
I remember the good old days when the library didn’t need a fund-raiser.
The EIC (as you put it) is a very powerful lobby. Testing is more important than teaching.
Heh. I have a painful story about teaching req stats course for undergrad econ majors in summer school. Never taught another course.
I just took my 30+ year old paper cover stats book to the burn pile. I instated a rule about 2-3 years ago that I couldn’t acquire a book without deacquisitioning one. Most of the ones I get rid of, I donate to the library fair, but I figured that no one would want the stats book and it gave me a great deal of emotional joy to see it burn.
I work at a community college, and can tell you from direct observation that the metrics used to measure success are absolutely bogus. The accreditation process is a Potemkin Village exercise in making up whatever the narcissistic committee wants to hear. Whenever you hear the buzz phrase “data-driven solutions” it’s time to pull on the hip waders.
here’s a case study in the damage value-added evaluation can do to a teacher and to a school and to a school system:
http://www.washingtonpost.com/local/education/creative–motivating-and-fired/2012/02/04/gIQAwzZpvR_story.html?hpid=z2
pay close attention to the evasive, legalistic responses of school system administrators to this young teacher’s pleas for a hearing.
if there is a group of professionals more meritorious of a guaranteed parking place in hell than our run-of-the-mill public school system administrators, college administrators, provosts, deans and dep’t heads,
i’d like to know who they are.
The 24% correlation on results between different grades is pretty bad. If you read the source article, there was also a same grade math and language graph where the results looked slightly better.
It should be emphasized that in any social science, correlations and R-squared metrics are terrible. A correlation of about 50% would be damn near a miracle (in a simple one variable regression, a 50% correlation means one variable could explain 25% of the total variability in the other variable, i.e. R-squared). There is simply too much variation in life and results to be explained well by one or two metrics. All the graphs will continue to look like crap and have small correlations.
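The arithmetic behind that parenthetical can be checked with a short sketch. In a one-variable least-squares regression, R-squared equals the square of the Pearson correlation. The function names and the six data points below are hypothetical and made up purely to exercise the code; they are not taken from the teacher data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def r_squared_of_fit(xs, ys):
    """Share of the variance in ys explained by a least-squares line on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def explained_share(r):
    """Fraction of variability explained when the correlation is r."""
    return r * r


# Made-up scores for six imaginary teachers (illustration only).
scores_a = [10, 35, 50, 62, 80, 95]
scores_b = [55, 20, 70, 40, 90, 45]

r = pearson_r(scores_a, scores_b)
print(f"r = {r:.3f}, r squared = {r * r:.3f}, "
      f"R-squared of fit = {r_squared_of_fit(scores_a, scores_b):.3f}")
print(explained_share(0.5))   # 0.25, the 25% mentioned above
print(explained_share(0.24))  # the 24% correlation leaves roughly 6% explained
```

By the same rule, the 24% correlation between grades corresponds to only about 6% of the variability in one score being explained by the other.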
What I want to know is:
- How can we devise a workable system of metrics to measure both student learning and teacher performance relative to that learning?
As long as public schools are going to be funded by taxes, parents and tax payers without children are going to want some viable measure to show what they’re getting (in terms of an educated youth) for those dollars.
There has to be some way to measure this. Maybe the secret is testing across multiple subjects for multiple years. Somebody has to come up with the answer, for the good of the students, the teachers and their unions, the economy, and ultimately the country.
BTW: Supposedly we come up short when our students are measured against those of other countries. How are students in those countries tested? And taught? How do we get our students up to the level of those international tests?
you can’t!!
trust the teachers to do their jobs, and require only that each school develop its own evaluation regimen.
get the state and the federal government entirely out of education evaluation.
the bureaucrats at the top of local, state, and national educational bureaucracies are extremely child-unfriendly and teacher-unfriendly.
their concern, as a group, seems to be for their status and their paycheck.
don’t waste your time fretting about international scores; that’s comparing apples and oranges.
furthermore, if you are really worried about getting your money’s worth for your tax dollars spent on education,
ask yourself if you have met a competent professional in any of a large number of fields whose recruits would have had to come from our system of training children and young adults.
if so, by what “unmetricated” miracle did these students prove competent as adults?