(Bloomberg View) — The Obama administration has announced plans to tie 90 percent of all Medicare fee-for-service payments to some sort of quality or value measure by 2018. Sounds exciting! Who wouldn’t like to ensure that their doctors are paid for delivering value, rather than just randomly sticking needles into us?
Unfortunately, as both the Official Blog Spouse and Aaron Carroll of the Incidental Economist have noted, there is less to this announcement than meets the eye. Saying you want to pay for quality instead of procedures is quite easy; indeed, many an administration has said so, because “paying for outcomes instead of treatment” is the holy grail of health-care economists everywhere.
But actually doing this, rather than just saying it, turns out to be really hard.
I think it’s fair to say that the Official Blog Spouse is one of the few journalists in the nation who has extensively reported on the history of Medicare payment reforms, all of which were supposed to move the system toward paying for valuable health care rather than cardiologists’ greens fees. As he details, they mostly failed. Medicare payments turn out to be a lot like one of those gel stress balls: You can squeeze them very small in one place, but the spending just pops out somewhere else.
There are a lot of reasons for this. Health-care lobbies are powerful, and Congress is almost uniquely easy to lobby, so ideas like controlling the growth rate of physician payments fell by the wayside once those payments actually had to be cut. The larger problem, however, is finding what to measure — and making sure that your measurement doesn’t introduce perverse incentives into the system. The fundamental problem is that while we want to pay for “health” or “outcomes,” we can’t really measure those very well.
Here’s a little exercise that will illustrate the problems of measurement that confound attempts to pay for “outcomes” or “health” instead of treatment: Tell me how healthy you are on a scale of 1 to 10.
Now before you blurt out an answer, stop and think. You’re probably already pondering some questions: What’s on the scale? What does a 1 look like, and what is a 10?
Let’s say that 1 is a terminal cancer patient in the ICU; 10 is an 18-year-old athlete in the prime of his physical powers. But you’re probably neither of these things. So where do you fall in between? Maybe you’re pretty healthy for a 47-year-old accountant, but your back gives you frequent trouble and you’ve got some acid reflux you need to watch, and, of course, there are your blood pressure pills, or maybe in your case it’s a statin …
If you rate yourself compared to your neighbors, or other 47-year-old accountants, you might give yourself an 8 — 9 if you’re the cheery sort, 7 if you’re a perpetual grump. But if you compare yourself to that 18-year-old athlete, you’re probably more of a 5 or a 6.
And that’s only the stuff you know about. What about the stuff you don’t know about? How likely are you to die in the next five years? Or have a heart attack or a stroke or lose a limb?
The answer is “you have no idea.” If we had 50,000 of you, actuaries could predict these things pretty accurately: how many heart attacks, strokes, deaths, car accidents and so forth. But unless you are that terminal cancer patient in the ICU, no one can predict how likely you, personally, are to die in the next five years. We can say something about the expected life and health of large groups of people very like you. But not you personally.
Unfortunately, doctors don’t treat statistical universes; they treat individual patients. Those patients may unpredictably die, or just as unpredictably survive against incredible odds. Some of that is due to the skill of the doctor, some to the innate characteristics of the patient. How much of which? Hard to tell unless the doctor does something obviously completely wrong and stupid, like leaving an instrument inside the patient he’s operating on.
You can look at the whole pool of patients that the doctor treats, of course, but the more complicated and expensive the treatment, the fewer patients the doctor will be treating, which means that your data is prone to being swamped by a few outliers. Moreover, doctors do not treat identical patient pools. A good doctor who treats really sick patients may look worse than a bad doctor who confines their treatment to the relatively young and healthy.
Of course, we can attempt to correct for this by adjusting the measurement for risk. The problem is that we don’t know all the risk factors; we know some risk factors that we can measure. There are a lot of risk factors we can’t measure, which means that this adjustment will be far from perfect.
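To make the risk-adjustment problem concrete, here is a minimal sketch in Python. Every number in it is invented for illustration; the two doctors, the benchmark death rates and the "frail" label (standing in for a risk factor the program cannot measure) are all assumptions, not real data. It compares each doctor's raw mortality rate with an observed-to-expected ratio, computed once using only the measured risk factor and once as if the unmeasured one were also known.

```python
# All numbers below are invented for illustration; none come from real data.

# Benchmark death rates using only the risk factor the program measures.
MEASURED_BENCHMARK = {"low": 0.02, "high": 0.10}

# Benchmark death rates if frailty (unmeasured in practice) were also known.
FULL_BENCHMARK = {
    ("low", "robust"): 0.01,
    ("low", "frail"): 0.04,
    ("high", "robust"): 0.05,
    ("high", "frail"): 0.20,
}

# Patient counts per (measured risk, frailty) group, plus observed deaths.
# Doctor A takes mostly frail, high-risk patients; doctor B avoids them.
DOCTORS = {
    "A": {"patients": {("high", "frail"): 80, ("high", "robust"): 20}, "deaths": 14},
    "B": {"patients": {("high", "robust"): 50, ("low", "robust"): 50}, "deaths": 4},
}


def raw_rate(doc):
    """Deaths divided by patients, with no adjustment at all."""
    return doc["deaths"] / sum(doc["patients"].values())


def oe_measured(doc):
    """Observed/expected deaths, adjusting only for the measured risk factor."""
    expected = sum(
        n * MEASURED_BENCHMARK[risk] for (risk, _frailty), n in doc["patients"].items()
    )
    return doc["deaths"] / expected


def oe_full(doc):
    """Observed/expected deaths if the unmeasured factor could be used too."""
    expected = sum(n * FULL_BENCHMARK[group] for group, n in doc["patients"].items())
    return doc["deaths"] / expected


for name, doc in DOCTORS.items():
    print(
        f"Doctor {name}: raw={raw_rate(doc):.2f} "
        f"O/E measured={oe_measured(doc):.2f} O/E full={oe_full(doc):.2f}"
    )
```

A ratio above 1 means more deaths than the benchmark predicts. In this made-up case, doctor A looks worse than doctor B on the raw rate and still looks worse after adjusting for the measured factor. Only the full adjustment, which no real program could compute, shows A beating the benchmark and B falling short of it.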
If the adjustment is too imperfect, providers have recourse even beyond lobbying: They can stop taking patients covered by your program. That limits your ability to shrug your shoulders and say, “Gosh, well, the world’s imperfect, so I’m afraid that yes, some of you are going to get unfairly penalized under the new system. It’s the best we can do.”
Medical systems are not the only systems that encounter these problems. Just ask any organization that has tried to implement a new sales compensation scheme to better align sales incentives with “customer value.” As one veteran of such attempts told me, suddenly salesmen who majored in beer pong are “like Aristotle” — they can explain exactly why their sales territory is special and your new, complicated system fundamentally mismeasures the value of their efforts. Within six months, you’ll have lost a few top performers who hate the new system. Within a year, your burgeoning philosophers have probably figured out how to game the new metrics.
Gaming (“juking the stats,” as it was called on “The Wire”) is the other major reason that these sorts of systems are hard to implement. Let me illustrate with a little example. Beachy Head, a headland on England’s south coast, had a big problem with suicide; people threw themselves off its dramatic cliffs. In 1975, however, the suicide rate there was cut in half in a single year. An improvement in the national mood? Or a dramatic triumph of public policy?
A new medical examiner. The new chap decided to test the blood alcohol level of bodies found at the base of the cliffs. Those with alcohol in their blood were ruled to be accidents, rather than suicides.
You might argue that people bent on suicide could be taking a drink to fortify their courage before attempting to take their own lives, and you’d probably be right. Which is exactly the point. There is some true rate of suicides at Beachy Head, but that’s not information we have. All we know is the reported suicide rate, which is dependent on things like the assumptions of the medical examiner.
This is always a big problem, but it is particularly problematic when you give the person taking the measurements strong incentives to see things one way, rather than the other. On “The Wire,” cops made their crime rate look good by reclassifying serious crimes as less serious, or as accidents, which did nothing about the underlying problem but made the cops look much better. Unfortunately, we see the same behavior in doctors and hospitals. It’s called “upcoding”: rating conditions as more serious than they are in order to increase the reimbursement, or to improve their performance on those risk-adjusted mortality measures.
This can go beyond just massaging a few figures and do active harm. For example, consider what happened when New York state started measuring cardiology outcomes.
The idea was that they were “ending years of private, clubby surgeon culture.” The public report cards “were intended to shine a light on poor surgeons and encourage a kind of best-practices ethic across the state. If the system worked, mortality rates would fall everywhere from Oswego to NYU.” And at first glance, the system worked beautifully: Risk-adjusted mortality rates dropped by an astonishing two-thirds. But as New York magazine reports, it rapidly became clear that one way surgeons were achieving these advances was simply by refusing to treat the sickest patients:
This isn’t just about high-risk patients. It’s about doctors playing games with practically any patient to get better scores. Some surgeons look for ways to make their easy cases seem harder. Others make their hard cases appear so difficult that they place out of the state reporting system. When it comes to the sickest patients, some surgeons simply turn them away, asserting that they’re better off getting drug treatments, or waiting in the ICU.
“The cardiac surgeons refer their patients to the cardiologists, and the cardiologists refer them to the intensive-care unit,” says Joshua Burack, a SUNY–Downstate surgeon in Brooklyn who in 1999 released a study revealing that nearly two-thirds of all heart-bypass surgeons in the state anonymously admitted to refusing at least one patient for fear of tainting their mortality rates. “Everyone’s going to pass along the hot potato to the person who’s not vulnerable to reporting.”
In the past five years, no fewer than five studies have been published in reputable journals raising the possibility that New York heart surgeons are not operating on certain cases for fear of spoiling their mortality rates. The clincher came in January, when, in an anonymous survey sent out to every doctor who does angioplasty in the state, an astonishing 79 percent of the responders agreed that the public mortality statistics have discouraged them from taking on a risky patient. If you’re a hard case, in other words, four out of five doctors would think twice before operating on you.
The Cleveland Clinic started getting a lot more referrals from New York — and their patients were sicker than the patients referred from other states.
Now, you can make an argument that maybe this is all to the good — that maybe the money we spent doing heart surgery on very sick people was wasted, and it’s better to concentrate our money on the relatively healthy. But that’s not the purpose of the report cards, which are supposed to help patients make informed choices about their surgeons — not to help surgeons better choose their patients. The doctor profiled in the article, who had New York’s lowest cardiac mortality rate at the time, told the reporter that he achieved that rate by not operating on people who were “already dead.”
But what does that mean? Refusing to operate on hopeless cases, or refusing to operate on people who have a 40 percent chance of living with surgery and no chance at all without it? If that were me, I’d probably want to gamble — and I’d probably be pretty angry if surgeons were too afraid that a failure would show up on their report card.
In some cases, surgeons code their patients as sicker than they used to, even if doing so means doing additional, unnecessary treatment. This can range from putting a patient on nitroglycerin to, the article alleges, actually putting a little ring around someone’s mitral valve, which the surgeon who recalled the incident describes as “assault.” These measures either improve the risk adjustment or take the patient out of the report card sample entirely, because they’re deemed special cases.
You get the point: A measure that was supposed to make patients healthier and encourage the spread of best practices has instead kept doctors from treating sick patients and encouraged unnecessary treatments. Don’t get me wrong: It may well have encouraged some better treatment, too. But we always need to be mindful of the perverse incentives by which even a simple, obvious solution like “more transparency!” could actually make the system worse.
More broadly, when money is on the line, assume that people will work around any system you come up with in order to preserve their income, even to the detriment of patients. Take Medicare’s plan to reduce hospital readmission rates, which completely succeeded in reducing those numbers and also apparently resulted in a lot more patients being put on observation status rather than being admitted to the hospital. That meant they didn’t count as “readmissions” if they came back. It also potentially left the patients on the hook for bigger bills.
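The readmission example can be sketched the same way. The counting rule below is a deliberate simplification, and the 30-day window, the `ReturnVisit` type and all the patient counts are assumptions made up for this illustration rather than Medicare's actual methodology. The point is only that the same number of patients can come back to the hospital while the measured rate falls.

```python
from dataclasses import dataclass


@dataclass
class ReturnVisit:
    days_since_discharge: int
    status: str  # "inpatient" or "observation" (hypothetical labels)


def readmission_rate(discharges, return_visits, window_days=30):
    """Share of discharges followed by an inpatient return within the window.

    Returns classified as "observation" are not counted, which is the
    simplified rule this sketch assumes.
    """
    counted = [
        v
        for v in return_visits
        if v.days_since_discharge <= window_days and v.status == "inpatient"
    ]
    return len(counted) / discharges


# Invented scenario: 100 discharges, 20 patients return within 30 days.
before = [ReturnVisit(10, "inpatient") for _ in range(20)]

# Same 20 returning patients, but 8 are now held on observation status.
after = [ReturnVisit(10, "inpatient") for _ in range(12)] + [
    ReturnVisit(10, "observation") for _ in range(8)
]

print(f"before: {readmission_rate(100, before):.2f}")
print(f"after:  {readmission_rate(100, after):.2f}")
```

Under these assumed numbers the measured rate drops from 20 percent to 12 percent, even though exactly as many patients ended up back in a hospital bed.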
I’m not saying that no payment reform program can ever work. I am saying that most of the significant attempts to reform the way we pay for health care haven’t worked, and for similar reasons. Reformers have the basic idea right: You’ll get more of what you pay for, and less of what you don’t, so you should pay for what you want. Unfortunately, in fields like health care and education, we can’t pay for what we want; we can only pay for what we can measure. And it’s usually a lot easier for people to play with the measurements than it is to change their behavior or give up a big hunk of income.