As Year 2 of the Merit-based Incentive Payment System wraps up, health policy watchers, payers, and clinicians want to know: Is the biggest pay-for-performance program ever attempted in the United States working?
CMS hasn’t let on. Administrator Seema Verma’s blog post earlier this year was skimpy on details, other than to say that 91% of the 419,000 clinicians who were required to participate in MIPS in Year 1 (2017) did. If that’s the criterion, you could argue that MIPS works, although the bar was low—clinicians had to report only one quality measure to avoid the scarlet letter of nonparticipation.
It may take a little while longer to ferret out whether MIPS is having CMS’s self-described effect—to “drive improvement in care processes and health outcomes, increase the use of health care information, and reduce the cost of care.” Having already concluded that cost reduction is unlikely, MedPAC recommended earlier this year that MIPS be scrapped altogether.
But if the track record on P4P tells us anything, it is that P4P (and probably MIPS) doesn’t work. A meta-analysis of 69 P4P programs, published last year in the Annals of Internal Medicine, found some weak evidence of process improvement but no improvements in outcomes. Lacking clear evidence of a benefit, the authors suggested that policy analysts consider the potential harms and unintended consequences of P4P. Like?
Well, teaching to the test, for one. In a survey of physician perspectives on MIPS, results of which were published in Health Affairs in July, Joshua Liao, MD, and colleagues found that 69% of respondents were concerned that MIPS would encourage physicians to “focus on aspects of care being measured to the detriment of other unmeasured aspects of care.” In other words, checking off boxes.
“There is an inherent pull in P4P toward things that are measurable,” says Liao, who is associate medical director for contracting and value-based care at UW Medicine in Seattle. “I can measure your blood pressure, and I know what that number is. By being doable and quantifiable, it’s much more likely to be in a program like [MIPS]. But what about things like shared goals around depression? Or prior assault?”
Avoiding complex patients is another possible unintended consequence. A 2017 analysis by Eric Roberts and his colleagues, also published in the Annals of Internal Medicine, found that the Value-Based Payment Modifier—a predecessor program to MIPS—penalized doctors who cared for the poor and very sick because their quality scores were lower. An accompanying editorial called the outcome a “reverse Robin Hood effect.”
We may be going down the same road with MIPS, if Liao’s study is any warning. Of those surveyed, 60% thought MIPS would lead physicians to “avoid sicker or medically complex patients to improve performance on quality or utilization measures.”
Sadly, that’s not theoretical. It happens. The classic case, says Liao, was detailed in Robert Kolker’s New York magazine exposé of cardiac surgeons in that state. After the state began making their mortality rates public, 79% of surgeons admitted to shunning riskier patients. Black patients were also singled out, receiving fewer CABG surgeries because of perceptions of risk.
Liao says that models like ACOs could be vulnerable to similar effects. “If we look at communities with high proportions of low-income or otherwise vulnerable patients—racial minorities, for example—unfortunately, some of the early descriptive, though not definitive, evidence is that [ACOs] form much less frequently in those areas.”
Though Liao calls the findings of his study “illuminating,” they didn’t surprise him. “There is no policy that’s ungamable in some way,” says Liao, who is also a senior fellow at the Leonard Davis Institute of Health Economics. “Every policy has unintended effects. So, the question wasn’t ‘is MIPS impenetrable to unintended consequences?’ It was rather, ‘how susceptible?’ And when you look at the numbers … that was more of a confirmation, and the magnitude was higher than I expected.”
In the end, MIPS may move the needle in a few areas, but whether it’s truly working will be judged by more than how CMS spins the data.