Case Studies That Flunk Every Plausibility Test Known to Mankind
Population management claims: The Seven Rules of Plausibility provide means to test the claims of population management vendors. With case studies and commentary.
About this article
Al Lewis has been a thorn in the side of the disease management industry for years, and with his new book, Why Nobody Believes the Numbers, he may have reached his zenith. Written with Al’s down-to-earth acerbic wit and mathematical savvy, this excerpt from chapter 3 will amuse, educate, and/or enrage you.
Who is Al Lewis? He has a bachelor’s in economics and a law degree from Harvard (where he taught economics); has held management positions at Interqual and Bain Capital; and was an analyst at Lehman Brothers. See his LinkedIn page for many more credits: http://tinyurl.com/al-lewis-dm
Among those who have praised this book, which was published last month, are the Brandeis University economist Stuart Altman; Regina Herzlinger, professor of business administration at Harvard Business School, and Tom Scully, former administrator of the Centers for Medicare and Medicaid Services.
This excerpt printed with the permission of John Wiley and Sons Inc. Copyright ©2012 by Al Lewis.
It has been shortened and modified for continuity.
Everything in life has an “80–20” rule. Example: 20 percent of the population accounts for more than 80 percent of income; 80 percent of a ball club’s salary goes to 20 percent of its players, and so forth. The 80–20 rule is everywhere.
In population health spending, the 80–20 rule is that 80 percent of the time, there is no 80–20 rule. For instance, the Centers for Disease Control claims that the 50 percent of adults who have chronic disease account for 75 percent of health care spending. A 75–50 rule is about as far from an 80–20 rule as you can get, and means that costs are diffused throughout the system, rather than concentrated. (It is also not the slightest bit clear how they can define “chronic disease” so broadly that fully 50 percent of us have it. Are they including insomnia? Tooth decay? Dandruff? Ring around the collar? And how do they even know I suffer from these afflictions, let alone how much I spend on white noise machines, toothpaste, or Head and Shoulders? But we shall leave both these questions and Those Dirty Rings for another day.)
Consistent with that observation about unconcentrated costs, it turns out that large chunks of potential savings are not sitting in one place just waiting to be harvested by a vendor imploring people to smoke fewer Marlboros and eat more broccoli. Yes, the lesson will be: A simple, usually voluntary, program isn’t going to make a noticeable dent in health spending.
But try explaining this to the population health improvement (PHI) industry, which prides itself on saying it does exactly that. Fortunately, a modicum of math and critical thinking, using one or more of seven informal, common-sense rules, can help determine whether this pride is justified. These rules are not footnoted or otherwise sourced, because there is no precedent and no governing body for validating PHI outcomes.
Or, to quote the immortal words of the great philosopher Groucho Marx: “Who are you gonna believe, me or your own eyes?”
The goal of these common-sense rules is not to validate every study that is truly valid, which would be a Herculean task, but rather to invalidate those claims that are obviously invalid, a first level of intellectual triage to avoid making misguided resource allocation decisions. The plausibility rules are as follows, with their shorthand in boldface:
The 100 Percent Rule: Outcomes explicitly or implicitly cannot require any element of cost to decline by more than 100 percent.
The Every Metric Can’t Improve Rule: Every element of resource use or group of people cannot decline in cost, through programs aimed generally at improving prevention. In particular, the actual costs associated with prevention, such as primary care visits, drug use, and health screening, must rise.
The 50 Percent Savings Rule: In a voluntary program with no incentives, declines in excess of 50 percent in any given resource category are the result of invalidity, not effectiveness.
The Nexus Rule: There must be a logical link between the goal of the program and the source of savings.
The Quality Dose–Cost Response Rule: Just as in pharmacology, cost cannot decline significantly faster or more than the related quality variables improve.
The Control Group Equivalency Rule: Control groups, if not prospectively sorted into two similar or equivalent groups, based on objective data, before members are even contacted to determine willingness to enroll, are likely to mislead. This is especially true of historic controls (meaning pre-post), matched controls, and using the non-disease group as a control for the disease group.
The Multiple Violations Rule: When one of these rules is violated, others are likely to be violated, as well.
There is a concept in testing called “face validity,” meaning what you’d expect it to mean: A study has face validity if it looks like it’s fairly measuring what it’s supposed to measure. These plausibility tests introduce a companion measure: “face impossibility.” An example has face impossibility if, rather than having to challenge the data or the study design to question an outcome, you can simply tell from the numbers themselves, as presented by the vendor, that the outcome is impossible.
Every example here has face impossibility.
The textbook example of face impossibility is violating this plausibility rule: You cannot reduce a number by more than 100 percent, period. This is true no matter how hard you try. And just to avoid any potential misunderstandings by our readers Down Under, this also isn’t one of those things that’s the opposite in the Southern hemisphere.
Give it a shot yourself if you don’t believe me. A guy with two PhDs tried and even he couldn’t do it. He posted online — for the world to see if the world didn’t have better things to do with its time — the following: “Suppose you buy a stock at $10. It goes up to $50 and then down to $5. You’ve lost 450 percent.” Nope. Your stock has gone from $10 to $5, a fifth-grade textbook case of a 50 percent decline.
The 100 percent rule is a rule of math, and as mentioned earlier, rules of math are strictly enforced. That means the web page is wrong.
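For readers who would rather check than argue, here is a minimal sketch of the arithmetic in Python. The function name `percent_change` is an illustrative choice, not something from the book; the dollar figures are the ones in the stock example above.

```python
def percent_change(start, end):
    """Percentage change from start to end; a negative result is a decline."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (end - start) / start * 100.0


# The stock example: bought at $10, now worth $5.
# The $50 peak along the way does not enter the calculation.
stock_change = percent_change(10.0, 5.0)

# The largest decline a non-negative quantity can suffer: it falls to zero.
max_decline = percent_change(10.0, 0.0)

print(f"Stock change: {stock_change:.0f}%")
print(f"Largest possible decline: {max_decline:.0f}%")
```

Because a cost or a count cannot fall below zero, no choice of `end` can push the result past a 100 percent decline.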
It’s lucky math isn’t a popularity contest because these guys aren’t the only ones who think you can reduce a number by more than 100 percent, as the conclusion of a case study from Vendor D suggests: “Wellness program participants are 225% less likely [boldface is actually theirs, believe it or not] to utilize Extended Illness Benefit than nonparticipants.” Note that for copyright reasons (even though this brochure wasn’t copyrighted) both the hospital’s name and the percentage reduction were changed. We did them a favor not just on the former but also on the latter, because the actual number they claimed was even higher.
Maybe “225 percent” wasn’t enough to excite their customers, because the Vendor D website now proclaims “390 percent.”
It’s hard to tell which makes less sense: the numbers or the words. “390 percent” compared to what? There is also a misplaced modifier issue, as in “crossing the street, the bus hit me.” Or, perhaps they intended it to read this way. Perhaps the “400 percent losses” apply only to “employers associated with chronic disease,” such as Merck, Pfizer, and maybe Healthways or Alere. Presumably, being in the chronic disease business, they can make up their mathematically impossible losses in volume. The good news is that NASA employees don’t need to worry about their job security, because these people are obviously not rocket scientists.
It’s lucky arithmetic is not a popularity contest, because here is another vendor whose outcomes break the mathematical impossibility barrier:
Wellness program case study: St. Mary’s Hospital
St. Mary’s Hospital started their [sic] first comprehensive wellness program in 2006, implementing a personalized approach focused around a high trust, high engagement strategy with [Vendor D]. The following provides data directly from that program.
Like most organizations, hours tied to sick time are categorized as Extended Illness Benefit (EIB). Anomalies such as maternal leave were pulled out, leaving 96 percent of the population for the analysis. The result was that wellness program participants are 225 percent less likely to utilize EIB than non-participants.
However, even highly respected organizations can trip over the 100 percent rule. Here is a press release citing the Institute for Healthcare Improvement (IHI). The consensus would be that IHI is one of the most capable and influential organizations in the field. And yet…
PCMH effectiveness: The proof is in
HI-WIRE George Miller January 04, 2010
A five-year prospective evaluation of the model yields a 129 percent increase in patients receiving optimal diabetes care and a 48 percent increase for heart-disease patients. The model also achieved a 350 percent reduction in appointment waiting time, as reported by the Institute for Healthcare Improvement.
More common violations of the 100 percent rule are not as flagrant. As a reader of these reports you can’t assume that your vendors will simply announce that they are violating basic rules of fifth-grade arithmetic. You will have to infer it.
The Center for Health Value Innovation
The Center for Health Value Innovation (CHVI) has a vision statement that says, “[CHVI] will be the trusted resource to demonstrate how engagement in health can improve accountability and economic performance.” In one of their presentations they showed a savings of $5,000 per person per year (net savings, meaning after fees are subtracted) generated by a care management program for commercially insured members, where this number was said to be for the “average” person. However, the average commercially insured person doesn’t even incur $5,000/year in paid claims — and certainly not in claims that could be considered even theoretically avoidable — making it impossible to reduce claims by this amount, especially net of fees — a clear violation of the 100 percent rule.
An example like this demonstrates the need for more instruction in the health outcomes numeracy field, both in general and also specifically because the CHVI, which itself provides instruction in outcomes-based contracting, was unable to recognize it is not possible to save $5,000/year/person in a commercial population.
The Why Nobody Believes mantra: If you insulate your house, you should save money overall, but you won’t save money on insulation.
Likewise, in health care you need to spend more money in some areas to save money overall. So, for instance, unless you believe it’s possible to talk people out of taking their drugs and have their inpatient utilization decline nonetheless, this Health Plan C slide has face impossibility (not to mention that the quantities in the first two columns don’t sum to the quantity in the last column).
The result could also have been caused by comparing people who volunteered to participate in the program to those who didn’t participate — a classic fallacy. We will bring it up a bunch more times before the book is done.
Years of doing valid outcomes measurement have confirmed the obvious: You can’t “move the needle” a lot without strong financial incentives. Want people to stay out of the ER? Sure, you can entice doctors to keep longer hours by paying them more, and that should reduce ER usage a little bit. Double the ER co-pay, though, and watch ER visits decline.
You especially can’t move the needle on chronic disease events, because most adverse events simply aren’t preventable with a few outbound phone calls.
And in reality, the needle-moving impossibility threshold, using programs without strong economic incentives/disincentives, is more like 20 percent. I chose 50 percent because there are so many outcomes studies showing greater improvements than that.
State agencies routinely accept outcomes that violate one or more of the plausibility rules, as we will see in-depth in the next chapter, and again in Chapter 8. Here is an example from Georgia Medicaid, prepared by Benefits Consulting Firm A. A word-for-word reconstruction of the summary page of their report is shown below:
Georgia Medicaid report
- In Region 1, Vendor E generated net savings, after subtracting their contracted fees, of 19.0 percent. These savings exceeded the contractually guaranteed net savings of 4.55 percent. No penalty should therefore be assessed for financial results.
- In Region 2, Vendor F generated net savings, after subtracting their contracted fees, of 15.8 percent. These savings exceeded the contractually guaranteed net savings of 4.22 percent. No penalty should therefore be assessed for financial results.
The 50 percent savings rule would guide readers to look at Region 1’s 19 percent overall decrease. True, the 50 percent savings rule focuses on declines of 50 percent or more, but that’s 50 percent in any single category. A 19 percent overall decrease can — and will — easily be shown to require a >50 percent decline in hospitalizations, since disease management generates savings almost exclusively in hospital costs and ER costs, the latter being quite trivial, though, as compared to the former. Because the idea of disease management is to provide enough preventive services and self-care to avoid hospitalizations, typically the cost of non-hospital services increases in successful programs.
Let us, however, generously assume away any likely increase in non-hospital costs and say that the hospitalization reduction was achieved without increasing prevention-oriented costs. Next, let us add back in the actual fees, approximately $9 per member per month or roughly 2 percent, making the gross savings before fees 21 percent (19 percent + 2 percent).
Let us also make some assumptions for program outreach and intervention effectiveness that are generous to the program in that they exceed, in most cases by a lot, what most programs achieve:
- Hospital costs account for 50 percent of total costs in the Medicaid disabled population.
- 50 percent of hospitalizations are avoidable through phone calls.
- 50 percent of people are engaged by the program.
- 50 percent of those engaged are (without financial incentives, which were not provided) successful enough in losing weight, giving up cigarettes, and taking other steps so that they do indeed avoid hospitalizations.
We can build these assumptions into a table to determine how many hospitalizations would need to be avoided in the last bullet-pointed group — the sub-category in which the program was effective — in order to save 21 percent gross, meaning 19 percent net plus the 2 percent fees.
|Category||% of Total||Reduction in costs needed to get 21% overall gross savings|
|Hospital costs||50% of costs are hospital costs||42% of total hospital costs must be avoided|
|Avoidable hospital costs||50% of hospital costs are avoidable through telephone disease management||84% of avoidable costs must be avoided|
|Engagement rate||50% of members are engaged||168% of total hospital costs must be avoided in the engaged population|
|Success rate||50% of engaged members are successful in avoiding avoidable hospitalizations||336% of avoidable hospitalizations must be avoided in the engaged population|
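The cascade in the table can be reproduced with a short Python sketch. The savings and fee figures and the four 50 percent assumptions come from the text above; the variable names and labels are illustrative only.

```python
# Figures from the text: 19% net savings plus roughly 2% in fees.
net_savings = 0.19
fees = 0.02
gross_savings = net_savings + fees  # 21% of total costs

# The four generous assumptions, in the order the table applies them.
assumptions = [
    ("hospital costs as a share of total costs", 0.50),
    ("share of hospital costs avoidable by phone", 0.50),
    ("share of members engaged", 0.50),
    ("share of engaged members who succeed", 0.50),
]

required = gross_savings
required_by_step = []
for label, share in assumptions:
    # The savings must come out of an ever smaller slice of spending,
    # so the reduction required within that slice grows at each step.
    required = required / share
    required_by_step.append(round(required * 100))
    print(f"{label}: {required * 100:.0f}% reduction required")
```

Each 50 percent assumption doubles the required reduction, so the last two steps land well past 100 percent, never mind 50.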
Despite generous assumptions for contact and success rates in disease management, the 19 percent net savings result that the state of Georgia accepted is so obviously a violation of the 50 percent savings rule that some might question whether the state’s administrators at the time accepted the findings not because they believed them but rather because the results justified further federally matched spending on the program.
Postscript: Vendor E, having grossly underbid the project, was later found to have made almost no outgoing phone calls to beneficiaries, and consequently agreed to return money to the state. So, Benefits Consulting Firm A was able to find mathematically impossible savings for the state despite the fact that the vendor allegedly generating those savings acknowledged not doing anything.
What list of states lying about finances would be complete without Illinois? Here is their press release, which claims more savings through disease management than the state actually spent on chronic disease events, a 100 percent rule violation and a poster child for face impossibility. (Oh, yes, and in actuality their chronic disease events did not even decline enough to pay for the program, a minor detail.) But some other bigger news at the time about that state’s governor relegated this news to the inside pages, sort of like Michael Jackson’s death did to Farrah Fawcett’s.
Listen carefully once again: You can only achieve savings in the categories in which you are trying to achieve savings. If costs decline in any other category, it had nothing to do with you. We will see two examples in our detailed case studies of this, but for now, consider this slide. We can’t say the name of — or even assign a code name to and then charge you for revealing the name of — the vendor shown in Figure 2 because this slide wasn’t published, but that doesn’t make it any less amusing.
It’s not just that everything declines. It’s that the biggest declines are in the two largely preventive categories (MD visits and drugs) where one would expect an increase — exactly contradicting the tenets of care management. Yes, once again that goes back to the observation that even if insulating your house saves money, the cost of the insulation itself doesn’t decline.
Perhaps Ned Flanders would be okay with this type of internal inconsistency because he believes everything in the Bible, including the “stuff that contradicts the other stuff” but obviously no one else would, right? Wrong. For three years these guys presented this material without anyone other than me noticing. To their credit, they did change their methodologies after I suggested doing so for the third time.
There is no way that events can decline if you don’t improve quality. If they do, that’s face impossibility. Usually the changes in quality variables are a smoking gun that invalidates the entire cost savings claim, as in Table 2.
|TABLE 2 How trivial quality improvements generate massive reductions in hospitalizations|
|% Cardiac Members||Base||Contract year 1||Improvement|
|With an LDL screen||75.0%||77.0%||2.0%|
|With at least one claim for a statin||69.0%||70.5%||1.5%|
|Receiving an ACE inhibitor or alternative||43.5%||44.7%||1.2%|
|Post-MI with at least one claim for a beta-blocker||0.89||0.89||0.0%|
|Hospitalizations per 1,000 cardiac members for a primary diagnosis of myocardial infarction||47.60||24.38||−48.8%|
Along with a lack of understanding of significant digits, percentages versus decimals, and changes in percentage versus changes in percentage points, this example clearly shows what is sometimes informally referred to in population health improvement as the “wishful thinking multiplier”:
% Event or cost reduction / % Improvement in quality indicators
Or, in wellness:
% Cost reduction / % Risk factor reduction
In this example, the wishful thinking multiplier is about 40, meaning that events fell about 40 times faster than the average of those four quality variables improved. The real wishful thinking multiplier, as we will see when we review the valid literature and review “mediation analysis” that connects the two, is only slightly greater than 0 for the first two to three years of a program.
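Here is a minimal sketch of that calculation in Python, using the Table 2 figures. It takes the table’s “Improvement” column as reported (differences in percentage points) and averages the four indicators, as the text describes; the variable names are illustrative.

```python
# Improvements in the four quality indicators from Table 2,
# taken as the vendor reported them (percentage-point differences).
quality_improvements = [2.0, 1.5, 1.2, 0.0]

# Reported MI hospitalizations per 1,000 cardiac members.
base_rate, year1_rate = 47.60, 24.38
event_reduction = (base_rate - year1_rate) / base_rate * 100.0

average_improvement = sum(quality_improvements) / len(quality_improvements)
multiplier = event_reduction / average_improvement

print(f"Event reduction: {event_reduction:.1f}%")
print(f"Average quality improvement: {average_improvement:.3f} points")
print(f"Wishful thinking multiplier: {multiplier:.1f}")
```

The result comes out a little above 40, consistent with the “about 40” quoted above.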
Even the denominator itself can be gamed. Let’s start with quality indicators. Several vendors are partial to bragging that “10 of the 15 quality indicators either improved or stayed the same.” That means five deteriorated. If, of the 10 that improved or stayed the same, half stayed the same, as is often the case, no quality improvement took place. Five indicators got better and five got worse.
One of the vendor community’s favorite tools involving quality indicators is a “gaps in care” report, like this one in Figure 3.
The vendor reports that 43 percent of open care gaps were closed, while only 19 percent of closed care gaps opened up. Big success, right? Look again. This time, focus on the absolute number of gaps that changed over the course of the year. Turns out, there was virtually no change in open care gaps.
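To see how those two percentages can coexist with no net change, consider a sketch in Python. The starting counts below are hypothetical, chosen only to illustrate the mechanism; they are not the vendor’s actual counts. Only the 43 percent and 19 percent rates come from the vendor’s report.

```python
# HYPOTHETICAL starting counts, for illustration only
# (not the vendor's actual figures).
open_gaps_start = 1_900
closed_gaps_start = 4_300

# Rates quoted by the vendor.
close_rate = 0.43   # share of open gaps that closed during the year
reopen_rate = 0.19  # share of closed gaps that opened during the year

gaps_closed = round(open_gaps_start * close_rate)     # 817
gaps_opened = round(closed_gaps_start * reopen_rate)  # 817
open_gaps_end = open_gaps_start - gaps_closed + gaps_opened

print(f"Gaps closed: {gaps_closed}, gaps opened: {gaps_opened}")
print(f"Open gaps: {open_gaps_start} at start, {open_gaps_end} at end")
```

Because the pool of closed gaps is larger than the pool of open ones, a smaller percentage of it can cancel out a larger percentage of the other, which is why the absolute counts are the thing to look at.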
The wellness equivalent of quality indicator improvement, risk factor reduction, is equally if not more suspect and will be addressed in the wellness vignettes in the next chapter. It turns out that alleged risk factor reduction is often the result of fallacious measurement rather than actual impact. For instance, many wellness vendors measure only the engaged (participating) members, meaning the ones most likely to comply. We see this particular fallacy about once every 10 pages. And often vendors measure only the people who showed up in the baseline, against themselves a year later. That “historic control” fallacy is described in the next section.
One reason that there is a rule covering multiple violations is that you tend not to get impossible results without making myriad mistakes along the way. Some footage from the highlights reel:
Historic controls (meaning the same population before and after) create a fallacy where people who were high-cost enough to make it into your “before” population will as a group regress to the mean, but formerly low-cost people not in the “before” population who regress upwards during the “after” period will be excluded from this analysis.
Matched controls — volunteers are compared to non-volunteers with similar claims and demographics — fail to control for motivation, which is the key to successful self-management of a disease.
Using the non-disease group as a control will overstate savings: people who are only mildly chronically ill, and so don’t yet generate disease-specific claims, will often slip into the control group and then explode in costs as their disease progresses, thus inflating the trend line.
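The first of these, the historic-control fallacy, is easy to demonstrate with a small simulation. The Python sketch below is built entirely on assumptions: the cost distribution, the $10,000 cutoff, and the population size are all hypothetical, and each member’s cost is drawn independently each year, which exaggerates regression to the mean compared with real claims data. There is no program anywhere in the simulation.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N_MEMBERS = 20_000
THRESHOLD = 10_000  # hypothetical cost cutoff for the "before" population


def annual_cost():
    # Hypothetical skewed cost distribution; no program effect anywhere.
    return random.lognormvariate(8.0, 1.2)


# Costs are drawn independently in each year, so any change between
# years for a given member is pure chance.
year1 = [annual_cost() for _ in range(N_MEMBERS)]
year2 = [annual_cost() for _ in range(N_MEMBERS)]

# Historic control: follow only the members who were high-cost in year 1.
cohort = [i for i in range(N_MEMBERS) if year1[i] > THRESHOLD]
before = sum(year1[i] for i in cohort) / len(cohort)
after = sum(year2[i] for i in cohort) / len(cohort)
apparent_savings = (before - after) / before * 100.0

# The whole population, by contrast, barely moves.
pop_before = sum(year1) / N_MEMBERS
pop_after = sum(year2) / N_MEMBERS

print(f"High-cost cohort: ${before:,.0f} before, ${after:,.0f} after")
print(f"Apparent savings with no program at all: {apparent_savings:.0f}%")
print(f"Whole population: ${pop_before:,.0f} before, ${pop_after:,.0f} after")
```

The cohort’s costs fall sharply even though nothing was done, while the population average stays roughly flat, because the formerly low-cost members who became expensive in the second year are never counted.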
One article traced what happened when you simply tracked the costs of people who were identified in the baseline year absent a program, that is, a historic control. (Note that the study used a very tight algorithm to identify cardiac patients: a cardiac event in the baseline year.)
Consider perhaps the best pure example of failure to control for motivation by using matched controls, taken verbatim from the white paper downloadable from the website of Vendor G:
- “We utilize an opt-in enrollment model to target those individuals who have high health confidence and the highest motivation to change their health situation,” and so as not to leave anything to chance, Vendor G then…
- “ ... provides incentives to participants in our Condition Management Programs.”
Farther down on their website, they note that they produce “valuable disease management reports” that “provide you with ROI.”
How do these “valuable disease management reports” determine an ROI? To what control group do they compare motivated, incentivized volunteers? They “match members in the measurement year with non-participating members with similar clinical, utilization, and cost characteristics.” They do precisely what a biostatistician or health services researcher would never do: They find (1) motivated volunteers, (2) bribe them to participate, and then (3) compare the results to people who lacked enough motivation to participate and were not given incentives.
“This approach is used because there is a need to compare the program participants to something [emphasis theirs] in order to judge whether there have been improvements.” In other words, they prefer to offer an obviously invalid ROI analysis than none at all. This is presumably because their customers, egged on by their consultants, demand to know: “What’s my ROI?” Yep, they want a number, notwithstanding that it is meaningless. Vendor G, to its credit, basically acknowledges online that this measurement is meaningless, and provides it only because their customers are insisting on it.
But we have run out of space for this excerpt.
The rest of the chapter, and all the other chapters, of course, are in the book.