Show Me The Outcomes!

Valid outcomes studies in disease management are elusive. Which are good? Which are not? How do you evaluate what a vendor has to offer?

Out·come (out´·kum), n. 1. A final product or end result; consequence. 2. A conclusion reached through a process of logical thinking.
(Random House Unabridged Dictionary of the English Language, second edition, 1987.)

If you go by the first definition, you’d have to agree that the slew of data touted by the disease management industry to sell itself is just what the industry says it is: outcomes. Identify your subjects, develop a baseline, intervene, and show your results.

“You can report any outcome you want, but the question is, ‘What’s your question?'” says Neal Friedman, M.D., medical director for Albuquerque, N.M.-based Lovelace Healthcare Innovations. “None of those answers are inherently wrong.”

It’s the second definition that ignites debate. There’s little agreement among the 170 U.S. DM companies about the processes used to reach conclusions. Outcomes are many, but results that knowledgeable buyers of DM agree are valid are few.

“As an industry, we’re not counting everyone the same way and we’re not analyzing the same way,” says Bob Stone, executive vice president for American Healthways (based in Nashville, Tenn., and known until last month as Diabetes Treatment Centers of America, or DTCA). “I’m reminded of when New York raised its sales tax from 8 to 9 percent. The debate was, ‘Is it a 1-percent increase, or a 12-percent increase?’ That’s what everyone faces when they use outcomes to evaluate what they buy.”
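Stone’s sales-tax analogy turns on the difference between an absolute change in percentage points and a relative change measured against the starting rate (8 to 9 percent is actually a 12.5-percent relative increase; the quote rounds down). A quick sketch of the two calculations:

```python
# Stone's sales-tax analogy: an 8% rate rising to 9% can be described two ways.
old_rate, new_rate = 8.0, 9.0

# Absolute change: measured in percentage points.
point_change = new_rate - old_rate

# Relative change: the increase measured against the old rate.
relative_change = (new_rate - old_rate) / old_rate * 100

print(f"{point_change:.0f} percentage point(s)")  # 1 percentage point(s)
print(f"{relative_change:.1f} percent increase")  # 12.5 percent increase
```

The same ambiguity haunts DM outcomes: a “40-percent reduction” means little until you know what base it is measured against.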

Eventually, meaningful outcomes will define the value that DM brings to medicine. From Benjamin Rush’s bleed-’em-healthy philosophy of the 1700s, to neurology’s “electric baths” of a century ago, to the all-but-extinct staff-model HMO, the history of medicine is full of well-intended false starts. Whether DM fades as another failed notion may depend on how quickly the industry can produce respectable outcomes on a consistent basis.

The “good” studies

A roster of universally accepted outcomes is short. One study on almost everyone’s list was published by DTCA in the Journal of Clinical Endocrinology and Metabolism in 1998. A retrospective analysis of 7,000 patients in DTCA’s NetCare program found a $50 per-member, per-month savings in diabetes treatment costs over 12 months. Admissions dropped 18 percent; such interventions as eye and foot exams and hemoglobin A1c tests increased.

Some respectability stems from the fact that the study examined patients in seven health plans. “The DTCA article is an example of what works,” says Al Lewis, who, as president of the Disease Management Purchasing Consortium, makes a living picking such things apart. “It’s peer reviewed, and the design is as valid as it gets.”

Outcomes from Lovelace’s disease-specific “episode of care” teams are also often mentioned among the “good ones.” Its pediatric asthma team, for instance, documented a two-year, 40-percent reduction in admissions among 4,000 patients, while the low back pain team reported a two-year, 35-percent drop in spine surgery. Two-year data are rare in DM.

Their context adds credibility. Working within Lovelace’s integrated health system, the teams follow patients through varying levels of care, assisting physicians in determining the best treatment. “It’s part of a health care re-engineering effort, rather than creating partitions within a health care system,” says Friedman.

Beyond these, lists become more subjective. Only a handful of peer-reviewed outcomes studies have been published. Humana reported a 60-percent drop in admissions for CHF in Disease Management in 1998. Blue Cross of California touted big improvements in HEDIS screenings for diabetic people in Diabetes Care last year.

The dearth of peer-reviewed literature may be as much about business as Swiss-cheese science. “Some companies that might have data with statistical or scientific validity won’t publish it, because they think it gives them a commercial advantage,” says Jim Bonnette, M.D., chief medical officer for Nashville-based Health Connections Inc., a DM company. “Why give away your secrets?”

Ah, peer review isn’t everything, says Lewis. “It’s my experience that peer review is not 100 percent correlated with valid study design. Likewise, there are plenty of valid studies that aren’t peer reviewed.”

Stone ticks off his own mental list: “Some asthma studies from drug companies are good, but I haven’t seen anything good on comprehensive respiratory care. There’s nothing comprehensive on total cardiac disease; some CHF analyses are good, but you have to dive behind them to see how big the population is and how the base period is established.”

The future holds promise. The U.S. Agency for Healthcare Research and Quality put up $2 million to fund the first randomized controlled trial, comparing outcomes for Medicaid patients in asthma DM programs against those outside of DM care. The University of Pennsylvania’s Alan Hillman, M.D., M.B.A., is the lead investigator for the study, which will follow patients for three years.

Not proven

Until then, the onus is on the DM industry. “We all have the gut feeling that DM works, or we wouldn’t be in the field,” says Bonnette. “But I don’t think we’ve proven it yet.”

What most DM vendors report are subsets of information: typically, how small groups of acute patients benefited from short-term interventions. There’s not much science in that, bemoans Bonnette: “Unfortunately, most of the value of that is in marketing.”

Stone, at American Healthways, concurs. “We see a lot of studies with very small sample sizes. If I’m a medical director or a CEO at a health plan, I’m not sure I’d have a high degree of confidence in them, particularly for a concept that is still relatively new.”

Bernard Mansheim, M.D., chief medical officer for Coventry Health Care, suggests that nothing out there — regardless of external validation — is an outcome in the truest sense. “Any outcome is going to be a long-term measurement. The ravages of diabetes — nephropathy, heart disease, retinopathy — occur over 10 to 30 years. Whether you prevented those things will take years to determine.”

Consequently, what’s touted are financial outcomes, based largely on cost avoidance. “That’s a leap of faith,” says Mansheim. “You can calculate savings of members who didn’t end up in the hospital, but it’s awfully difficult to prove they would have ended up there without the intervention.”

In the absence of long-term outcomes, financial projections are most DM programs’ best marketing tools. “Potential customers talk quality and measurement, but in the long run, they’re not going to buy a system unless it helps to reduce their costs,” says Iver Juster, M.D., a medical informaticist and chief medical officer for Palatine, Ill.-based Heads Up Population Health Management Services.

Four kinds of outcomes are generally offered now: clinical, functional status, satisfaction, and financial. Considering that DM-outcomes reporting is in the “first or second generation,” Juster says, most of what’s offered today “isn’t that bad.

“But agreement on standards and making them usable is more important than spending 12 months revising some subtlety,” he says. “Before inventing third-generation stuff, we need to ensure people do things consistently so we can make comparisons.”

Bonnette agrees. “Unless we agree on a format for reporting and say, ‘This is not scientific, it’s just a short-term interim report,’ then our press releases hurt us. Who knows whether to believe them?”

Garbage in, garbage out

Some movement toward standardization is under way. Juster cites the work of John Ware, Ph.D., of New England Medical Center, who developed the SF-36, a standardized questionnaire with validated scales for measuring health status from a patient’s point of view. Using this, results of various types of interventions can be compared.

Industry leaders are taking steps, too. Together, Lovelace and Hartford Hospital are developing criteria people can use to evaluate outcomes. American Healthways produced standards for DM programs (to be released early this year), with the goal of helping buyers evaluate DM vendors’ proposals.

A top consideration for any credible outcomes report, says Stone, is a fixed period, e.g., a calendar year, for comparisons. In some studies, a baseline for a given patient’s health costs is calculated on a time frame that ends with a hospitalization. Almost by default, then, future savings are guaranteed.

“Most people with chronic disease who have an acute episode tend not to have another in a 12-month period, even without intervention,” says Stone. “If you load the severity into the base period, your 12-month outcomes aren’t very impressive.”
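Stone’s point is a classic regression-to-the-mean trap. A toy simulation (with made-up numbers, purely for illustration) shows how a baseline period that ends with a hospitalization guarantees “savings” even when nothing is done:

```python
import random

random.seed(0)

# Illustrative assumptions, not real DM data: each patient-year carries a 15%
# chance of a $20,000 hospitalization on top of $3,000 in routine costs.
ADMIT_PROB, ADMIT_COST, ROUTINE_COST = 0.15, 20_000, 3_000

def yearly_cost():
    """One simulated patient-year of costs."""
    return ROUTINE_COST + (ADMIT_COST if random.random() < ADMIT_PROB else 0)

# Biased design: the baseline is the year that ended with a hospitalization,
# so the admission cost is loaded into the base period by construction.
biased_baseline = ROUTINE_COST + ADMIT_COST

# The following year simply regresses toward the long-run average --
# with no intervention at all.
followup = sum(yearly_cost() for _ in range(10_000)) / 10_000

print(f"biased baseline:   ${biased_baseline:,.0f}")
print(f"typical next year: ${followup:,.0f}")
print(f"'savings' from doing nothing: ${biased_baseline - followup:,.0f}")
```

This is why Stone insists on a fixed comparison period, such as a calendar year, rather than a baseline window anchored to an acute episode.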

Another mark of credibility, many agree, is large sample size. “Saving 53 percent on $30,000 of expenditures for 60 people is impressive,” Stone says. “Saving 13 percent of $5,000 for each of 40,000 people is a lot more impressive, in terms of actual American green stuff you get to take to the bank.”
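Reading Stone’s figures as per-person annual expenditures (an assumption; the quote doesn’t spell this out), the aggregate arithmetic makes his point plainly:

```python
# Stone's two hypothetical programs, read as per-person annual expenditures
# (an assumption -- the quote is ambiguous on this point).
small_deep = 0.53 * 30_000 * 60        # 53% savings on 60 high-cost patients
large_broad = 0.13 * 5_000 * 40_000    # 13% savings across 40,000 patients

print(f"small, deep program:  ${small_deep:,.0f}")   # $954,000
print(f"large, broad program: ${large_broad:,.0f}")  # $26,000,000
```

A modest savings rate applied across a large population dwarfs a dramatic rate applied to a small one.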

Short-term return on investment doesn’t mean much, he thinks. “It doesn’t make sense to compare a CHF program with a 53-percent savings rate to a diabetes program with a first-year, 13-percent savings rate. The value in diabetes is savings over time.”

Beware outcomes surveys that tout satisfaction measures, warns Bonnette. “It’s like asking, ‘How much do you love me — a whole big bunch, or not quite as big a bunch?’ If you phrase the question right, you get the answer you want. I’d be more interested in clinical outcomes over a long enough period to see if it makes a difference, and whether we are saving money by altering the course of a disease.”

Friedman takes a liberal view. Given that the Disease Management Association of America last October defined DM as something with many specific components, he offers, no single methodology is “right.”

“DMAA is trying to set a format where someone can say, ‘I am a health plan that fits X and Y specs. I’ve contracted for DM with company Z that provides this product, so according to DMAA, these are some of the outcomes I can report,'” says Friedman.

Perhaps the chief criterion, many players agree, is that outcomes reflect study of an entire population — whether that is defined as everyone with a disease or just everyone, period. A common criticism of DM outcomes is that too few studies pay attention to people in low- to moderate-risk groups.

Juster understands why. “It’s cost-effective to treat people who can benefit, cost-wise, within one to two years. That’s the upper 20 percent, plus or minus, of any group with disease.”

As for the other 80 percent, he says, it’s not cost-effective to reach them — yet. “If we can focus on that 20 percent, we’d be doing well today. And in five years, when we have the technology, we’d better be seriously chipping away at the other 80 percent.”

It would be tempting to end this story on that note, but Bonnette offers some food for thought that’s hard to ignore. “Is DM just a way of providing infrastructure for care management that should have been present all along — but wasn’t?” he asks.

“If you look at it that way, then I’m not sure we have to prove outcomes.”