Too many HMO executives are swayed by vendor data compiled using flawed methodology. Keep your eye on the denominator.
When Rick Millard, Ph.D., made a presentation at the Data Integration and Analysis for Disease Management conference in March, he probably didn't dream that the reaction would be so intense. Millard, vice president of clinical affairs for Patient Infosystems in Rochester, N.Y., reported preliminary findings on his company's CareSense diabetes program. They showed reductions in emergency department visits and greater compliance with medication dosage requirements and blood glucose testing. Also — and this is a statistic HMOs might find attractive because it addresses employer concerns — participants missed fewer days from work and school.
The drawbacks, some detractors say, are that the study uses volunteers and that the number of participants (113) makes it statistically insignificant. In initial conversations and E-mail exchanges, Millard countered that the problem lay in how Disease Management News, a trade newsletter devoted to the topic, reported his presentation. (DMN, for its part, stood by its story, and Millard eventually said that DMN's account was, in fact, accurate.)
All of which backs us into a discussion about a constant hazard for HMO executives who wish to purchase DM programs: the flawed methodology of the studies many vendors use to demonstrate effectiveness.
"It's a huge problem," says Bob Stone, executive vice president of the Diabetes Treatment Centers of America. "Most of the congestive heart failure companies report results only on the population with which they are actively involved. That's only about one quarter of who has the disease."
Al Lewis, president of the Disease Management Purchasing Consortium and one of the foremost experts on DM contracting, says studies need to account for motivation.
"People should read the study design very carefully," says Lewis. "The number-one most important thing in disease management is motivation to change. By getting voluntary enrollees, you're already getting people who want to change. To get a true measure of impact, you should include everyone, whether or not they've actively participated, or their physicians participated, or you contacted them, or even if you didn't know they existed."
Millard, for his part, originally said that the main reason Lewis and others have problems with the Patient Infosystems study was that the DMN article did not fully report what was in his presentation. Also, he said, there wasn't enough emphasis on the fact that these were preliminary data. (A re-reading of DMN's article didn't back this assertion. The lead refers to a "test study" and the second paragraph talks about "the first reported results.")
James Gutman, DMN's editor and publisher, welcomes the opportunity to put to rest Millard's complaints, which he says have also been made on the Disease Management Forum, an electronic mailing list sponsored by Managed Care magazine.
"We attended a presentation at a conference and we believe our article fully represented what was stated, given the limitations of space," says Gutman. "We also reported in the article that Patient Infosystems said it is gathering utilization data on persons who did not enroll in the program so that it can make comparisons."
When told of Gutman's comments, Millard says he went back and re-read the original article and discovered that he — Millard — had made a mistake.
"Gutman did just what he was supposed to do," says Millard, adding that what he calls misinformation about CareSense was possibly circulated as a result of exchanges on the forum.
All of which might be almost interesting for readers of the Columbia Journalism Review. It's flawed DM studies that bring us here and, on this issue, most experts find themselves on common ground.
For instance, Millard agrees with the generally held notion that selection bias is a common threat to the validity of research findings.
"When randomization is not achievable, as is frequently the case in behavioral science research, appropriate comparison samples may be drawn from the larger population," says Millard. "This was the method employed in this study. There are several other threats of note, including maturational effects, the effects of being observed, statistical regression to the mean, and sample attrition. To some extent, each of these can never be completely controlled."
Millard says that claims data are being used in the CareSense diabetes program to acquire information on service utilization among nonparticipants.
"However, even claims data has certain limitations," says Millard. "For example, it will not tell you whether a patient is being compliant. Even if self-reported compliance is subject to bias because a few patients will give an inaccurate, socially desirable answer, it constitutes a greater level of information than can be obtained by claims data alone. This is why it is important to obtain both administrative data and patient self-report."
Millard says the CareSense program involved Patient Infosystems employees making phone calls to patients covered through Alabama's Medicaid agency. The calls were made after one, two, three, four, six, and eight months. Patients had a choice of talking to representatives or to interactive voice response systems that asked questions such as when they had last had their hemoglobin levels measured. After each call, reports were mailed to the patients and physicians. Plans got quarterly reports without specific patient identifiers. Millard says that 81 percent of adult patients reported in the first call that they had not had their hemoglobin levels measured. By the fourth call, that had fallen to 63 percent.
"I'm confident the final results will be similar to what we've presented so far," says Millard. In the meantime, he doesn't want his study to be a bone of contention with others in the DM universe. "Criticisms are useful, and after all, they have emanated from only one person," he says, alluding to Lewis.
But, as Lewis points out, he has not been the only one to criticize Patient Infosystems' methodology.
"There are only two ways of doing a valid prospective controlled DM study that are acceptable to the consortium, which procures about 70 percent of all DM contracts: passive control/passive study, or denying access to the program to people who volunteer," says Lewis. "The second, though valid, is impractical and from what I've read, the study did not fit the first category."
Study standards needed
James Bonnette, M.D., chief medical officer of ProMedex, a vendor with 14 DM programs, thinks it's time for those in the industry to agree on guidelines. Regarding the 113 who participated in the Patient Infosystems study, Bonnette says the number is statistically meaningless.
"We do more interactive voice responses in a day than that," he says. "Currently, we run 5,000 to 6,000 patient IVRs a month."
However, Bonnette doesn't feel that the stratification some vendors and health plans do is always necessary.
"My biggest concern is relying on administrative data sets alone to risk-stratify patients," says Bonnette. "Those data sets are notoriously unreliable. We feel that direct contact with the patient, in an economically feasible method, is the only way to accurately risk-stratify and engage the patient in appropriate behavioral and physical changes."
Daniel H. Freeman Jr., Ph.D., who directs the office of biostatistics at the University of Texas Medical Branch at Galveston, says there are "innumerable reasons" why studies are often badly flawed. He asserts that three things should be present in a valid study: a clear and testable hypothesis, randomization and a control group, and masking of participants and observers.
"In any study, one is motivated to do the study, hence there is an underlying preference about the outcome," says Freeman. "This motivation also applies to subjects. The key is that subjects may not need to know they have been enrolled in a disease management program. Similarly, the results should be independently audited."
Freeman says that having a testable hypothesis is the most important element in a valid study.
"This means a research project is set up in such a manner that the conditions under which a hypothesis is declared false are clear," says Freeman. "Thus, the conditions under which a DM program is declared successful must be clearly stated. The next issue is randomization. If the samples are large, then equal numbers of motivated individuals will be in DM and non-DM groups. If there is only a DM group, there can be no randomization and we cannot attribute readings to the program. Put differently, the results may entirely be attributable to self-selection."
Michael Cox, president of Interactive Heart Management, which manages coronary artery disease, says his company has no choice but to look at the entire population — those who've had claims for heart problems, as well as all those at risk but who have yet to have any claims.
Mr. Smith's heart
Many heart failure studies, says Cox, do not show the effect of their interventions on the total burden of heart failure within the membership.
"Rather, their outcomes show that Mr. Smith, who has heart failure, went to the hospital three times in the baseline year and only once in the intervention year," he says. Program administrators might try to interpret that as meaning hospitalizations were reduced by 66 percent.
"True of Mr. Smith, but how many Mr. Smiths were there in the first place?" says Cox. "What did all Mr. Smiths cost membership PMPM at baseline? What percentages of baseline Mr. Smiths were enrolled in their program? Finally, what did the economic PMPM effect of all Mr. Smiths enrolled have on the membership compared to the PMPM baseline of all Mr. Smiths?"
Studies that don't include this analysis are flawed, Cox contends. Focusing solely on volunteers could work only if the vendor shows the burden of disease at baseline and the effect that enrolling volunteers or motivated heart failure patients has against that baseline after the intervention year.
Always keep in mind that numbers can be massaged.
"If a coronary artery disease management intervention only focuses on comparing last year's heart attacks, bypasses, and angioplasties to the intervention year, in general you would show a 60 percent reduction if you did nothing," says Cox. "Only 33 percent of last year's heart attack patients have a recurring heart attack this year, less than 10 percent of last year's bypass patients will have bypass this year, and 20 to 25 percent of all last year's angioplasty patients will have another angioplasty this year."
He adds that a good DM study:
- Creates a baseline of events, hospitalizations, and procedures described as rates per thousand in the membership under and over 65;
- Determines the economic baseline burden of disease on a PMPM basis for those under and over 65;
- Ascertains the effect of its program in reducing the rates per thousand from baseline, adjusted for age and membership differences; and, finally,
- Measures the economic effect PMPM of the intervention for those under and over 65, adjusted for age and membership differences.
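A minimal sketch of the two measures in Cox's list, events per thousand members and cost per member per month (PMPM), might look like this. The class name, field names, and all figures are hypothetical. Age adjustment is handled here only by keeping one record per age band, and membership differences only by dividing by member-months; a real analysis would likely need more.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    """One age band of the membership for one year (hypothetical fields)."""
    member_months: int  # total months of enrollment across all members
    events: int         # hospitalizations, procedures, and similar events
    cost: float         # disease-related claims cost in dollars

    def rate_per_thousand(self) -> float:
        # Annualized events per 1,000 members.
        member_years = self.member_months / 12
        return self.events / member_years * 1000

    def pmpm(self) -> float:
        # Cost per member per month across the whole stratum.
        return self.cost / self.member_months

def change(baseline: Stratum, intervention: Stratum) -> dict:
    """Change from baseline, per member, so that growth or shrinkage
    in membership does not distort the comparison."""
    return {
        "rate_per_thousand": intervention.rate_per_thousand() - baseline.rate_per_thousand(),
        "pmpm": intervention.pmpm() - baseline.pmpm(),
    }

# Invented numbers for a single "under 65" stratum.
base = Stratum(member_months=120_000, events=50, cost=600_000.0)
year1 = Stratum(member_months=132_000, events=44, cost=594_000.0)
print(base.rate_per_thousand(), base.pmpm())
print(change(base, year1))
```

Because both measures use the whole membership as the denominator, they capture every member with the disease, enrolled or not, which is the comparison Cox is asking vendors to show.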
The DTCA's Stone says it helps when DM inclusion is made easy.
"Everybody is in unless they affirmatively elect not to be in," he says. "The reason for that is if you use voluntary enrollees, you never get more than 25 or 30 percent of the population. This reflects our experience. Other programs, such as CHF, may be lower."
Stone's advice for HMO executives: "Find out what the results were, and the denominator on which they were measured, then apply them across the entire population."
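Stone's advice amounts to a one-line calculation. The sketch below pairs Cox's Mr. Smith, whose admissions fell from three to one, with the roughly one-quarter enrollment share Stone cites; combining the two is an illustration, not a result from either source. It assumes that non-enrollees show no change and that both groups had the same baseline event rate.

```python
def population_reduction(enrollee_reduction, enrolled_share):
    """Reduction across everyone with the disease when only a share enrolls.

    Assumes (hypothetically) that non-enrollees show no change and that
    enrollees and non-enrollees had the same baseline event rate.
    """
    if not 0 <= enrolled_share <= 1:
        raise ValueError("enrolled_share must be between 0 and 1")
    return enrollee_reduction * enrolled_share

# Two-thirds fewer admissions among enrollees, a quarter of patients enrolled.
print(f"{population_reduction(2 / 3, 0.25):.1%}")  # prints 16.7%
```

Measured against the full denominator, a headline reduction of two-thirds shrinks to about one-sixth under these assumptions.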
Cox says it is important to analyze the practice parameters for all of those with the condition.
"It's well known that in coronary disease, what is driving the rate of invasive procedures is the local supply of diagnostic testing," says Cox. "The intensity of this practice pattern affects the demand to undertake bypass surgery and angioplasty, and is unrelated to the intensity of sickness and health outcomes of people receiving such services."
For example, a Medicare study appearing in the Dartmouth Atlas of Health Care in the United States in 1996 showed that the rates of coronary bypass surgery per thousand enrollees varied by a factor of more than 4, from 2.1 in Grand Junction, Colo., to 8.5 in Joliet, Ill.
"Determining the baseline practice pattern tells us and the client what the total burden of the disease, caused by this practice pattern, has on the entire membership and what potential impact our DM interventions will have in modifying this practice pattern, its cost, and the health patterns of the membership," says Cox.
If some of this sounds rudimentary, Lewis says the marketplace doesn't reflect it: he too often sees HMO executives swayed by vendors using flawed studies.
"Without mentioning any names, three fourths of all vendors who, I estimate, get about one fourth of all the contracts, use very flawed outcomes analysis methodologies," says Lewis. "HMOs with those contracts can't point to savings, they lose interest in DM, and the whole industry suffers."