Do diabetes wellness programs work?

Determining return on investment and improvement of outcomes is essential when new programs are being tried, but the research method must be chosen carefully


As wellness, case, and disease management programs proliferate, health plans must measure their outcomes and potential for return on investment. Here are a few strategies to help with that evaluation.

Recent years have witnessed a proliferation of wellness, case, and disease management programs in managed care. Despite their prevalence, few are subjected to the rigorous research designs necessary to determine whether financial goals are achieved. As a result, the effectiveness of these programs in reducing costs remains to be determined (Linden 2006; Mattke 2007).

The lack of stringent evaluation of these programs is often attributed to difficulty in identifying a suitable control group (Linden 2006; Linden 2003). Program evaluation in managed care has therefore tended to rely on observational pre-post studies that are subject to selection bias and regression to the mean (Linden 2006).

Program evaluators often fall back on one of two approaches. The first is a pre-post design, often covering only the relatively small percentage of members who enroll, that compares utilization during a specified pre-enrollment period with utilization during a comparable post-enrollment period. The second compares the results of members enrolled in the program with those of eligible members who could not be enrolled. Each of these designs has critical weaknesses.

The pre-post design is highly subject to regression toward the mean. Members are identified for intervention as a function of being a utilization outlier. Regression toward the mean suggests that outliers at one point in time are likely to be closer to the mean at re-measurement, even in the absence of intervention. As a result, reductions in cost and utilization cannot be fully attributed to the intervention.
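To see why, consider a hypothetical simulation in which no program exists at all. Each simulated member has a stable underlying cost level plus random year-to-year variation, and members are identified as outliers on year-1 cost, as a program would identify high utilizers. The cost levels and noise sizes below are illustrative assumptions, not data. A minimal Python sketch:

```python
import random
import statistics

def simulate_regression_to_mean(n_members=10_000, top_fraction=0.10, seed=1):
    """Simulate two years of member costs with NO intervention.

    Hypothetical model: each member has a persistent underlying cost level plus
    independent year-to-year noise. Members are selected as outliers on year-1
    cost, mimicking how programs identify high utilizers.
    """
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        underlying = rng.gauss(5_000, 1_500)      # persistent cost level
        year1 = underlying + rng.gauss(0, 2_000)  # observed pre-period cost
        year2 = underlying + rng.gauss(0, 2_000)  # observed post-period cost
        members.append((year1, year2))

    # Identify the top decile on year-1 cost (the "utilization outliers").
    members.sort(key=lambda m: m[0], reverse=True)
    selected = members[: int(n_members * top_fraction)]

    pre = statistics.mean(m[0] for m in selected)
    post = statistics.mean(m[1] for m in selected)
    return pre, post

pre, post = simulate_regression_to_mean()
print(f"Selected members, year 1: ${pre:,.0f}; year 2: ${post:,.0f}")
```

Year-2 costs for the selected group fall well below their year-1 level even though nothing was done, and a pre-post evaluation would credit that drop to the intervention.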

Comparing enrolled members with unenrolled members is subject to selection bias. Members who cannot be enrolled in phone- or mail-based interventions may differ fundamentally from enrolled members. Unenrolled members are either unreachable or unwilling to accept services. Being unreachable may signal greater instability in housing or telephone service, or greater financial and psychosocial stressors, and refusing a free intervention may reflect a low level of motivation to improve health. If enrolled members have fewer financial and psychosocial issues, or are inherently more motivated to improve their health, these differences, rather than the intervention, might explain differences in group outcomes.
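The same point can be illustrated with a hypothetical simulation in which the program has no effect whatsoever. An unobserved motivation score, an assumption made purely for illustration, raises both the chance of enrolling and, independently, the likelihood of lower costs:

```python
import random
import statistics

def simulate_selection_bias(n_members=10_000, seed=2):
    """Compare enrolled vs. unenrolled members when the program has ZERO effect.

    Hypothetical model: an unobserved 'motivation' score (0 to 1) raises the
    probability of enrolling and also lowers cost on its own.
    """
    rng = random.Random(seed)
    enrolled, unenrolled = [], []
    for _ in range(n_members):
        motivation = rng.random()                 # 0 = low, 1 = high
        enrolls = rng.random() < motivation       # motivated members enroll more often
        # Cost depends on motivation only; enrollment itself changes nothing.
        cost = 8_000 - 3_000 * motivation + rng.gauss(0, 1_000)
        (enrolled if enrolls else unenrolled).append(cost)
    return statistics.mean(enrolled), statistics.mean(unenrolled)

enrolled_mean, unenrolled_mean = simulate_selection_bias()
print(f"Enrolled: ${enrolled_mean:,.0f}; unenrolled: ${unenrolled_mean:,.0f}")
```

The enrolled group shows lower average costs purely because of who chose to enroll, which an enrolled-versus-unenrolled comparison would misread as program savings.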

Given the importance of managing health care costs, it is critical that administrators appropriately evaluate outcomes of managed care programs aimed at reducing costs.

Measurement strategies are available that avoid regression toward the mean and selection bias, yet they are frequently overlooked. The purpose of this paper is to provide a framework for program evaluators to select the strongest, most feasible design for program evaluation and to provide general strategies to improve program evaluation.

Outcome Designs

Several designs can be used to measure outcomes objectively. The strongest is the randomized controlled design. When randomization is not possible, evaluators may choose a historical, intent-to-treat design or, when that is not possible either, a matched comparison design. Each design has limitations, and the appropriate choice depends on the context of the evaluation. The figure below presents a flow chart for selecting a design, and the table that follows gives an example of each design.

TABLE: Examples of different study designs

Randomized controlled design (Rosenzweig et al. 2010). Members with diabetes mellitus and coronary artery disease were randomly assigned to a diabetes disease management program. Although only 356 of the 462 members randomized to the program actually accepted services, all 462 were included in the analyses.

Historical intent-to-treat design (Kolbasovsky 2009). A cohort of 305 members at risk for psychiatric rehospitalization was identified for intensive case management (ICM). Of these, 229 were enrolled, and all 305 were included in all analyses. Thirty-day inpatient psychiatric admissions and associated costs were compared with those of a historical cohort of 347 members identified, using the same ICM eligibility criteria, during a one-year period before the ICM program was initiated.

Matched comparison design (Dubois et al. 2010). A cohort of 72 patients with acute appendicitis who underwent appendectomy was identified for early discharge under a new protocol. A comparison cohort was identified, with each comparison patient matched for age (±3 years), presence or absence of a comorbidity, use of a laparoscopic procedure, and nonperforated appendicitis. Outcome measures included reduction in the need for in-hospital beds. ER use was included as a balancing measure.
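The selection order described above, combined with the "When to use" criteria given for each design in the sections that follow, can be sketched as a simple decision function. The parameter names are hypothetical labels for those criteria, and the function simplifies the judgment an evaluator would actually exercise:

```python
from typing import Optional

def select_design(*, randomization_feasible: bool,
                  historical_cohort_identifiable: bool,
                  time_effects_minimal_or_controlled: bool,
                  enrollment_high: bool,
                  outcome_related_matching_variables: bool,
                  similar_living_stability: bool,
                  motivation_unimportant_or_proxy: bool) -> Optional[str]:
    """Return the strongest design whose criteria are all met, checking designs
    in order of rigor; return None if no design's criteria are met."""
    if randomization_feasible:
        return "randomized controlled design"
    if (historical_cohort_identifiable and time_effects_minimal_or_controlled
            and enrollment_high):
        return "historical, intent-to-treat design"
    if (outcome_related_matching_variables and similar_living_stability
            and motivation_unimportant_or_proxy):
        return "matched comparison design"
    return None

# Example: no randomization, a historical cohort exists, but enrollment is low.
choice = select_design(randomization_feasible=False,
                       historical_cohort_identifiable=True,
                       time_effects_minimal_or_controlled=True,
                       enrollment_high=False,
                       outcome_related_matching_variables=True,
                       similar_living_stability=True,
                       motivation_unimportant_or_proxy=True)
print(choice)  # matched comparison design
```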

Randomized controlled design

Description. A randomized controlled trial evaluates whether a cause-effect relationship exists between an intervention and an outcome (Sibbald 1998). These trials involve random allocation of patients or subjects to groups that are treated identically in all ways except the intervention (Sibbald 1998).

Limitations. This design is difficult to use in the managed care setting, as it is typically not feasible to withhold an intervention from one group of members.

When to use. Randomized controlled trials should be used in accordance with each organization’s policies and institutional review board recommendations.

Although randomization is often not feasible, an intervention can sometimes be rolled out in stages rather than made available to an entire population at once.

If the outcome period is sufficiently short, an organization may be able to randomly select a sample of members to receive the intervention and compare their outcomes with those of a randomly selected wait-list control sample.

This requires a short outcome period and an intervention that does not require all plan members to receive services simultaneously.
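A wait-list assignment of this kind can be sketched in a few lines of Python. The function name, the member ID format, and the 50/50 split are assumptions for illustration; an organization's own randomization procedures and review board requirements would take precedence:

```python
import random

def assign_wait_list(member_ids, immediate_fraction=0.5, seed=None):
    """Randomly split eligible members into an immediate-intervention arm and a
    wait-list control arm that receives the intervention after the outcome period."""
    rng = random.Random(seed)          # a fixed seed makes the assignment reproducible
    shuffled = list(member_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * immediate_fraction)
    return {"immediate": sorted(shuffled[:cut]),
            "wait_list": sorted(shuffled[cut:])}

# Hypothetical member IDs M001..M100
arms = assign_wait_list([f"M{i:03d}" for i in range(1, 101)], seed=42)
print(len(arms["immediate"]), len(arms["wait_list"]))  # 50 50
```

Recording the seed alongside the assignment lets an evaluator reproduce exactly who was placed in each arm.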

Strategies. Within the managed care environment it is often difficult to utilize the randomized controlled design because withholding a potentially useful intervention from a member is unappealing.

Using a wait-list control group may make this design more feasible, particularly if staff clearly explain to members assigned to the wait list that they will receive the intervention within a reasonable amount of time and answer any questions they have.

Shortening the measurement outcome period may be necessary to reduce wait time for members.

Historical, intent-to-treat design

Description. A historical, intent-to-treat design involves the creation of a retrospectively identified cohort that would have been identified for an intervention had the program existed during a period of time before program implementation. Once these retrospective members are identified, all clinical, utilization, and financial data are obtained for the given outcome period. This group becomes a baseline group to be compared to the intervention group — members eligible for the intervention as of the program’s inception. Using the same identification criteria for the intervention and baseline group prevents regression toward the mean from disproportionately affecting one group, as this phenomenon will affect both groups similarly.

The “intent-to-treat” component involves the inclusion of each eligible member in the intervention group, regardless of whether the member was enrolled in the program. Because it is not possible to know which members in the baseline comparison group would have been enrolled had the program existed at the time, all eligible members during the actual intervention year must be included to avoid a selection bias.
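The two rules of this design, identical identification criteria for both cohorts and inclusion of every eligible member regardless of enrollment, can be sketched as follows. The eligibility rule (two or more prior admissions), the field names, and the cost figures are hypothetical, chosen only to keep the arithmetic easy to follow:

```python
import statistics
from dataclasses import dataclass

@dataclass
class Member:
    member_id: str
    prior_admissions: int   # utilization used for identification (hypothetical criterion)
    enrolled: bool          # program enrollment; always False in the historical year
    outcome_cost: float     # cost during the outcome period

def is_eligible(m: Member) -> bool:
    # Hypothetical identification rule; the SAME rule is applied to both cohorts.
    return m.prior_admissions >= 2

def itt_mean_cost(members):
    # Intent-to-treat: every eligible member counts, enrolled or not.
    eligible = [m for m in members if is_eligible(m)]
    return statistics.mean(m.outcome_cost for m in eligible), len(eligible)

baseline_year = [Member("A", 3, False, 12_000), Member("B", 1, False, 2_000),
                 Member("C", 2, False, 9_000)]
program_year = [Member("D", 2, True, 7_000), Member("E", 4, False, 11_000),
                Member("F", 0, False, 1_000)]

base_mean, base_n = itt_mean_cost(baseline_year)
prog_mean, prog_n = itt_mean_cost(program_year)
print(f"Baseline: ${base_mean:,.0f} (n={base_n}); intervention: ${prog_mean:,.0f} (n={prog_n})")
```

Member E counts toward the intervention cohort even though E never enrolled, just as every member meeting the criteria in the baseline year counts toward the baseline cohort.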

Limitations. Changes associated with time, such as policy changes, new legislation, and inflation, may have a greater influence on one group than on the other, because each group exists at a different point in time. Another limitation is that, because all members intended to receive the intervention are included in the analysis, a considerable proportion of eligible members must actually receive it for the groups to differ significantly. Unfortunately, many programs enroll only a small percentage of eligible members.

When to use. The historical, intent-to-treat design may be used when a randomized trial is not feasible, a historical cohort can be identified, the impact of time is minimal or can be controlled for, and enrollment is high.

Strategies. For this design to be effective, several elements must be observed or considered.

Use the same criteria. Allow no difference between the criteria used to identify the historical baseline comparison group and the criteria used to determine the intervention group.

Confounding variables. Identify and, to the extent possible, control for confounding variables associated with time, especially factors that disproportionately affect one group. At times, however, there may be good reason to leave a factor uncontrolled. Inflation is one example. An evaluator can adjust for it, but because inflation tends to increase costs in the intervention group, which exists later in time, the evaluator may decide not to. Leaving inflation unadjusted makes statistical significance more difficult to achieve and yields a more conservative estimate of program benefits.
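The effect of that choice can be seen in a small worked example. The per member per month (PMPM) costs and the 4% price-index change are hypothetical; in practice an evaluator would choose an appropriate medical care price index:

```python
def to_intervention_year_dollars(baseline_cost, baseline_index, intervention_index):
    """Restate a baseline-year cost in intervention-year dollars using a price index."""
    return baseline_cost * intervention_index / baseline_index

baseline_pmpm = 1_000.0              # hypothetical baseline-year cost PMPM
intervention_pmpm = 980.0            # hypothetical intervention-year cost PMPM
index_base, index_now = 100.0, 104.0 # hypothetical 4% increase in the price index

# Conservative: compare nominal dollars, so inflation works against the program.
savings_unadjusted = baseline_pmpm - intervention_pmpm
# Adjusted: inflate the baseline before comparing.
savings_adjusted = (to_intervention_year_dollars(baseline_pmpm, index_base, index_now)
                    - intervention_pmpm)
print(f"Unadjusted: ${savings_unadjusted:.2f} PMPM; adjusted: ${savings_adjusted:.2f} PMPM")
```

Restating baseline costs in intervention-year dollars triples the apparent savings in this example; leaving them unadjusted reports the smaller, more conservative figure.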

Enhance program enrollment. Estimate the minimum percentage of eligible members who must receive the intervention for the intent-to-treat group as a whole to have a chance of demonstrating significant improvement. Once this is estimated, program administrators should do everything possible to raise enrollment above that level. Fortunately, strategies are available to increase program enrollment, such as using search engines to find correct contact information, interactive voice recordings, contacting members in the evenings and while they are hospitalized, collaborating with practitioners, involving family members, using opt-out approaches, and motivational interviewing.
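One way to estimate that threshold is a standard normal-approximation power calculation, under the simplifying assumption that the program changes costs only for members who actually enroll. In that case the intent-to-treat difference is diluted to the enrollment fraction multiplied by the effect among enrollees. The function name, effect size, standard deviation, and cohort sizes below are hypothetical:

```python
import math
from statistics import NormalDist

def minimum_enrollment_fraction(effect_if_enrolled, sd, n_intervention, n_baseline,
                                alpha=0.05, power=0.80):
    """Smallest share of eligible members who must enroll for an intent-to-treat
    comparison to detect the program effect.

    Assumes the program affects only enrolled members, so the intent-to-treat
    difference is enrollment_fraction * effect_if_enrolled, and that both cohorts
    share one standard deviation. Uses the normal approximation for a two-sided,
    two-sample comparison of means.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    # Minimum detectable difference between cohort means
    mde = (z_alpha + z_power) * sd * math.sqrt(1 / n_intervention + 1 / n_baseline)
    return mde / effect_if_enrolled

# Hypothetical: $3,000 savings per enrollee, SD of $10,000, 300 members per cohort
needed = minimum_enrollment_fraction(3_000, 10_000, 300, 300)
print(f"Required enrollment: {needed:.0%} of eligible members")
```

A result above 1.0 would mean that even full enrollment could not be expected to produce a detectable difference with cohorts of that size. Because dilution scales the effect linearly, halving enrollment requires roughly four times as many members to detect the same per-enrollee effect.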

Matched comparison design

Description. The matched comparison design involves identifying a control group of members matched for one or more variables thought to influence outcomes.

Limitations. An important limitation of this design is its susceptibility to a motivation selection bias. Members receiving the intervention were motivated enough to accept it and had a stable enough living situation to be reachable at the time. This may not be true for the matched group. Unless a proxy for the likelihood of participating in a program is available for members of the matched group, controlling for motivation to participate is difficult.

When to use. This design is best used when the previous designs are not possible; when members can be matched on variables that are likely to affect outcomes; when stability of living situation is similar between the groups; and when motivation is not important to achieving desired outcomes or a proxy for motivation is available.

Strategies. Several strategies can be employed to enhance the effectiveness of this design.

Match on risk as well as demographic variables. Demographic variables such as age, gender, and line of business are often used in matching. However, if a program outcome variable is related to utilization or costs, it is important to match on a variable that represents risk of future utilization, such as predictive model risk scores, comorbidity indexes, or measures of previous utilization.

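Use multiple matches per member. Matching each intervention member to more than one comparison member enlarges the comparison group and can make estimates more precise.

Select matching variables carefully. After two or three variables, it often becomes difficult to find matches.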
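A simple greedy matching routine illustrates these strategies together: exact matching on gender, an age window like the ±3 years used by Dubois et al., a caliper on a risk score, and up to k comparison members per intervention member, drawn without replacement. The field names, caliper widths, and risk-score scale are assumptions for illustration; this is a sketch, not a validated matching method:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    member_id: str
    age: int
    gender: str
    risk_score: float   # e.g., a predictive-model risk score (hypothetical scale)

def match_controls(treated, pool, k=2, age_caliper=3, risk_caliper=0.25):
    """Greedy k:1 matching without replacement.

    Exact match on gender, age within +/- age_caliper years, and risk score within
    risk_caliper; among acceptable candidates, take the k closest on risk score.
    Intervention members with no acceptable candidate get an empty list.
    """
    available = list(pool)
    matches = {}
    for t in treated:
        candidates = [
            c for c in available
            if c.gender == t.gender
            and abs(c.age - t.age) <= age_caliper
            and abs(c.risk_score - t.risk_score) <= risk_caliper
        ]
        candidates.sort(key=lambda c: abs(c.risk_score - t.risk_score))
        chosen = candidates[:k]
        for c in chosen:
            available.remove(c)   # each comparison member is used at most once
        matches[t.member_id] = [c.member_id for c in chosen]
    return matches

treated = [Person("T1", 60, "F", 1.2), Person("T2", 45, "M", 0.8)]
pool = [Person("C1", 62, "F", 1.1), Person("C2", 58, "F", 1.35),
        Person("C3", 70, "F", 1.2), Person("C4", 44, "M", 0.9),
        Person("C5", 47, "F", 0.8)]
print(match_controls(treated, pool))  # {'T1': ['C1', 'C2'], 'T2': ['C4']}
```

Because matching is greedy, results depend on the order in which intervention members are processed. T2 receives only one match here, which shows how quickly acceptable candidates run out as matching variables are added.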

Measuring Outcomes

All of these designs require certain strategies for measuring outcomes. Strategies can vary between designs, but should include the following elements.

Design a strategy. Design the strategy for measuring outcomes before the program is implemented.

Choose a design only after weighing its strengths and limitations. The design must be feasible yet able to provide valid and reliable results. Failing to weigh these issues, or waiting until after implementation to design a measurement strategy, often leads to poor design and to problems that surface too late to correct.

Include balancing measures. Balancing measures assess whether an intervention aimed at improving one part of a system is causing problems in another part (Institute for Healthcare Improvement 2010). For example, an intervention aimed at reducing the number of different prescribers from whom members obtain controlled substances may increase ER visits and costs if members who can no longer get desired prescriptions from multiple outpatient physicians turn to the ER instead. Evaluators focusing exclusively on intended outcomes may miss the true impact of the intervention.

Include program costs. Many evaluators report cost savings associated with program interventions while failing to consider program costs. These may include staff salaries/benefits, operating expenses, and increased costs associated with balancing measures.
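Netting these costs out is simple arithmetic, but it can change the conclusion an administrator draws. The figures below are hypothetical, and ROI is shown as net savings per dollar of total cost, one common convention among several:

```python
def program_roi(gross_savings, staff_costs, operating_costs, balancing_cost_increase):
    """Net savings and ROI after subtracting all program costs.

    All inputs are in the same currency for the same period. Increased costs
    found through balancing measures are counted as program costs.
    """
    total_cost = staff_costs + operating_costs + balancing_cost_increase
    net_savings = gross_savings - total_cost
    roi = net_savings / total_cost            # net savings per dollar spent
    return net_savings, roi

# Hypothetical annual figures
net_savings, roi = program_roi(gross_savings=500_000, staff_costs=150_000,
                               operating_costs=50_000, balancing_cost_increase=25_000)
print(f"Net savings: ${net_savings:,.0f}; ROI: {roi:.2f} per dollar spent")
```

Reporting only the $500,000 gross figure would overstate the program's value; the net figure is the one that supports resource allocation decisions.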


Managed care administrators can evaluate programs more effectively by using the most scientifically rigorous research design feasible. By identifying and implementing an outcome measurement strategy before program implementation, administrators can evaluate program outcomes appropriately and use these results to allocate resources effectively and to make informed decisions about reducing health care costs and improving quality.


Dubois L, Vogt K, Davies W, Schlachta C. Impact of an outpatient appendectomy protocol on clinical outcomes and cost: A case-control study. J Am Coll Surg. 2010;211(6):731-737.

Institute for Healthcare Improvement Measures 2010. Accessed on October 7, 2011, at

Kolbasovsky A. Reducing 30-day inpatient psychiatric recidivism and associated costs through intensive case management. Prof Case Manag. 2009;14(2):96-105.

Linden A, Adams J, Roberts N. Evaluating disease management programme effectiveness: an introduction to the regression discontinuity design. J Eval Clin Pract. 2006;12(2):124-131.

Linden A, Adams J, Roberts N. An assessment of the total population approach for evaluating disease management program effectiveness. Dis Manag. 2003;6(2):93-102.

Mattke S, Seid M, Ma S. Evidence for the effect of disease management: Is $1 billion a year a good investment? Am J Manag Care. 2007;13(12):670-676.

Rosenzweig M, Taitel M, Norman G, Moore T, Turenne W, Tang P. Diabetes disease management in Medicare Advantage reduces hospitalizations and costs. Am J Manag Care. 2010;16(7):157-162.

Sibbald B. Understanding controlled trials: Why are randomized controlled trials important? BMJ. 1998;316(7126):201.

The author has written several articles demonstrating clinical and financial results of case management programs aimed at reducing inpatient and ER recidivism and associated costs. He has published two books and has spoken at many conferences on predicting and reducing utilization cost and on improving outcomes.

Disclosure/Conflicts: The views expressed in this article are the author’s alone and not those of EmblemHealth. Further, he reports no conflicts of interest in the development of this article.