The notion of quality of care in medicine is not new, but it is becoming increasingly important as the competitive health care market demands objective measures to compare physicians, hospitals, and managed care organizations. Informatics researchers and health care vendors have responded with computer systems that promise to improve health care delivery. How well do these systems fulfill the promise? The answer requires an understanding of systems and the standards of quality used to assess them.
This review focuses on computerized methods designed to improve specific components of the quality of health care. The first section provides a broad definition of quality of care and discusses how it is measured.
The second section presents a hypothetical case comparing how two health systems implement quality-improvement strategies, and discusses how effective those strategies can be expected to be with different degrees of computerized support.
Finally, we review the literature on computer-based decision-support systems and draw conclusions about their effects on the various aspects of quality.
Donabedian is often credited with formulating the conceptual model of quality of care. He recognized that the term is used in so many contexts that its meaning is often misunderstood: a patient's view of quality of care may differ from that of a provider or a payer.1 To account for all these perspectives, a definition of quality of care should draw on one or more of the following types of measures, which are summarized from recent reviews:2,3
Structural measures: credentialing processes, availability of resources, staff-to-patient ratios;
Satisfaction measures: a patient's perception of the relative benefits of treatment on quality and quantity of life, balanced by the difficulty of undergoing the necessary treatment;
Process measures: assessment of the degree of adherence to standards of practice; and,
Outcome measures: evaluation of clinical end points (functional status, mortality, hospitalization) that result from treatment.
Real-world clinical scenarios highlight the danger of judging quality from a single perspective. For example, a good outcome is important, but it is much less compelling if it follows an unnecessary procedure. Conversely, a poor outcome does not necessarily indicate poor quality if the treatment was appropriate and was correctly delivered by credentialed staff.
Structural measures reflect basic features of a health care facility's providers and resources. Examples include the total number of physicians and the percentage who are board certified, the nurse-to-bed ratio, annual admissions and emergency department visits, and the number of procedures performed. These measures are objective and easy to report, but they are only indirectly related to other measures of quality. Although some studies have shown survival advantages at facilities that perform a greater volume of certain procedures,4 these findings are difficult to generalize to all procedures. Structural measures alone are also difficult to apply; consider, for instance, the "best" care for a patient who lives near a hospital where only routine care is available. In acute illness, the time needed to reach a facility with more advanced treatment modalities must be weighed against the danger of delaying basic therapies available at the closer, albeit less technologically advanced, hospital.
Satisfaction measures reflect the patient's view of quality. Patient satisfaction may be difficult to integrate with other measures because it is highly subjective. Different patients may view the same health care delivery strategy differently, even when other measures of quality rate it consistently. For example, patients who want a minimalist approach to care may consider health maintenance programs too paternalistic and therefore be dissatisfied with them. Conversely, a patient who wants extensive testing may find the same health maintenance strategies too cursory. Satisfaction also incorporates issues such as parking, access to public transportation, waiting room décor, waiting times, and interactions with staff. Although many issues may complicate the relationship between patient satisfaction and other measures of quality, the most recent annual report of the National Committee for Quality Assurance (NCQA) provides evidence that the two are related: health plans that place in the top quartile of NCQA's Health Plan Employer Data and Information Set (HEDIS) consistently score higher on the Consumer Assessment of Health Plans Study (CAHPS 2.0H) satisfaction survey than plans in the lowest HEDIS quartile.5
Process measures assess the degree of adherence to accepted standards of practice. Although many of these standards are evidence based, many others are set by expert consensus. The standards address processes designed either to keep a healthy patient disease-free (health-maintenance measures) or to manage a patient with disease safely and efficiently (disease-process measures). Although health maintenance now seems routine, NCQA formalized it as a component of quality measurement only in the 1990s.6 Examples include such preventive measures as annual flu shots and pneumococcal (Pneumovax) immunizations for susceptible individuals, and cancer screenings.
Disease-process measures are analogous to health-maintenance measures, except that they focus on routine interventions for individuals with specific diseases. These measures determine compliance with proven or accepted strategies for improving outcomes. Examples include measurement of HbA1c in patients with diabetes and use of inhaled steroids by patients with moderate to severe asthma.
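Process measures such as these are usually reported as a rate: the number of eligible patients who received the recommended care divided by the number of eligible patients. A minimal sketch in Python, assuming a hypothetical and highly simplified patient record with one eligibility flag (diabetes) and one process flag (HbA1c measured during the year); the field names are illustrative and are not taken from any HEDIS specification.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class PatientRecord:
    # Hypothetical, simplified record; real EMR data are far richer.
    patient_id: str
    has_diabetes: bool
    hba1c_measured_this_year: bool


def process_measure_rate(records: Iterable[PatientRecord]) -> Optional[float]:
    """Share of eligible patients (denominator) who received the
    recommended process of care (numerator)."""
    eligible = [r for r in records if r.has_diabetes]
    if not eligible:
        return None  # the measure is undefined when no one is eligible
    met = sum(r.hba1c_measured_this_year for r in eligible)
    return met / len(eligible)


panel = [
    PatientRecord("a", has_diabetes=True, hba1c_measured_this_year=True),
    PatientRecord("b", has_diabetes=True, hba1c_measured_this_year=False),
    PatientRecord("c", has_diabetes=False, hba1c_measured_this_year=False),
    PatientRecord("d", has_diabetes=True, hba1c_measured_this_year=True),
]
print(f"HbA1c measured in {process_measure_rate(panel):.0%} of diabetic patients")
```

Patient "c" is excluded from the denominator rather than counted as a failure; how the eligible population is defined can matter as much as the numerator.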
Outcome measures are increasingly used for quality assessment. Unlike process measures, outcome measures focus only on the health status that results from a physician-defined treatment plan. Proponents of process measures believe that these measures are more sensitive indicators of quality than outcome measures, because not every deviation from the standard management plan results in a poor outcome. In addition, process measures correctly register poor quality of care when a good outcome occurs despite improper treatment.
Proponents of outcome measures believe that although evidence exists to support the efficacy of individual components of a disease management process, the integration of components must be tested as a unit for a broad range of outcomes. For example, although strict adherence to diabetes protocols is associated with less end-organ damage, it is also associated with increased episodes of hypoglycemia,7 which can result in auto accidents, other injuries, and hospitalizations. Without an assessment of all these outcomes, strict adherence to guidelines and high scores on process measures may not have the expected effect on a population's health.
Just as process measures of quality must be judged with caution, outcome measures also warrant careful interpretation. Factors that contribute to outcomes include the effectiveness of treatment, patient risk factors, and random chance, and the relationships among them are complex and difficult to predict. For example, early outcome measures focused on such concrete events as surgical mortality, but the results were not widely accepted because the methods for severity adjustment were unreliable.8
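Severity adjustment is often expressed as an observed-to-expected (O/E) ratio: the number of events a provider actually had, divided by the number a risk model predicts for that provider's case mix. The Python sketch below is a minimal illustration that assumes each patient's predicted risk of death has already been produced by some risk model; that model is precisely the part whose reliability was questioned in the early studies. All data are invented.

```python
def observed_to_expected(outcomes: list[int], predicted_risks: list[float]) -> float:
    """Observed-to-expected ratio for a binary outcome such as death.

    outcomes: 1 if the event occurred for a patient, else 0.
    predicted_risks: each patient's model-predicted probability of the event.
    A ratio above 1 means more events than the case mix predicts.
    """
    if len(outcomes) != len(predicted_risks):
        raise ValueError("one predicted risk is needed per patient")
    observed = sum(outcomes)
    expected = sum(predicted_risks)
    return observed / expected


# Hypothetical data: hospital A treats much sicker patients than hospital B.
a_deaths, a_risk = [1, 0, 1, 0], [0.50, 0.40, 0.60, 0.50]
b_deaths, b_risk = [1, 0, 0, 0], [0.05, 0.05, 0.10, 0.05]
print(f"A: O/E = {observed_to_expected(a_deaths, a_risk):.2f}")
print(f"B: O/E = {observed_to_expected(b_deaths, b_risk):.2f}")
```

Hospital A's crude mortality (2 of 4) is twice hospital B's (1 of 4), yet A's O/E ratio is about 1 while B's is about 4, because B's patients were predicted to be at much lower risk. A different risk model could reorder such a ranking, which is why unreliable severity adjustment undermined acceptance of early outcome measures.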
Quality of care is not a uniform concept. Therefore, when choosing among systems that report quality-of-care improvements, one should be certain of the component of quality being addressed, and consider its potential effect on other components.
There are many techniques that can enhance quality of care. One of the oldest and best known is continuing medical education (CME). Seminars and lectures are among the simplest ways to try to influence clinician behavior and thus improve quality. There is some evidence, however, that this technique is less effective than others. Why CME has not lived up to its potential for improving quality of care is debated,9 but one possibility is that it is not applied at the point of care, that is, when the physician is with a patient and needs immediate advice. Similarly, physician report cards summarize a physician's performance on processes, outcomes, and patient satisfaction. Although this feedback is valuable and may prompt a physician to do better next time, it lacks the immediacy needed to influence clinician behavior.10,11 Other techniques, such as disease management programs, guideline dissemination, and decision support, can be applied at the point of care and may therefore have a greater effect on quality.
These point-of-care methods of quality improvement can be implemented with or without the aid of computer technology. The following case discussion describes implementation strategies for quality improvement at two fictional health systems, and illustrates the advantages and disadvantages of each method.
Two integrated health care delivery systems, E-Health and Snail Health, reviewed medical evidence about care for patients with hypertension. The impetus for their review was the publication of an updated guideline from the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. The guideline recommends diuretic therapy as first-line treatment for most patients; patients with heart disease should be on beta-blocker therapy, and patients with diabetes should be on ACE inhibitors. A literature review indicated that physicians in general were not complying with the guideline, and a small pilot study of the actual treatment of hypertensive patients at each health system revealed that its physicians performed below national averages. After reviewing the guideline, both systems decided to design quality initiatives that specifically address these recommendations.
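The three recommendations in the case are simple enough to be written as explicit rules, which is what a computer needs before it can apply a guideline at the point of care. A minimal Python sketch, assuming hypothetical condition codes and encoding only the recommendations stated above, not the full guideline. It also assumes that the condition-specific recommendations replace, rather than add to, the default diuretic.

```python
def recommended_drug_classes(conditions: set[str]) -> list[str]:
    """Suggest antihypertensive drug classes from the case's simplified rules.

    `conditions` holds hypothetical condition codes for one patient.
    Assumption: condition-specific recommendations replace, rather than
    add to, the default first-line diuretic.
    """
    if "hypertension" not in conditions:
        return []  # the rules apply only to hypertensive patients
    suggestions = []
    if "heart_disease" in conditions:
        suggestions.append("beta-blocker")
    if "diabetes" in conditions:
        suggestions.append("ACE inhibitor")
    if not suggestions:
        suggestions.append("diuretic")  # first-line therapy for most patients
    return suggestions


print(recommended_drug_classes({"hypertension"}))
print(recommended_drug_classes({"hypertension", "diabetes", "heart_disease"}))
```

A working guideline engine would also need to handle previous therapy failures and conflicting recommendations; this sketch simply returns every applicable class.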
Snail Health is a large integrated delivery system. It consists of several hospitals and a large primary care network. Its slightly outdated electronic systems handle patient scheduling, registration, and billing. Snail Health has a quality-improvement department that consists of several physicians, nurses, and administrators.
Based on evidence of poor quality of care for hypertensive patients, as assessed with a process measure, Snail decides to adopt a team-based disease management approach to improve compliance with the standard of care for hypertension. The team reviews charts from several providers and then produces a report card for each one, detailing how well that provider complies with the recommendations. The report cards are sent to the providers via interoffice mail. The quality-improvement (QI) department also sets up a series of educational seminars to improve physician awareness of the guideline. Paper copies of the guideline are sent to all physicians, and the guideline is also posted on the system's web site.
E-Health is also a large system consisting of several hospitals and a primary care network. Physicians in this system use an electronic medical record that stores a vast array of clinical data, which are copied to a large central database that can be searched and analyzed. The quality-improvement team for this health system is smaller, consisting of a quality officer and a small cadre of educators and clinicians. Its strategy involves report cards and computerized decision support. The quality officer requests a query of the database to identify all patients with hypertension and the type of therapy each patient receives. Electronic report cards with specific feedback are then generated for each physician. Computerized decision support is implemented by embedding the hypertension guideline within the electronic medical record, which enables the computer to make patient management suggestions that account for concurrent medical conditions and previous therapy failures. The physician can accept or ignore a suggestion, or request additional information on the logic and evidence supporting it.
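E-Health's query and report cards can be pictured as a single aggregate query over the central database. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table, its columns, the physician names, and the rows are all hypothetical, and it reports on only one of the case's recommendations (ACE inhibitors for hypertensive patients with diabetes).

```python
import sqlite3

# Hypothetical, simplified stand-in for E-Health's central clinical database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (
    patient_id   INTEGER PRIMARY KEY,
    physician    TEXT NOT NULL,
    hypertension INTEGER NOT NULL,  -- 1 if diagnosed
    diabetes     INTEGER NOT NULL,  -- 1 if diagnosed
    drug_class   TEXT               -- current antihypertensive class, if any
);
INSERT INTO patients VALUES
    (1, 'Dr. A', 1, 0, 'diuretic'),
    (2, 'Dr. A', 1, 1, 'calcium channel blocker'),
    (3, 'Dr. A', 1, 1, 'ACE inhibitor'),
    (4, 'Dr. B', 1, 1, 'ACE inhibitor'),
    (5, 'Dr. B', 1, 0, 'diuretic'),
    (6, 'Dr. B', 0, 0, NULL);
""")

# Report card: among each physician's hypertensive patients with diabetes,
# how many are currently on an ACE inhibitor.
REPORT_CARD_SQL = """
SELECT physician,
       COUNT(*)                         AS eligible,
       SUM(drug_class = 'ACE inhibitor') AS compliant
FROM patients
WHERE hypertension = 1 AND diabetes = 1
GROUP BY physician
ORDER BY physician
"""

report = conn.execute(REPORT_CARD_SQL).fetchall()
for physician, eligible, compliant in report:
    print(f"{physician}: {compliant} of {eligible} diabetic hypertensive patients on an ACE inhibitor")
```

Because the same data also drive the point-of-care suggestions, the report card and the decision support share one definition of who is eligible and what counts as compliance.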
Both systems were able to institute population-based quality improvement programs for hypertensive patients. Improvement would be assessed using a process measure of quality that examines degree of compliance with the standard of care. Both systems addressed quality improvement through implementation of a care guideline, plus other initiatives, including CME and report cards.
Which health system performs better? Both Snail and E-Health have computerized mechanisms for distributing guidelines. Yet E-Health's guideline, integrated with the electronic medical record, has an important advantage: it provides suggestions for care at the point where they are likely to have the greatest impact. The availability of a guideline online is no guarantee that it will be viewed at the critical time. In addition, the marginal cost of integrating the guideline with the electronic medical record was lower than that of the team approach, because the quality initiative involved only adapting existing programs rather than developing new educational initiatives.
While E-Health may appear to perform better than Snail Health, a more critical evaluation may reveal weaknesses in the E-Health strategy. The fact that the integrated hypertension guideline was successful in isolation does not mean its suggestions will be as readily followed when combined with suggestions regarding management of several concurrent illnesses. Physicians may begin to ignore reminder systems when presented with too much information. As described below, many factors contribute to the success or failure of a computerized decision-support system.
We define decision support as a set of information-presentation techniques that influence clinical care. Computer systems may enable the implementation of decision support, but they are not a prerequisite.
Computerized decision-support systems have been described as systems that can "synthesize and integrate patient-specific information, perform complex evaluations, and present the results to clinicians in a timely fashion."12 They have been used across a broad spectrum of settings and clinical problems, both to aid diagnosis and to guide the appropriate use and management of treatments. These two general categories, diagnostics and therapeutics, provide a logical division for the discussion of clinical decision support, because physicians' information needs when making a diagnosis differ from their needs when prescribing therapy. To make a diagnosis, the physician must integrate a large set of subjective clinical exam findings and patient complaints and arrive at a logical conclusion. The therapeutic process is more objective: the physician must, for instance, know the appropriate drug dose for the patient's weight and all of the possible drug interactions. Although this process may involve some degree of judgment, applying the appropriate therapy is often a matter of memorization, and it is in these therapeutic cases that computerized decision-support techniques have shown their greatest promise. We discuss each category of computerized decision support separately.
Table 1 outlines some recent studies that have collected primary data on the effect of computer-based decision support on quality of care. There are many more; a recent review of the literature11 in this area identified 68 controlled trials. Many of these studies have demonstrated improvements in drug dosing and compliance with clinical guidelines.
TABLE 1 Therapeutic decision support

| Author, year (ref.) | Population studied | Technique used | Goals | Quality measures | Outcomes |
|---|---|---|---|---|---|
| Pestotnik, 1996 (34) | Hospitalized patients | On-screen reminders in physician order-entry system | Improve antibiotic prescribing patterns | Overall cost; unnecessary antibiotic use; adverse drug events | Improvement in all process and outcome measures |
| Walton, 1997 (35) | Simulated cases | On-screen advice | Improve antibiotic prescribing patterns | Selection of less-expensive alternatives | Improvement in this process measure |
| Mungall, 1998 (36) | Hospitalized patients | On-screen advice | Improve dosing of heparin | Measurement of blood tests for anticoagulation | Improvement in this process measure |
| Evans, 1998 (37) | Hospitalized patients | On-screen advice | Improve antibiotic prescribing patterns | Adverse events due to drug allergies; excess drug dosing; appropriate drug dosing | Improvement in all process measures |
| Bates, 1998 (38) | Hospitalized patients | On-screen advice and pharmacy follow-up | Improve prescription of all drugs | Adverse drug events | Improvement in this process measure |
| Poller, 1998 (39) | Outpatients | Computer-generated dose recommendation | Improve dosing of warfarin | Measurement of blood tests for anticoagulation | Improvement in this process measure |
| Lobach, 1997 (40) | Outpatients | Computer-generated reminders for preventive care | Improve preventive care for diabetics | Compliance with care guidelines | Improvement in these process measures |
Computerized decision-support techniques have shown the most promise in drug dosing. A recent Institute of Medicine report13 suggested that medical errors are the seventh leading cause of death in the United States, contributing to an estimated 44,000 to 98,000 deaths a year and estimated costs in excess of $348 million, of which an estimated $159 million is directly attributable to preventable medication errors. Between 1.8 and 4.9 errors were estimated to occur per 1,000 prescriptions, with the higher rates occurring in children, for whom dosing is complex. This is not surprising, given the number and complexity of the medicines prescribed in hospitals.
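To make the reported rates concrete, the arithmetic below scales them to a hypothetical hospital that fills 100,000 prescriptions a year; that volume is an assumption chosen for round numbers, not a figure from the report.

```python
def expected_errors(prescriptions: int, errors_per_1000: float) -> float:
    """Expected number of prescribing errors at a given rate per 1,000."""
    return prescriptions * errors_per_1000 / 1000


annual_prescriptions = 100_000  # hypothetical hospital volume
low = expected_errors(annual_prescriptions, 1.8)
high = expected_errors(annual_prescriptions, 4.9)
print(f"{low:.0f} to {high:.0f} errors per year, "
      f"or roughly {low / 365:.1f} to {high / 365:.1f} per day")
```

Even the lower rate implies about one error every two days at this volume, a recurring and largely rule-checkable problem of the kind automated dose checking targets.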
One drug, heparin, figures prominently in hospital errors and complications. This anticoagulant is given in large quantities and must be administered via infusion pump. It is used to treat blood clots in the legs, as well as more serious clots that have traveled to the lungs. Both initial and subsequent doses must be carefully calculated, which leaves room for human error. Dosing errors can result in heavy bleeding or in complications of excess clotting, and misuse of this medicine has caused serious morbidity and mortality. Computerized decision-support techniques have improved the accuracy of its dosing and administration. The most recent study in this area, by Mungall et al.,36 showed that patients treated by physicians using a computerized nomogram for heparin dosing were more than 50 percent more likely to have therapeutic blood values than patients treated with the nomogram alone.
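A dosing nomogram is essentially a lookup table that maps a measured blood value to a dose adjustment; computerizing it removes the manual lookup and arithmetic where errors creep in. The Python sketch below shows only that mechanism. The clotting-time bands, the weight-based rate changes, and the function names are hypothetical placeholders, not the nomogram used by Mungall et al. or any clinical protocol.

```python
from bisect import bisect_right

# HYPOTHETICAL nomogram for illustration only: the band limits and rate
# changes are placeholders, not values from any clinical heparin protocol.
# Bands (seconds): below 45, 45 to under 70, 70 to under 90, 90 and above.
BAND_LOWER_LIMITS_S = [45, 70, 90]
RATE_CHANGE_UNITS_PER_KG_H = [+2, 0, -2, -3]  # one entry per band


def rate_change_per_kg(clotting_time_s: float) -> int:
    """Look up the infusion-rate change (units/kg/h) for a lab result."""
    band = bisect_right(BAND_LOWER_LIMITS_S, clotting_time_s)
    return RATE_CHANGE_UNITS_PER_KG_H[band]


def next_infusion_rate(current_units_per_h: float, weight_kg: float,
                       clotting_time_s: float) -> float:
    """Apply the weight-based adjustment; the rate never goes below zero."""
    change = rate_change_per_kg(clotting_time_s) * weight_kg
    return max(0.0, current_units_per_h + change)


print(next_infusion_rate(1000.0, weight_kg=80, clotting_time_s=40))
print(next_infusion_rate(1000.0, weight_kg=80, clotting_time_s=75))
```

Keeping the table as data, separate from the lookup code, means a protocol change only requires editing the two lists.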
Improvement in the administration of other complex drugs, such as warfarin,14 theophylline,15 and lidocaine,16 has been shown as well, but the results are not consistent.17,18 While these results are promising, drug administration is only one part of the complex decision-making process that affects quality of care. For instance, except where treatment protocols exist, we were unable to identify any studies demonstrating improvement in appropriate drug selection. Computerized decision support has therefore not been shown to assist in the critical choice of therapeutic strategy. Once the strategy has been chosen, however, a decision-support system can help ensure that it is carried out according to accepted standards of care.
Numerous studies have evaluated physician compliance with practice guidelines as a measure of quality, in both inpatient and outpatient settings. Both preventive measures and treatment guidelines have been analyzed, with promising results. Recently, Lobach et al.40 studied a diabetes-care guideline generated from an electronic medical record. They found that compliance increased from 15.6 percent to 32 percent when the computer-generated guideline, rather than a paper-based generic guideline, was used.
Many studies have demonstrated the benefit of computerized decision support for some health maintenance activities, such as mammography19 and vaccinations,20 but not for all of them. Cervical cancer screening with a Pap smear is an important exception.21 Many factors may explain why Pap smear rates have not improved even when computerized reminder systems are used. One may be a clinical office's work flow and resources: unlike mammography, the Pap smear is an in-office procedure, and unlike a vaccination, it may require extra physician time, space, and special equipment. This hypothesis is consistent with the view that a computerized decision-support system cannot overcome, and may in fact highlight, inadequate office work flow.
Despite studies showing improved quality as assessed by process measures, few studies have shown that therapeutic decision support improves outcome measures. In fact, two studies that specifically examined compliance with hypertension guidelines22,23 (the subject of our case study) showed no benefit on outcome measures. Conversely, studies of guideline compliance (that is, good process) for urinary incontinence24 and for anxiety and stress25 have shown improvement in outcome measures. There are probably many reasons why more studies show improved adherence to guidelines than show improved outcomes. First, because poor outcomes are relatively rare whether or not decision support is used, a study must enroll large numbers of patients to show a statistically significant benefit. Second, poor outcomes may not appear until years after the intervention, so studies must run for a long time. Definitive studies that enroll large numbers of patients and run for many years may be prohibitively expensive.
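The first reason can be quantified with the standard normal-approximation sample-size formula for comparing two proportions. A minimal Python sketch follows; the process-measure rates reuse the Lobach et al. compliance figures cited above purely for illustration, while the outcome rates (2.0 versus 1.5 percent) are hypothetical, chosen to represent a rare poor outcome. A two-sided significance level of 0.05 and 80 percent power are conventional assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(p_control: float, p_treated: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed per arm to detect a difference between
    two proportions (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    effect = p_control - p_treated
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Process measure: guideline compliance of 15.6% vs 32% (Lobach et al. figures).
n_process = n_per_group(0.156, 0.32)
# Outcome measure: a hypothetical complication rate cut from 2.0% to 1.5%.
n_outcome = n_per_group(0.020, 0.015)
print(f"process: {n_process} per arm, outcome: {n_outcome} per arm")
```

Under these assumptions the process comparison needs on the order of a hundred patients per arm, while the outcome comparison needs on the order of ten thousand, because the detectable difference in a rare event is so small.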
The ability of computerized decision aids to improve quality of care depends on a system's ability to determine with some certainty which medical condition requires treatment. For example, if the diagnosis is known to be hypertension, the system can recommend therapy that is consistent with standards of care and likely to produce the desired outcomes with a minimum of adverse events. If the diagnosis is less certain, however, the appropriate therapeutic course is similarly unclear. Given a patient presenting with nonspecific abdominal pain, what is the most appropriate therapy? If the presumed diagnosis is wrong, the recommended therapeutic plan could harm patient outcomes. If treatment for peptic ulcer disease produced symptomatic improvement but the true diagnosis was cancer, high-quality care would not have been provided. Computerized diagnostic decision-support systems have therefore been developed to help physicians reach conclusions with more certainty. Although the mathematical sophistication of these tools has grown, and they have succeeded in focused domains, notable improvements in quality of care have been more elusive.
The origin of computer-aided diagnostic systems is often credited to Ledley and Lusted.26 Their 1959 paper described how symbolic logic and probability theory could produce diagnoses similar to those reached through physicians' complex reasoning. Since then, the scope, methods, and capabilities of computer-aided diagnostic systems have been refined many times. Although diagnostic decision aids can be categorized by the mathematical theory underlying their methods, a simpler scheme distinguishes them by their use in clinical care:
Differential diagnosis generation is the process by which a clinician typically approaches a diagnostic dilemma. The differential diagnosis is a list of all the diagnostic possibilities a physician considers, even remotely, to explain a given set of clinical findings and test results. The physician then prioritizes the list by clinical likelihood or risk to the patient. Diagnoses on the list are then systematically eliminated or considered further, depending on their congruence with additional test results and the overall clinical scenario. Poor quality of care can result from excessive testing to rule out diagnoses that are extremely unlikely or clinically unimportant. Errors also occur when certain diagnoses receive inadequate attention or significant diagnoses are left off the list. The provision of quality care therefore depends on a clinician's ability to develop a thorough, appropriately ranked list. How have computing tools aided this component of care?
A 1994 study27 compared the performance of four differential diagnosis programs. In this study, 105 diagnostically challenging cases were selected from actual clinical practice; for each, the clinical findings and the actual diagnosis were known with certainty. The clinical findings were entered into the diagnostic programs, and the resulting differential diagnoses were analyzed on a number of objective and subjective scales. Objective measures of success included the presence of the actual diagnosis anywhere on the list, its presence among the top 10 diagnoses on the list, and its mere presence in the program's knowledge base. Subjective scales included the relevance of the top 20 diagnoses, defined by their appearance on an expert clinician's differential diagnosis list, and the number of additional diagnoses that were considered relevant but had not been included on the clinician's original list. The correct diagnosis score, representing the presence of the correct diagnosis anywhere on the list, ranged from 52 percent to 71 percent. Only 37 to 44 percent of correct diagnoses appeared among the top 10 most likely diagnoses. Interestingly, the programs suggested between 1.8 and 2.3 additional diagnoses that seemed relevant but were not on the clinician's original differential. The potential benefit of informing the physician about additional relevant diagnoses must be weighed against the "noise" created by irrelevant or inappropriately ranked diagnostic choices. Considering the advantages and disadvantages of differential diagnosis programs, one reviewer gave them a grade of "C."28
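The study's two main objective measures, whether the true diagnosis appears anywhere on a program's list and whether it appears in the top 10, are straightforward to compute once each case's ranked list is recorded. A minimal Python sketch with invented diagnosis lists; it assumes exact (case-insensitive) name matching, which is a simplification, since matching diagnoses across programs' vocabularies may require clinical judgment.

```python
def score_case(ranked_differential: list[str], true_diagnosis: str,
               top_k: int = 10) -> tuple[bool, bool]:
    """Return (listed anywhere, listed within the top k) for one case."""
    names = [d.lower() for d in ranked_differential]
    target = true_diagnosis.lower()
    return target in names, target in names[:top_k]


def summarize(cases: list[tuple[list[str], str]],
              top_k: int = 10) -> tuple[float, float]:
    """Fraction of cases with the true diagnosis anywhere / in the top k."""
    scores = [score_case(ranked, truth, top_k) for ranked, truth in cases]
    anywhere = sum(a for a, _ in scores) / len(scores)
    in_top_k = sum(t for _, t in scores) / len(scores)
    return anywhere, in_top_k


# Invented cases: (program's ranked differential, true diagnosis).
cases = [
    (["pancreatitis", "cholecystitis", "peptic ulcer"], "Cholecystitis"),  # found, top 10
    ([f"condition {i}" for i in range(1, 13)], "condition 12"),           # found, ranked 12th
    (["migraine", "tension headache"], "subarachnoid hemorrhage"),        # missed
]
print(summarize(cases))
```

The gap between the two fractions is the ranking problem the study highlighted: a diagnosis buried at position 12 counts as "found" but is unlikely to draw a busy physician's attention.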
Although computerized generation of a differential diagnosis list has been problematic, diagnostic programs have proved accurate when asked to rule in or rule out a small set of possible diagnoses on the basis of objective clinical findings and test results. These tools provide the physician with a computerized "second opinion." Such support is important when the range of diagnostic possibilities is small and easily identified but some diagnoses are far more urgent than others. For example, physicians must be able to distinguish accurately between appendicitis and more common causes of abdominal pain. One early success was a program that addressed this diagnostic challenge. When it was tested on the signs and symptoms of 304 patients whose diagnoses were already known, the program correctly diagnosed 279, including 84 of the 85 patients with appendicitis. Surgeons presented with the same cases made only 242 correct diagnoses.29
Another example of the ability of programs to rule diagnoses in or out accurately involves patients who present to emergency departments with chest pain. Although substantial evidence supports the use of angioplasty or thrombolytic agents in patients presenting with acute myocardial infarction (MI), quality of care demands that these potentially dangerous and very expensive interventions be attempted only in patients for whom the diagnosis is certain. In some patients, EKG findings are strongly suggestive of an MI, but in many others the presenting signs and symptoms are more subtle and the diagnosis less certain. This uncertainty can delay therapy for patients who have an MI, or lead to unnecessary admission of patients later found not to have one. Computerized diagnostic aids based on neural network technology have been especially effective at distinguishing ischemic causes of chest pain from more benign etiologies. In one study of 356 high-risk patients who presented with chest pain and were admitted to a cardiac care unit, 120 were later found to have had an MI. When provided only with the clinical signs and symptoms known at initial presentation, the neural network program correctly diagnosed MI in 92 percent of the patients who actually had one, and correctly ruled it out in 96 percent of those who did not.30 In a separate study, physicians diagnosed MI with a sensitivity of 78 percent and a specificity of 85 percent.31 Despite the apparent accuracy of neural network systems for diagnosing MI, no study has yet examined how their use affects physicians' decision making and the resulting quality of care.
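Sensitivity and specificity, the figures reported for the MI studies, come from a two-by-two table of test results against true diagnoses. The Python sketch below reconstructs the approximate counts implied by the neural network study's percentages (rounded to whole patients, so they are estimates rather than reported data) and computes simple accuracy for the appendicitis program and surgeons from the counts given earlier. The positive predictive value (PPV) it prints is derived, not reported, and holds only at that study's prevalence of roughly one MI in three admissions.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # disease present, test positive
    fn: int  # disease present, test negative
    tn: int  # disease absent, test negative
    fp: int  # disease absent, test positive

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)


def from_rates(n_diseased: int, n_healthy: int,
               sensitivity: float, specificity: float) -> TwoByTwo:
    """Approximate counts implied by reported rates, rounded to whole patients."""
    tp = round(n_diseased * sensitivity)
    tn = round(n_healthy * specificity)
    return TwoByTwo(tp=tp, fn=n_diseased - tp, tn=tn, fp=n_healthy - tn)


# Neural network study: 356 admitted patients, 120 of whom had an MI.
mi = from_rates(n_diseased=120, n_healthy=356 - 120,
                sensitivity=0.92, specificity=0.96)
print(mi, f"PPV = {mi.ppv:.2f}")

# Appendicitis program vs surgeons: correct diagnoses out of 304 cases.
print(f"program {279 / 304:.1%}, surgeons {242 / 304:.1%}, "
      f"appendicitis detected {84 / 85:.1%}")
```

Because of rounding, the reconstructed sensitivity is 110 of 120, about 91.7 percent, slightly below the reported 92 percent; the counts are illustrative only.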
Advances in artificial intelligence have enabled the development of sophisticated systems that help physicians interpret radiographic images and pathology specimens. Like the systems described above, these are used to rule diagnoses in or out. An important distinction, however, is that instead of taking subjective patient complaints or physical exam findings as input, these systems process and quantify features of an image and then apply a variety of analyses to determine the likelihood that specific features represent normal or abnormal findings. These systems could improve accuracy and reduce the variability observed in common radiographic procedures such as mammography and chest X-rays. Studies have shown that radiologists may miss 15–30 percent of certain masses on mammography, and up to 30 percent of lung nodules.32,33 Computer-aided evaluation of mammograms has cut the number of missed lesions in half without a substantial increase in false positives.
Results of computer-aided analysis of chest X-rays for nodules were less impressive. In a review of 95 challenging chest X-rays, the computer system detected 15 abnormalities that radiologists had missed. However, the computer also missed 11 abnormalities that the radiologists detected, and 24 cases were missed by both the radiologists and the computer system.
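The chest X-ray figures can be combined to show what a radiologist-plus-computer reading might catch. The Python sketch below assumes that the three figures count the same kind of finding (the text reports the last one as cases) and that the radiologist accepts every true abnormality the computer flags; it ignores false-positive prompts, which in practice limit this benefit.

```python
def misses(missed_by_radiologist_only: int, missed_by_computer_only: int,
           missed_by_both: int) -> dict[str, int]:
    """Abnormalities missed by each reader alone and by the two combined.

    A finding is missed by the combined reading only if both readers miss
    it (assuming every true computer prompt is accepted).
    """
    return {
        "radiologist alone": missed_by_radiologist_only + missed_by_both,
        "computer alone": missed_by_computer_only + missed_by_both,
        "combined": missed_by_both,
    }


print(misses(missed_by_radiologist_only=15, missed_by_computer_only=11,
             missed_by_both=24))
```

Under these assumptions the combined reading misses 24 findings instead of 39, but the 24 missed by both readers show how many blind spots the two approaches share.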
Beyond detecting radiographic lesions, a diagnostic aid should help determine the significance of a finding. This can improve quality of care by reducing the number of unnecessary biopsies that would otherwise be required to establish the nature of a lesion. Among lesions the computer could detect, its reported likelihood of malignancy was more accurate than that of human readers. However, whether this improved accuracy provides enough confidence to avert a confirmatory biopsy is still unknown.
Ensuring quality of care requires a complex mixture of art and science. Computerized decision-support systems can enhance quality when they support the scientific side of medicine. Clinical computing tools are most successful when their role is to ensure that basic care is not overlooked while the physician focuses on more acute issues. When the diagnosis is certain, decision-support tools can also help the physician choose an appropriate therapeutic course and ensure that standards of care for the particular disease are met. Although therapeutic decision support can also aid more complex management decisions, assessing the resulting quality of care may be problematic: a complex patient may have concurrent conditions that warrant deviations from the standard treatment protocol, making computerized decision-support systems less effective.
Clinical diagnosis, like complex therapeutic management, is often more art than science. Effective diagnostic decision support depends on clinical findings and test results that, ideally, are known with certainty. In reality, this is not the case. Clinical signs and symptoms are inherently subjective. Is the patient's pain severe, or is the patient exaggerating symptoms? In the setting of generalized pain, is there a particular area of tenderness? Is there a change in the heart sounds from an earlier exam? Are any of the presenting signs and symptoms unrelated to the actual diagnosis? Even with laboratory tests, which are usually considered objective, a physician must consider whether a result is normal for the patient, even if it falls outside the usual range. Therefore, for many medical conditions, a diagnostic aid can be effective only if the physician interprets the findings in the context of the individual patient's characteristics and the doctor's experience with similar patients. This requires a level of judgment that is difficult to impart to a decision-support system.
A growing body of evidence has shown advantages of computerized decision support in reducing medical errors, particularly in drug dosing, and in improving compliance with health maintenance and disease management strategies. An important caveat is that many studies of decision-support systems have been performed at institutions where clinical computing is an inherent part of the medical culture; similar acceptance and quality improvements may not be seen at institutions that have not yet adopted this culture. In addition, settings where decision support has not been shown to improve quality highlight the importance of work flow and environmental factors in the practice of medicine. A successful computerized decision-support system therefore does not work in isolation and cannot simply be superimposed on a clinical practice. An implementation that improves quality of care must have input from providers at the planning and evaluation stages and must be part of an overall strategy to reengineer clinical practice.
Mark Weiner, M.D.
Assistant Professor of Medicine
University of Pennsylvania School of Medicine
Guardian Drive, Room 1116
Philadelphia, PA 19104