Inter-Rater Reliability Testing For Utilization Management Staff

Sue McQuillian, FSA
Consulting Actuary, Milliman USA


Recent regulatory pressures and certification requirements have heightened the need for payer organizations to abide by specific standards regarding medical management operations. Payer organizations that are making medical-necessity determinations regarding reimbursement for health care services, as well as risk-bearing provider groups to whom some of these functions have been delegated, must ensure consistency and appropriateness in their determinations. The most powerful accreditation body for health care payers, the National Committee for Quality Assurance (NCQA), requires that payer organizations carry out periodic inter-rater reliability assessments to ensure consistency in medical management decision making.

An inter-rater reliability assessment or study is a performance-measurement tool involving a comparison of responses for a control group (i.e., the “raters”) with a standard. Inter-rater reliability (also called inter-observer reliability) traditionally refers to how well two or more raters agree and is derived from the correlation of different raters’ judgments. For the purposes of this paper, inter-rater reliability is a measurement of how well raters agree with a standard, which is more of an assessment of the validity of the responses. The purpose of the study is to determine whether the raters have been consistently trained and are applying that training in a consistent fashion. The analysis is intended to gauge the raters’ observations and reactions resulting from a specific situation. The principles discussed herein would apply to any set of utilization management guidelines.
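The distinction drawn above, between agreement among raters and agreement with a standard, can be illustrated with a minimal sketch. It assumes simple percent agreement as the measure, which this paper does not prescribe; the case identifiers, determination labels, rater names, and function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical test cases: the "standard" determination for each case,
# and the determination each rater made on the same case.
standard = {"case1": "approve", "case2": "deny", "case3": "approve", "case4": "refer"}
responses = {
    "rater_a": {"case1": "approve", "case2": "deny", "case3": "approve", "case4": "refer"},
    "rater_b": {"case1": "approve", "case2": "approve", "case3": "approve", "case4": "deny"},
}


def agreement_with_standard(responses, standard):
    """Fraction of cases on which each rater matches the standard
    (the sense of inter-rater reliability used in this paper)."""
    scores = {}
    for rater, answers in responses.items():
        matches = sum(
            1 for case, expected in standard.items() if answers.get(case) == expected
        )
        scores[rater] = matches / len(standard)
    return scores


def pairwise_agreement(responses, rater_1, rater_2):
    """Fraction of shared cases on which two raters match each other
    (the traditional sense of inter-rater reliability)."""
    first, second = responses[rater_1], responses[rater_2]
    shared = [case for case in first if case in second]
    matches = sum(1 for case in shared if first[case] == second[case])
    return matches / len(shared)


scores = agreement_with_standard(responses, standard)
between_raters = pairwise_agreement(responses, "rater_a", "rater_b")
print(scores)
print(between_raters)
```

In this hypothetical data set, one rater matches the standard on every case and the other on half of them, so the two raters also agree with each other on only half of the cases. Two raters could, however, agree with each other perfectly and still both depart from the standard, which is why comparison with a standard is closer to an assessment of validity.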

An inter-rater reliability assessment can be used to measure the level of consistency among a plan or provider group’s utilization management staff and adherence to organizational medical management criteria or standards. Reasons for conducting an inter-rater reliability study within an organization include:

  • Minimizing variation in the application of clinical guidelines;
  • Evaluating staff’s ability to identify potentially avoidable utilization;
  • Evaluating staff’s ability to identify quality-of-care issues;
  • Targeting specific areas most in need of improvement;
  • Targeting staff needing additional training; and
  • Avoiding litigation due to inconsistently applied guidelines.

NCQA requires that health plans develop and implement an inter-rater reliability process for Health Plan Employer Data and Information Set (HEDIS) compliance. NCQA is an independent review organization dedicated to evaluating and reporting on the quality of managed care organizations. HEDIS is a set of standardized performance measures developed by NCQA with assistance from managed care organizations and employers concerned with quality health care. The performance indicators in HEDIS continue to evolve, but most involve measuring access to care, health plan service, provider qualifications, activities that help people recover from illness, and management of chronic illness. Taken together, these measures are intended to provide a tool for comparing the performance of different health plans.

An inter-rater reliability study can provide measurement of many of these quality indicators. It can most readily be used to measure access, as NCQA looks for “fair and consistent health plan decisions about medical treatments and services provided to plan members.” It also can be used to measure service, as NCQA looks for “actual improvements that the plan has made in care and service” (“What NCQA Looks for in a Health Plan”). This latter indicator can be measured by reviewing the results of the inter-rater reliability study from year to year.

An inter-rater reliability assessment can be a useful tool for a health plan or provider organization. However, as with any benchmarking exercise, it cannot in and of itself enhance performance. To improve outcomes, the assessment must be followed up by analysis of the results and, most importantly, by action.

Medical management clinical guidelines, whether developed internally or purchased and then adjusted to meet specified objectives and local practice standards, are an extension of the organization’s overall philosophies and goals. It is essential to the future viability of the plan or provider organization, the welfare of members and corporate partners, and the organization’s standing in the community that these guidelines be applied appropriately and in a uniform fashion. Periodic benchmarking of clinical guideline application via an inter-rater reliability study is one way for an organization to ensure that its intentions for utilization management are met.

This paper will examine several elements that health plans or provider groups with utilization management responsibilities should consider when designing and implementing an inter-rater reliability study. The example presented illustrates some aspects of the process. The example, although fairly simple, demonstrates how easily an inter-rater reliability study can be performed. However, inter-rater reliability is a complex concept, and a much more detailed analysis is possible. End users of any inter-rater reliability analysis should be advised of the method and depth of the analysis to avoid confusion or misunderstandings.