To reduce long-term health care costs, health plans are pursuing various strategies to improve patient health, including disease management programs. Although common conditions such as asthma and diabetes are the typical targets for these programs, rare diseases, which collectively affect more than 25 million patients in the United States each year (HHS 2010), are also potential targets.
Health plans and payers with access to large volumes of patient population data are in a unique position to provide key insights into improving identification and treatment of patients with rare diseases. Compared with common diseases, the diagnosis of rare diseases typically takes longer, requires more hospital visits, involves multiple primary care providers and specialists, and entails an assortment of diagnostic tests and procedures (Schieppati 2008, de Vrueh 2013). Moreover, rare diseases, particularly disorders with variable severity and variable appearance of associated manifestations, require a high frequency of health care resource utilization, with uncertainty about which procedures are necessary for either diagnosis or treatment (Schieppati 2008, de Vrueh 2013). One such disorder is tuberous sclerosis complex (TSC), a rare genetic disorder characterized by cognitive deficits; behavioral disorders; a high rate of intractable seizures; and benign tumors in the brain, kidneys, heart, eyes, lungs, and skin (Curatolo 2008, Crino 2006). The incidence of TSC is 1 in 6,000 live births, and between 25,000 and 50,000 people in the United States and 1 to 2 million worldwide live with the disease (NIH).
- Benign renal tumors
- Subependymal giant cell astrocytomas (SEGAs): benign tumors in the ventricles of the brain
- Cortical tubers: malformed tissue at the gray/white matter interface
- Subependymal nodules: small lesions found in the lining of the brain ventricles
Facial angiofibromas in a butterfly pattern are a common symptom of TSC. (Wikimedia Commons)
Diagnosis and management of TSC is challenging because TSC affects every patient differently (Wataya-Kaneda 2013). TSC affects all organ systems but with diverse and variable manifestations and severity across pediatric and adult patients (Krueger 2013b). Virtually all patients with TSC have skin abnormalities that vary with age and appearance between individuals (Northrup 1999). Brain tumors associated with TSC include subependymal giant cell astrocytomas (SEGAs), cortical tubers, and subependymal nodules (Curatolo 2008). Epilepsy prevalence in TSC is 85%, often manifesting early in life and resistant to therapy (Chu-Shore 2010). TSC-associated neuropsychiatric disorders include autism spectrum disorder (Spurling Jeste 2014, de Vries 2007), attention-deficit/hyperactivity disorder (de Vries 2007, D’Agati 2009), and mild to profound intellectual disability (Joinson 2003). An additional common condition associated with TSC in female patients is the formation of cysts in lymphatic vessels in the lungs and abdomen (lymphangioleiomyomatosis) (Crino 2006).
None of the aforementioned clinical features of TSC is pathognomonic. Under current diagnostic criteria (Table 1), the presence of any two major features, or of one major feature plus two minor features, is sufficient for a TSC diagnosis; identification of a pathogenic mutation in TSC1 or TSC2 is also sufficient on its own (Northrup 2013). Once TSC has been diagnosed, consensus recommendations call for abdominal imaging, annual assessment of kidney function (Krueger 2013b, Rouviere 2013), and evaluation for neuropsychiatric disorders (Krueger 2013b). Brain magnetic resonance imaging is recommended every 1 to 3 years to assess SEGA development (Krueger 2013a).
Table 1. Diagnostic criteria for tuberous sclerosis complex
Multiple retinal nodular hamartomas
Subependymal giant cell astrocytoma
Confetti skin lesions
Dental enamel pits
Multiple renal cysts
Retinal achromatic patch
*Combination of renal angiomyolipomas and lymphangioleiomyomatosis with no other features does not meet the criteria for a definite diagnosis.
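The diagnostic rule described above can be written as a short decision function. This is a minimal sketch for illustration only, not a clinical tool: the function name and the representation of features as sets of strings are assumptions, and the footnote exception is applied literally (the text does not specify how it interacts with a single additional minor feature).

```python
AML = "renal angiomyolipomas"
LAM = "lymphangioleiomyomatosis"

def meets_definite_criteria(major, minor, pathogenic_mutation=False):
    """Return True if the rule stated in the text for a TSC diagnosis is met.

    major, minor: sets of feature names observed in the patient (hypothetical
    representation; the source does not define a data format).
    """
    if pathogenic_mutation:
        # A pathogenic TSC1 or TSC2 mutation is sufficient on its own.
        return True
    if len(major) >= 2:
        # Footnote exception, read literally: angiomyolipomas plus
        # lymphangioleiomyomatosis with no other features does not qualify.
        if set(major) == {AML, LAM} and not minor:
            return False
        return True
    # Otherwise one major feature plus two minor features is required.
    return len(major) == 1 and len(minor) >= 2

# The footnoted combination alone fails; adding a genetic finding passes.
print(meets_definite_criteria({AML, LAM}, set()))
print(meets_definite_criteria({AML, LAM}, set(), pathogenic_mutation=True))
```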
With such an assortment of potential disease manifestations requiring assessment, treatment, and monitoring, care of TSC patients is associated with high health care resource utilization, resulting in substantial economic costs (Lennert 2013, Rentz 2015). Surgical resection of SEGAs and angiomyolipomas, in particular, is associated with significant direct medical costs and high utilization of medical resources (Vekeman 2015, Sun 2015). Additionally, the disease burden of TSC affects health-related quality of life, particularly in the mental health domain, with TSC patients reporting a significantly greater mental health burden than cancer patients and more depressive symptoms than nonpsychiatric community controls (Rentz 2015).
The growing availability of disease- and treatment-related patient data is a resource that health plans can leverage to improve understanding and management of rare diseases. Rare disease surveillance can be difficult and expensive. Population surveys often lack a sufficient number of observations, patient registries can lack generalizability to the overall population, and active surveillance requires medical chart review, which is time consuming and costly.
In contrast, passive surveillance through claims data offers several benefits. Claims data are inexpensive, readily available, and provide insight into mortality rates, comorbidities, access to health care services, and costs of care. At a population level, these data can be used to guide the provision of health services and evaluate the cost-effectiveness of medical care and interventions.
Using patient health care claims data from large digital databases, this retrospective case-control study sought to identify key patient and treatment factors associated with TSC. Identifying patients with TSC at an early stage has important clinical implications for prevention of further complications and receipt of appropriate clinical care.
Patients were identified for inclusion in the study during the identification period from Jan. 1, 2000, through Dec. 31, 2011. Of the more than 75 million commercial and Medicare Advantage enrollees in the Optum Research Database (ORD) and the Impact National Benchmark database (Impact) during the study period, 5,525 were identified with at least one medical claim carrying a TSC diagnosis (International Classification of Diseases, 9th Revision, diagnosis code 759.5x). Inclusion in the TSC sample required a TSC diagnosis between January 2000 and December 2011, with the date of the first claim with a TSC diagnosis considered the index date. Patients had to have 12 months of continuous enrollment with medical and pharmacy benefits prior to the index date (baseline period). The inclusion criteria for control patients were medical coverage during this same period and no diagnosis of TSC. For the control group, the index date was randomly assigned within the enrollment period. Patients were excluded if they had a TSC diagnosis during the baseline period.
After application of the above inclusion criteria to the initial 5,525 patients with TSC identified in the databases, a total of 2,498 patients were retained in the final TSC cohort. Of these, the subset of 1,685 patients identified in the ORD was used for model development. Control patients were matched to the patients with TSC on length of enrollment post index (±90 days) and year of index date (±2 years). The match yielded 1,472 patients with TSC and four controls per patient with TSC, which resulted in the retention of 5,888 patients in the control cohort. TSC and control patients were observed for 12 months prior to the index date and for varying lengths of time following it. Patients with TSC and matched control patients from Impact were used to assess the applicability of the model developed in the ORD to an independent claims database.
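The matching step can be illustrated with a small sketch. The text gives only the tolerances (±90 days of post-index enrollment, ±2 years of index date) and the 4:1 ratio; the record fields, the greedy first-fit strategy, matching without replacement, and the choice to drop cases that cannot be fully matched are all assumptions made here for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patient:
    patient_id: str
    followup_days: int  # length of enrollment after the index date
    index_year: int     # calendar year of the index date

def match_controls(cases, controls, ratio=4, day_tol=90, year_tol=2):
    """Greedy matching without replacement.

    Each case receives `ratio` controls that fall within both tolerances.
    Cases that cannot be matched to a full set are left out of the result.
    """
    available = list(controls)
    matched = {}
    for case in cases:
        eligible = [
            c for c in available
            if abs(c.followup_days - case.followup_days) <= day_tol
            and abs(c.index_year - case.index_year) <= year_tol
        ]
        picks = eligible[:ratio]
        if len(picks) == ratio:
            matched[case.patient_id] = picks
            used = {c.patient_id for c in picks}
            available = [c for c in available if c.patient_id not in used]
    return matched
```

A production matching routine would usually randomize the order of eligible controls or optimize the overall match; the first-fit choice here only keeps the sketch short.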
ORD and Impact contain medical and pharmacy claims linked to enrollment information from health plans serving members across the United States. The medical claims capture diagnoses and procedures from International Classification of Diseases, 9th Revision, Clinical Modification codes; Healthcare Common Procedure Coding System procedure codes; and Current Procedural Terminology procedure codes. Pharmacy claims include National Drug Code, quantity dispensed, drug strength, and days’ supply. No identifiable protected health information was extracted or accessed during the course of the study.
Pursuant to the Health Insurance Portability and Accountability Act, the use of de-identified data does not require institutional review board approval or waiver of authorization.
The full ORD sample was randomly split in half to create development and validation samples (TSC, n=736; control, n=2,944, for each sample). Logistic regression coefficients were estimated from the development sample, and the validation sample was held out to assess the internal validity of the model. The dependent variable for this analysis was TSC diagnosis (yes=1, no=0). The analysis evaluated 600 potential independent variables for inclusion in the model. The first stage of model development included univariate assessment of dichotomous variables and graphical assessment of continuous variables to determine the most appropriate functional form of each measure. Continuous variables, such as visit counts, length of inpatient hospitalization, and health care-related costs, were capped at the 99th percentile to reduce skewing from outliers.
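Two of the preparation steps described above, capping at the 99th percentile and the random half split, can be sketched in a few lines. The function names, the nearest-rank percentile definition, and the fixed random seed are assumptions; the study itself was run in SAS and the text does not say how the percentile or the split was computed.

```python
import math
import random

def cap_at_percentile(values, pct=99.0):
    """Cap every value at the pct-th percentile (nearest-rank definition)."""
    ordered = sorted(values)
    # Nearest rank: the smallest value with at least pct% of the data at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]

def split_in_half(ids, seed=0):
    """Randomly split identifiers into two halves (development, validation)."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Splitting 1,472 matched TSC patients gives two samples of 736 each;
# matched controls would follow their case into the same sample.
development, validation = split_in_half(range(1472))
print(len(development), len(validation))
```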
The next stage of model development involved grouping similar types of variables into blocks, such as demographic characteristics, general comorbidity indexes, health care utilization and costs, diagnoses (e.g., seizures, cognitive disorder, angiomyolipomas), procedures (e.g., imaging, genetic testing, electrocardiogram), and medications. Blocks were manually added to or removed from the model in a stepwise fashion based on Wald chi-square values while maximizing the probability of concordance between predicted and actual outcomes. At each step, variables within each block were added to the existing model one at a time; highly correlated variables were removed to preserve interpretability, and variables that did not improve predictability were dropped. Only the most predictive variables among partially redundant variables in a block were retained. While building the model, predictability and goodness of fit were assessed with c-statistics, and comparisons between models were made using likelihood ratio tests for nested models and the Akaike information criterion (a measure of the relative quality of statistical models) for non-nested models. Each factor received a numeric value based on its relative strength of association with TSC diagnosis (i.e., odds ratio). Together, these factors yield a predicted value (i.e., probability) that can be calculated from the characteristics of a patient. To examine the sensitivity and specificity of the model, a receiver operating characteristic (ROC) curve was generated by applying the model to the patient data in the internal validation sample. The area under the ROC curve (AUC) is a measure of test accuracy, where a value of 1 indicates that the model perfectly discriminates between a patient with TSC and one without and a value of 0.5 indicates a model that performs no better than chance (Agresti 2002).
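For a binary outcome, the c-statistic and the AUC are the same quantity: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control. The sketch below computes it directly from that definition; the function name and the toy scores are illustrative assumptions, not study data.

```python
def auc(case_scores, control_scores):
    """Concordance probability: share of case/control pairs in which the
    case has the higher score, with ties counted as one half."""
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))

# Toy example: three of the four case/control pairs are ordered correctly.
print(auc([0.9, 0.7], [0.8, 0.1]))  # 0.75
```

The pairwise loop is quadratic in sample size; rank-based formulas give the same value more efficiently on large samples.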
After deciding which covariates to include in the model, the patients in the Impact sample were scored using the estimates from the ORD development sample. This final test was to measure the external validity of the model and determine whether the model was biased toward the ORD database used in development.
Statistical analyses were performed using SAS software Version 9.2 of the SAS System, W32 VSPRO Platform, copyright 2002–2008 (SAS Institute Inc., Cary, N.C.).
Data acquisition and study sample description
Data from 7,360 patients (TSC, n=1,472; control, n=5,888) were accessed from the ORD. Each cohort had similar mean age (36 years for TSC and 37 years for control) and gender proportions (Table 2). Within the ORD sample, we analyzed 600 potential factors that were recorded in the year prior to and for varying periods following diagnosis of TSC. These factors were categorized and included demographics (age, gender, and region), general comorbidity indices, health care utilization and costs, diagnoses, procedures, and medications. Patient demographics and results of key descriptive factors are listed in Table 2.
Table 2. Patient demographics (ORD sample)
Age, mean (SD), years
Age categories, n (%)
Gender, n (%)
Insurance type, n (%)
Health plan region, n (%)
ORD=Optum Research Database, TSC=tuberous sclerosis complex.
The most frequent diagnoses in the TSC group identified in the ORD sample and control group, respectively, were skin disorders (41.6% and 17.9%; P<.001), kidney and urinary system disorders (21.4% and 12.3%; P<.001), depression (17.3% and 10.7%; P<.001), seizure disorders (16.8% and 1.7%; P<.001), nausea and vomiting (15.9% and 8.7%; P<.001), anxiety (14.4% and 9.9%; P<.001), sleep disturbances (13.7% and 9.2%; P<.001), and cardiac dysrhythmias and rhabdomyomas (12.2% and 6.2%; P<.001) (Figure 1).
Figure 1. Distribution of top diagnoses associated with patient population with TSC (ORD sample). Axis: % of patients. ORD=Optum Research Database, TSC=tuberous sclerosis complex.
TSC risk model performance
In the logistic regression model using the development sample, the top covariate for patients with TSC was the presence of an evaluation and management procedure code (odds ratio=11.4; P<.001) (Table 3). Compared with control patients, seizure disorders (odds ratio=5.9; P<.001) and angiomyolipomas (odds ratio=5.8; P<.001) were the conditions most strongly associated with patients with TSC, followed by skin disorders (odds ratio=3.0; P<.001), renal failure (odds ratio=1.6; P=.03), and cognitive disorders (odds ratio=1.6; P=.02) (Table 3).
Table 3. Patient and treatment factors associated with TSC (ORD)
Age categories (reference: 40+)
Number of days post index
Other connective tissue disease
Brain: CT scan, MRI, MRA, ultrasound
Cardiac: radiograph, CT scan, MRI, ultrasound, ECHO, EKG
Influenza vaccine, ≥3 years, intramuscular
Office/outpatient visit, estimated
≥6 ambulatory visits PPPY
Evaluation and management
Number of specialist visits PPPY
CT=computed axial tomography, ECHO=echocardiogram, EKG=electrocardiogram, MRA=magnetic resonance angiogram, MRI=magnetic resonance imaging, ORD=Optum Research Database, PPPY=per patient per year, TSC=tuberous sclerosis complex.
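To show how odds ratios combine into a predicted probability in a logistic model, the sketch below uses three of the odds ratios reported in the text. It is not the fitted model: the intercept and the remaining covariates are not reported here, so `baseline_odds` is a hypothetical placeholder and the function name is an assumption.

```python
import math

# Odds ratios reported in the text for three diagnosis flags.
ODDS_RATIOS = {
    "seizure_disorder": 5.9,
    "angiomyolipoma": 5.8,
    "skin_disorder": 3.0,
}

def predicted_probability(flags, baseline_odds):
    """Add the log odds ratio of every flag that is present to the baseline
    log odds, then convert the result to a probability."""
    log_odds = math.log(baseline_odds)
    for name, present in flags.items():
        if present:
            log_odds += math.log(ODDS_RATIOS[name])
    return 1.0 / (1.0 + math.exp(-log_odds))

# With hypothetical baseline odds of 0.1, a seizure disorder alone raises
# the odds to 0.59, i.e. a probability of 0.59 / 1.59.
p = predicted_probability({"seizure_disorder": True}, baseline_odds=0.1)
print(round(p, 3))
```

Because the effects add on the log-odds scale, several moderately strong factors together can move a patient's predicted probability much further than any one of them alone.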
To assess the performance of the model, an ROC curve was constructed by applying the model to the internal validation sample (TSC, n=736; control, n=2,944). This test resulted in an AUC of 0.77 (Figure 2). Classifying patients with a predicted probability of 80% or greater as high risk yielded a sensitivity of 8.8% and a specificity of 99.4%, whereas a threshold of 34% yielded a sensitivity of 39.4% and a specificity of 91.9% (these thresholds are further explained in the Discussion section). The model was externally validated with Impact, which resulted in an AUC of 0.75, indicating that the model performed similarly in an insured population not used in model development.
Figure 2. ROC analysis evaluation of TSC model using the ORD internal validation sample. ORD=Optum Research Database, ROC=receiver operating characteristic, TSC=tuberous sclerosis complex.
This study sought to identify patient and treatment factors surrounding the TSC diagnosis that could be informative for health plans managing patients with this disease. Analysis of 600 potential predictors of TSC created from information included in health insurance claims revealed significant associations between TSC diagnosis and benign neoplasms of the skin, kidney and urinary system disorders, depression, seizure disorders, nausea and vomiting, anxiety, sleep disturbances, and cardiac dysrhythmias. Seizures and angiomyolipomas were the clinical findings most strongly associated with patients with TSC. The strongest covariate was the presence of a code indicative of an office visit. Among TSC patients, 100% had a visit compared with 90% of the control patients (with visits defined based on codes for evaluation and management). Overall, these results are consistent with clinical diagnostic criteria (Northrup 1999, Northrup 2013) and match historical data (Wataya-Kaneda 2013).
Several measures of model performance are reported: discriminative ability (AUC), sensitivity, and specificity. The AUC, a measure of the overall performance of the model, was similar in this study to values obtained for risk-score models commonly used in practice in other diseases. Notably, the Framingham (cardiovascular) risk score and several related cardiovascular risk scores have reported external validation AUCs ranging from 0.61 to 0.88 (Bitton 2010). By comparison, the TSC model AUCs of 0.77 (internal) and 0.75 (external) fall within the range of these scores.
The sensitivity of the model at a chosen threshold is the proportion of all patients with TSC in the sample population who are correctly classified as positive. As an example, at a threshold of 80%, the validation sample produced a sensitivity of 8.8%, indicating that the model is expected to correctly identify 8.8% of patients with TSC in this sample at this cutpoint. The specificity is the proportion of all patients without TSC who are correctly classified as negative. This model has a specificity of 99.4% when a threshold of 80% is used, indicating that it correctly classified 99.4% of control patients as not at risk for TSC. Selecting a lower threshold value results in a higher sensitivity at the cost of a lower specificity. At a threshold of 20%, the TSC model has a sensitivity of 64.8% and a specificity of 77.1%, results that are similar to the performance of the Framingham risk score at the same threshold (74.3% sensitivity and 59.4% specificity) (Brindle 2005).
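The trade-off between sensitivity and specificity as the threshold moves can be seen in a small sketch. The function name and the toy probabilities are illustrative assumptions and do not reproduce the study's validation data.

```python
def sensitivity_specificity(probabilities, labels, threshold):
    """Classify as positive when probability >= threshold, then return
    (sensitivity, specificity) against the true labels (1 = TSC, 0 = control)."""
    tp = fn = tn = fp = 0
    for prob, label in zip(probabilities, labels):
        flagged = prob >= threshold
        if label == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: three cases followed by three controls.
probs = [0.9, 0.5, 0.3, 0.1, 0.85, 0.2]
truth = [1, 1, 1, 0, 0, 0]
for cut in (0.8, 0.2):
    sens, spec = sensitivity_specificity(probs, truth, cut)
    print(cut, round(sens, 2), round(spec, 2))
```

Lowering the cutpoint from 0.8 to 0.2 in this toy example flags every case but also more controls, which mirrors the direction of the trade-off reported for the TSC model.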
We present classification statistics using a threshold of 80% to indicate that anyone with a predicted probability of 80% or greater is at higher risk of having the disease. However, owing to very low prevalence of TSC, this calculation is more complicated and requires prevalence adjustment. For example, if we apply it to a population of 100,000 people, where the prevalence of TSC is about 0.2%, this model would identify 1,893 patients as high risk; only 27 of those would be true cases (1.4%). Because this condition is so rare, adjustments required to account for the low prevalence result in small numbers of patients identified within an overall population of enrollees. It is important to note that this model is not designed to have definitive predictive value; instead, it is intended to provide a relative TSC risk score that may enable health plans to further investigate TSC as a potential diagnosis or target patients for further management. This may enable a health plan to target a set of patients with complex and expensive medical care needs for improved health care management.
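The text does not state which prevalence adjustment was used, so the following is only one common approach, shown as an assumption: rescaling the odds predicted in a case-control sample by the ratio of population odds to sample odds of disease (often called prior correction). The sample case fraction of 0.2 follows from the 1:4 matching and the 0.2% prevalence is the figure quoted above; the function name is hypothetical, and the output is not expected to reproduce the 1,893 and 27 reported by the authors.

```python
def adjust_for_prevalence(p_sample, sample_case_fraction, population_prevalence):
    """Rescale a probability estimated in a case-control sample to a
    population with a different disease prevalence (prior correction)."""
    sample_odds = p_sample / (1.0 - p_sample)
    population_prior_odds = population_prevalence / (1.0 - population_prevalence)
    sample_prior_odds = sample_case_fraction / (1.0 - sample_case_fraction)
    adjusted_odds = sample_odds * population_prior_odds / sample_prior_odds
    return adjusted_odds / (1.0 + adjusted_odds)

# A predicted probability of 80% in a sample that is 20% cases corresponds
# to a far smaller probability when only 0.2% of the population has TSC.
print(round(adjust_for_prevalence(0.80, 0.20, 0.002), 4))
```

This illustrates why a high score in the matched sample is better read as a relative risk ranking than as a literal probability of disease in the enrolled population.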
The conclusions drawn during development and the practical utility of the model are limited by the nature and quality of the patient data. Interpretation of descriptive results, especially regarding binary events, should be made with caution because of the varying length of postdiagnosis follow-up for each patient. This study used data from ORD from patient medical claims during a specific period of time, which may or may not accurately or completely reflect the true status or symptoms of the patient. The data are also limited by the resources available to the treating physician as well as the completeness of data, which were provided on claims for billing purposes rather than clinical use.
Potential applications and value
The model developed here could potentially serve as a quality of care tool for health plan managers and accountable care organizations. Segmenting a population by health indicators makes it possible to arrange commonly needed supports and services around the expected needs of each segment (Lynn 2007).
With the TSC model described here, three potential segments are patients diagnosed with TSC, patients with a high risk of TSC, and patients with a low risk of TSC. These segments could be targeted for population screening initiatives that seek to keep healthy patients healthy, reduce health risk, and ensure appropriate care for the patients with TSC (Meiris 2012). Information exchange and health coaching facilitated by accountable care organizations are prime examples of how the data developed here could be applied to improving health care. An example of information exchange in TSC care is the TSC-associated neuropsychiatric disorder checklist, which is a screening questionnaire completed by the clinician in collaboration with the patient or caregiver to address the often unmet need of psychiatric care in this population (Leclezio 2015).
In general, medical care is delivered and managed acutely in-house based on physician discretion and medical guidelines. This approach is limited by the resources available to the physician or hospital and may be subject to observational bias or based on outdated information (Feuerstein 2014). The availability of large amounts of digital patient data is a growing resource that can be used to develop disease risk models that can support health plans to improve identification of patients with diseases and to implement data-driven population management plans.
Analysis of large patient-data sets helps to develop a more comprehensive understanding of symptoms, treatments, and procedures associated with disease management. Where disease progression leads to worse outcomes, early and accurate diagnosis is crucial for optimizing patient health. Improved understanding of risk factors and treatment options for patients with TSC (or potential patients with TSC) will support health plans in improving the health of their covered populations. Once implemented, this approach and data analysis could support improved patient outcomes and quality of life and reduce patient economic burden.
More generally, this data-driven approach provides a case example of how to use available patient data to improve the ability of health insurers and others that manage care at a population level to identify patient populations that may benefit from targeted interventions.
Nicole M. Engel-Nitz, PhD
Associate Director, Health Economics and Outcomes Research
11000 Optum Circle
Eden Prairie, MN 55344
Fax: (952) 205-4782
Acknowledgments: Medical writing and editorial assistance provided by Wyatt Potter, PhD, and Alan Saltzman, PhD (Fishawack Communications, Conshohocken, Pa).
Agresti A. Summarizing predictive power: classification tables and ROC curves. In: Agresti A. Categorical Data Analysis, 2nd ed. Hoboken, N.J.: John Wiley & Sons, 2002:228–230.
Bitton A, Gaziano TA. The Framingham Heart Study’s impact on global risk assessment. Prog Cardiovasc Dis. 2010;53:68–78.
Brindle PM, McConnachie A, Upton MN, et al. The accuracy of the Framingham risk-score in different socioeconomic groups: a prospective study. Br J Gen Pract. 2005;55:838–845.
Chu-Shore CJ, Major P, Camposano S, et al. The natural history of epilepsy in tuberous sclerosis complex. Epilepsia. 2010;51:1236–1241.
Crino PB, Nathanson KL, Henske EP. The tuberous sclerosis complex. N Engl J Med. 2006;355:1345–1356.
Curatolo P, Bombardieri R, Jozwiak S. Tuberous sclerosis. Lancet. 2008;372:657–668.
D’Agati E, Moavero R, Cerminara C, Curatolo P. Attention-deficit hyperactivity disorder (ADHD) and tuberous sclerosis complex. J Child Neurol. 2009;24:1282–1287.
de Vries PJ, Hunt A, Bolton PF. The psychopathologies of children and adolescents with tuberous sclerosis complex (TSC): a postal survey of UK families. Eur Child Adolesc Psychiatry. 2007;16:16–24.
de Vrueh R, Baekelandt ERF, de Haan JMH. Priority medicines for Europe and the world: “A public health approach to innovation.” Update on 2004 background paper. Background paper 6.19 rare diseases. Geneva, Switzerland: World Health Organization, 2013. http://www.who.int/medicines/areas/priority_medicines/BP6_19Rare.pdf. Accessed July 8, 2017.
Feuerstein JD, Akbari M, Gifford AE, et al. Systematic analysis underlying the quality of the scientific evidence and conflicts of interest in interventional medicine subspecialty guidelines. Mayo Clin Proc. 2014;89:16–24.
HHS (U.S. Department of Health & Human Services). Fact Sheet-Rare disease clinical research network Oct 2010 update. https://report.nih.gov/nihfactsheets/ViewFactSheet.aspx?csid=126. Accessed July 8, 2017.
Joinson C, O’Callaghan FJ, Osborne JP, et al. Learning disability and epilepsy in an epidemiological sample of individuals with tuberous sclerosis complex. Psychol Med. 2003;33:335–344.
Krueger DA. Management of CNS-related disease manifestations in patients with tuberous sclerosis complex. Curr Treat Options Neurol. 2013a;15:618–633.
Krueger DA, Northrup H. Tuberous sclerosis complex surveillance and management: recommendations of the 2012 International Tuberous Sclerosis Complex Consensus Conference. Pediatr Neurol. 2013b;49:255–265.
Leclezio L, Jansen A, Whittemore VH, de Vries PJ. Pilot validation of the tuberous sclerosis-associated neuropsychiatric disorders (TAND) checklist. Pediatr Neurol. 2015;52:16–24.
Lennert B, Farrelly E, Sacco P, et al. Resource utilization in children with tuberous sclerosis complex and associated seizures: a retrospective chart review study. J Child Neurol. 2013;28:461–469.
Lynn J, Straube BM, Bell KM, et al. Using population segmentation to provide better health care for all: the “Bridges to Health” model. Milbank Q. 2007;85:185–208.
Meiris DC. Insights from the 12th population health management and care coordination colloquium. Popul Health Manag. 2012;15:127–128.
NIH (National Institutes of Health). Tuberous sclerosis fact sheet. https://www.ninds.nih.gov/Disorders/Patient-Caregiver-Education/Fact-Sheets/Tuberous-Sclerosis-Fact-Sheet. Accessed July 8, 2017.
Northrup H, Koenig M, Pearson DA, Au KS. Tuberous sclerosis complex. In: Pagon RA, Adam MP, Ardinger HH, et al, eds. GeneReviews [Internet]. Seattle: University of Washington, 1999.
Northrup H, Krueger DA, Roberds S, et al. Tuberous sclerosis complex diagnostic criteria update: recommendations of the 2012 International Tuberous Sclerosis Complex Consensus Conference. Pediatr Neurol. 2013;49:243–254.
Rentz AM, Skalicky AM, Liu Z, et al. Tuberous sclerosis complex: a survey of health care resource use and health burden. Pediatr Neurol. 2015;52:435–441.
Rouviere O, Nivet H, Grenier N, et al. Kidney damage due to tuberous sclerosis complex: management recommendations. Diagn Interv Imaging. 2013;94:225–237.
Schieppati A, Henter JI, Daina E, Aperia A. Why rare diseases are an important medical and social issue. Lancet. 2008;371:2039–2041.
Spurling Jeste S, Wu JY, Senturk D, et al. Early developmental trajectories associated with ASD in infants with tuberous sclerosis complex. Neurology. 2014;83:160–168.
Sun P, Liu Z, Krueger D, Kohrman M. Direct medical costs for patients with tuberous sclerosis complex and surgical resection of subependymal giant cell astrocytoma: a US national cohort study. J Med Econ. 2015;18:349–356.
Vekeman F, Magestro M, Karner P, et al. Kidney involvement in tuberous sclerosis complex: the impact on healthcare resource use and costs. J Med Econ. 2015;26:1–11.
Wataya-Kaneda M, Tanaka M, Hamasaki T, Katayama I. Trends in the prevalence of tuberous sclerosis complex manifestations: an epidemiological study of 166 Japanese patients. PLoS One. 2013;8:e63910.