External Validation of EHR-CAT Risk Assessment Model for Cancer Associated Thrombosis in 150 Healthcare Organizations

Li, Ang

Introduction: We previously derived and validated a novel risk assessment model (RAM) to predict cancer-associated venous thromboembolism (VTE) using 11 covariates from electronic health records (EHR) (EHR-CAT, https://doi.org/10.1200/JCO.22.01542). However, the initial study had several restrictions that could impair its generalizability. For example, the cohort included only patients receiving first-line systemic therapy within the first year of cancer diagnosis, relied on an advanced natural language processing algorithm, and required access to a manually curated cancer registry. In the current study, we performed a real-world external validation of the EHR-CAT RAM using only structured data from Epic across 150 healthcare organizations in the US.

Methods: Data used in this study came from Epic Cosmos, a dataset created in collaboration with a community of Epic health systems representing more than 259 million patient records from over 1,548 hospitals and 35,400 clinics from all 50 states. The current count values for patients, hospitals, and clinics are available on cosmos.epic.com. For this study, we included 150 healthcare organizations with hematology/oncology departments that contributed >2 years of complete EHR and billing data and >10,000 face-to-face (F2F) oncologic encounters. Incident cancer diagnosis was ascertained using 2+ ICD-10-CM codes for invasive solid or hematologic cancer from final billing and encounter diagnoses in selected F2F encounters from 1/2018 to 1/2023. Patients with multiple cancers, isolated/unspecified/secondary codes, or a previous cancer diagnosis or systemic therapy were excluded. Among those receiving systemic therapy, patients were further excluded if they were aged <18 or >100, had no prior F2F encounter, had received an anticoagulant (AC) or an acute VTE diagnosis in the previous year, or had a missing complete blood count (CBC). New VTE outcomes were defined using validated ICD codes (any inpatient or 2+ outpatient) from systemic therapy initiation until the earliest of the last F2F encounter without a 6-month gap, death, or 1/2024. Baseline demographics and covariates for EHR-CAT and the Khorana score were defined directly from Cosmos. Cumulative incidence was estimated with death as a competing risk. The C statistic was used to assess discrimination.
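As an illustration of estimating cumulative incidence with death as a competing risk, the following is a minimal sketch of the nonparametric Aalen-Johansen estimator. It is not the authors' analysis code; the function name, the event coding (0 = censored, 1 = VTE, 2 = death), and the toy data are assumptions made for this example.

```python
from collections import Counter


def cumulative_incidence(times, events, horizon):
    """Aalen-Johansen estimate of the cumulative incidence of event 1 by `horizon`.

    times  : follow-up time for each patient (any consistent unit, e.g. months)
    events : 0 = censored, 1 = VTE (event of interest), 2 = death (competing event)
    """
    vte = Counter(t for t, e in zip(times, events) if e == 1)
    death = Counter(t for t, e in zip(times, events) if e == 2)
    censored = Counter(t for t, e in zip(times, events) if e == 0)

    n_at_risk = len(times)
    event_free = 1.0  # probability of no VTE and no death just before the current time
    cif = 0.0         # running cumulative incidence of VTE

    for t in sorted(set(times)):
        if t > horizon or n_at_risk == 0:
            break
        d_vte, d_death = vte[t], death[t]
        # VTE can only occur in patients still alive and VTE-free at time t.
        cif += event_free * d_vte / n_at_risk
        # Both VTE and death remove patients from the event-free state.
        event_free *= 1.0 - (d_vte + d_death) / n_at_risk
        n_at_risk -= d_vte + d_death + censored[t]
    return cif


# Toy example (hypothetical data): five patients followed for up to 5 months.
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 2, 0, 1, 0]
toy_cif = cumulative_incidence(toy_times, toy_events, horizon=6)
```

Unlike one minus the Kaplan-Meier estimate, this approach does not treat patients who die as if they could still develop VTE, which avoids overestimating incidence in a population with meaningful early mortality.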

Results: Among 148 million eligible patients, 2.1 million had newly diagnosed invasive cancer, and 469,049 received systemic therapy. After excluding ~25% of patients for missing CBC or AC/VTE history, 304,780 patients remained in the analytic cohort. The median age was 65 years; 62% were female; self-reported race was 79% White, 13% Black, and 4% Asian; and 18% lived in rural or micropolitan areas. The most common cancers were breast (40%), prostate (11%), colorectal (10%), and lymphoma (8%). Cancer stage was 20% I-II, 31% III-IV, and 21% unstageable (brain/leukemia). Cancer treatment included 43% endocrine, 37% chemotherapy, and 19% targeted therapy.
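The cohort attrition reported above can be restated as step-by-step retention fractions. The sketch below uses only the counts given in the abstract; the step labels and the helper name `retention` are illustrative assumptions, and the 148 million and 2.1 million figures are rounded in the source, so the resulting percentages are approximate.

```python
def retention(steps):
    """Return (label, fraction retained from the previous step) for each step after the first."""
    out = []
    for (_, prev_n), (label, n) in zip(steps, steps[1:]):
        out.append((label, n / prev_n))
    return out


# Counts as reported in the abstract (the first two are rounded in the source).
funnel = [
    ("eligible patients", 148_000_000),
    ("newly diagnosed invasive cancer", 2_100_000),
    ("received systemic therapy", 469_049),
    ("analytic cohort", 304_780),
]

fractions = retention(funnel)
for label, frac in fractions:
    print(f"{label}: {frac:.1%} of previous step")
```

Note that the last step retains about 65% of treated patients. The abstract attributes ~25% of exclusions to missing CBC and AC/VTE history; the remainder presumably reflects the other exclusion criteria listed in the Methods, although the abstract does not break this down.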

Baseline covariates included cancer type (scored 0-3), body mass index ≥35 kg/m² (16%), leukocyte count >11 × 10⁹/L (14%), hemoglobin <10 g/dL (16%), platelet count ≥350 × 10⁹/L (13%), advanced stage (52%), targeted/endocrine therapy (62%), recent hospitalization (21%), recent paralysis/immobilization (2%), remote VTE history (3%), and Asian race (4%) (https://dynamicapp.shinyapps.io/EHR-CAT/).

At 6 months after initiation of systemic therapy, all-cause mortality was 7% and the incidence of new VTE was 3.6%. The 6-month cumulative incidence of VTE was 1.0%, 2.9%, 4.3%, 5.5%, 7.4%, and 10.7% for scores of 0, 1, 2, 3, 4, and 5, respectively. The C statistic for EHR-CAT was 0.72 (vs. 0.62 for the Khorana score). The RAM performed well in each racial subgroup. Notably, simplifying the model by excluding CBC covariates did not alter its performance. Multivariable analysis showed that CBC covariates contributed only 10-20% additional risk, compared with 40-300% from clinical covariates.
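To make the C statistic concrete: in its simplest form it is the probability that a randomly chosen patient who developed VTE had a higher risk score than a randomly chosen patient who did not, with tied scores counted as one half. The sketch below computes this binary-outcome version; it ignores censoring and the competing risk of death, so it is a simplification and not necessarily the estimator used in the study. The function name and the toy data are assumptions.

```python
def c_statistic(scores, outcomes):
    """Concordance for a binary outcome (equivalent to the area under the ROC curve).

    scores   : risk score per patient (higher = predicted higher risk)
    outcomes : 1 if the patient had the event (e.g. VTE by 6 months), else 0
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0   # model ranked the case higher: concordant pair
            elif case_score == control_score:
                concordant += 0.5   # tie: counted as half
    return concordant / (len(cases) * len(controls))


# Toy example (hypothetical scores, not study data).
toy_scores = [0, 1, 2, 3, 3, 5]
toy_outcomes = [0, 0, 0, 1, 0, 1]
toy_c = c_statistic(toy_scores, toy_outcomes)
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which gives context for the reported 0.72 for EHR-CAT versus 0.62 for the Khorana score.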

Conclusion: Using Epic Cosmos, we identified 2.1 million patients with newly diagnosed cancer from 150 organizations, representing ~23% of all incident cancers in the US over 5 years, with a racial distribution similar to that of the US census. Despite its simplicity, the EHR-CAT RAM demonstrated robust, albeit modest, performance in this large, independent external validation. Most importantly, all covariates and outcomes were derived directly from Epic; therefore, this RAM could be integrated into most healthcare systems that use an EHR to aid patient selection for clinical trials and thromboprophylaxis.

812 External Validation of EHR-CAT Risk Assessment Model for Cancer Associated Thrombosis in 150 Healthcare Organizations