-Asterisk * with author name denotes a Non-ASH member
3504 Utilization of Natural Language Processing in Venous Thromboembolism Identification

Program: Oral and Poster Abstracts
Session: 901. Health Services and Quality—Non-Malignant Conditions: Poster II
Hematology Disease Topics & Pathways:
Research, adult, Clinical Practice (Health Services and Quality), Clinical Research, Technology and Procedures, Study Population, Human, natural language processing
Sunday, December 11, 2022, 6:00 PM-8:00 PM

Jonathan Avery, MD1*, Kylee L Martens, MD2*, Daniel Nguyen, MD, PhD3, Ryan Basom4*, Stephanie Lee, MD, MPH5, David A. Garcia, MD6, Cristhiam Mauricio Rojas Hernandez, MD7 and Ang Li, MD, MS8

1University of Washington School of Medicine, Seattle, WA
2Division of Hematology and Medical Oncology, Oregon Health and Science University, Portland, OR
3Department of Medicine, The University of Texas Health Science Center at Houston, Houston, TX
4Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, WA
5Fred Hutchinson Cancer Research Center, Seattle, WA
6Division of Hematology, University of Washington, Seattle, WA
7Section of Benign Hematology, The University of Texas MD Anderson Cancer Center, Houston, TX
8Hematology-Oncology, Baylor College of Medicine, Houston, TX

Introduction: Capturing venous thromboembolism (VTE) outcomes using only International Classification of Diseases (ICD) codes can lead to misclassification of events and to inaccurate conclusions. Manual chart review, the gold standard, is labor-intensive and not always feasible. The current study aims to validate an improved computable VTE phenotype algorithm that combines ICD codes with natural language processing (NLP) in patients undergoing hematopoietic cell transplantation (HCT).

Methods: All patients undergoing first allogeneic HCT at Fred Hutchinson Cancer Center (FHCC) from 2006 to 2019 were included in the current study. To capture as many VTE events as possible, we used a sensitive screening method within one year before and after the transplant date that encompassed 1) all patients with ICD-9 and ICD-10 codes for acute, chronic, or historical pulmonary embolism (PE), deep venous thrombosis (DVT), and phlebitis and thrombophlebitis, and 2) all patients with at least 1 radiology report containing a pertinent VTE-related keyword from venous Doppler ultrasound, contrast computed tomography, or ventilation-perfusion scans. All patients from this screened subset were then reviewed by chart abstractors (JA, KM) to establish the gold standard of incident VTE events, defined as the new onset of radiologically confirmed PE, lower extremity DVT, or upper extremity/catheter-related DVT within 1 year post-transplant.
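
The two-part screen described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not list the screening codes or the keywords, so `ICD_PREFIXES` reuses the code families named later for the acute-VTE test as a stand-in, and `KEYWORDS`, `in_window`, and `screens_positive` are hypothetical names introduced here.

```python
from datetime import date

# Assumption: stand-in code families; the actual screening code list is not given.
ICD_PREFIXES = ("415", "451", "453", "I26", "I80", "I82")
# Assumption: hypothetical keywords; the study's keyword list is not given.
KEYWORDS = ("thrombus", "thrombosis", "embolism", "embolus")


def in_window(event_date: date, transplant_date: date, days: int = 365) -> bool:
    """True if the event falls within `days` before or after the transplant date."""
    return abs((event_date - transplant_date).days) <= days


def screens_positive(transplant_date, icd_events, radiology_reports) -> bool:
    """Sensitive screen: any in-window ICD hit OR any in-window keyword hit.

    icd_events: iterable of (code, date); radiology_reports: iterable of (text, date).
    """
    icd_hit = any(
        code.upper().startswith(ICD_PREFIXES) and in_window(d, transplant_date)
        for code, d in icd_events
    )
    keyword_hit = any(
        in_window(d, transplant_date) and any(k in text.lower() for k in KEYWORDS)
        for text, d in radiology_reports
    )
    return icd_hit or keyword_hit
```

Note that a plain keyword match also flags negated findings (for example "no evidence of thrombosis"). That over-inclusion is acceptable for a screen whose only job is to select charts for manual review.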

We then tested the performance of the acute VTE ICD-9/10 codes (selective codes from 415, 451, 453; I26, I80, I82) from inpatient or outpatient encounters from stem cell infusion until 1 year post-transplant. We also tested the utility of an NLP algorithm, a method that has been previously validated in a separate cancer cohort using unstructured radiology impressions (PMID: 35647478). Finally, we evaluated the combination of these two algorithms against the gold standard and report positive predictive value (PPV) and sensitivity (Sn).
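
For readers less familiar with these metrics, the comparison against the gold standard can be sketched as below. The `Patient` record, the `sensitivity_and_ppv` helper, and the six-patient toy cohort are hypothetical and are not study data; they only illustrate how Sn and PPV are computed for each algorithm and for the "either positive" combination.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Patient:
    icd_positive: bool   # flagged by acute VTE ICD codes
    nlp_positive: bool   # flagged by the NLP radiology algorithm
    gold_vte: bool       # chart-review gold standard


def sensitivity_and_ppv(
    patients: Sequence[Patient], flagged: Callable[[Patient], bool]
) -> Tuple[float, float]:
    """Return (Sn, PPV) of a flagging rule against the gold standard."""
    tp = sum(1 for p in patients if flagged(p) and p.gold_vte)
    fp = sum(1 for p in patients if flagged(p) and not p.gold_vte)
    fn = sum(1 for p in patients if not flagged(p) and p.gold_vte)
    sn = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sn, ppv


# Toy cohort (made up for illustration only).
toy = [
    Patient(True, True, True),
    Patient(True, False, False),
    Patient(False, True, True),
    Patient(False, False, True),
    Patient(True, False, True),
    Patient(False, False, False),
]

icd_only = sensitivity_and_ppv(toy, lambda p: p.icd_positive)
nlp_only = sensitivity_and_ppv(toy, lambda p: p.nlp_positive)
either = sensitivity_and_ppv(toy, lambda p: p.icd_positive or p.nlp_positive)
```

In this toy cohort, combining the two rules with "or" raises sensitivity above either rule alone, which is the same qualitative pattern the study reports.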

Results: Among 2,879 patients who underwent allogeneic HCT over 15 years, 740 (26%) met study inclusion criteria. Based on the gold standard of detailed medical record review, 275 (10%) were found to have a radiologically confirmed VTE event within 1 year post-transplant (Figure 1). A further 389 (14%) historical VTE events were confirmed before transplant. The acute VTE ICD-9/10 codes identified 339 patients and the NLP algorithm predicted 245 patients to have VTE (155 patients were identified by both).

The ICD codes alone for acute VTE had an estimated Sn of 73% and PPV of 59%. The NLP radiology algorithm alone for VTE had an estimated Sn of 73% and PPV of 82%. Approximately 3 in 10 VTE events were missed by each algorithm. However, the combination of ICD or NLP identified 245/275 of all VTE events (Sn 89%). The PPVs for ICD+/NLP+, ICD-/NLP+, and ICD+/NLP- were 89%, 64%, and 27%, respectively (Table 1). In summary, those with concordant ICD/NLP predictions had excellent PPVs, and approximately 8% of patients, those with discordant ICD/NLP results (n=234/2,879), would require additional chart review to achieve a final PPV >90% and Sn >90%.
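
As a rough consistency check, the reported sensitivities can be recovered from the reported counts and PPVs. The true-positive counts below are back-calculated approximations (Table 1 is not reproduced here), so the sketch only shows that the rounded figures agree with one another; the variable names are introduced for illustration.

```python
GOLD_VTE = 275          # gold-standard VTE events within 1 year post-transplant
ICD_FLAGGED = 339       # patients flagged by acute VTE ICD-9/10 codes
NLP_FLAGGED = 245       # patients flagged by the NLP algorithm
COMBINED_TP = 245       # events identified by ICD or NLP (reported as 245/275)

# Approximate true positives implied by the reported PPVs (59% and 82%).
icd_tp = round(59 * ICD_FLAGGED / 100)
nlp_tp = round(82 * NLP_FLAGGED / 100)

# Sensitivity = true positives / all gold-standard events, in whole percent.
icd_sn_pct = round(100 * icd_tp / GOLD_VTE)
nlp_sn_pct = round(100 * nlp_tp / GOLD_VTE)
combined_sn_pct = round(100 * COMBINED_TP / GOLD_VTE)

print(icd_tp, nlp_tp)                            # 200 201
print(icd_sn_pct, nlp_sn_pct, combined_sn_pct)   # 73 73 89
```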

Conclusion: In the current study, we found that the sensitivity of either the acute ICD codes or the NLP algorithm alone was sub-optimal (missing 3 in 10), and a combined screen should be considered (missing 1 in 10). The use of ICD-9/10 codes alone for new VTE had poor accuracy in our cohort (PPV of 59%), suggesting that additional features are needed, such as concurrent anticoagulation medications. In contrast, the NLP algorithm was validated with a high PPV of 82% (89% when combined with a positive acute ICD screen) in the current cohort and may not require additional confirmation, though caution is warranted when applying it in studies that lack dedicated follow-up within a single unified healthcare system where radiology reports are captured and stored. One limitation of the study is the lack of review of patients who initially screened negative by both ICD codes and radiology keyword searches. While the initial screen was designed to be highly sensitive, we may have missed a small number of true VTE events, and the reported Sn in this study represents the best-case scenario. In conclusion, while computable phenotype algorithms represent a promising future for the identification of VTE, a hybrid approach involving manual chart review (only for cases where the ICD and NLP screens disagree) may provide the highest yield and help minimize labor-intensive manual review.
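
The hybrid approach suggested here could be expressed as a simple triage rule. This is a minimal sketch, not the study's implementation: the `Triage` enum and `triage` function are hypothetical names, and the rule of accepting concordant results without review is an assumption drawn from the paragraph above.

```python
from enum import Enum


class Triage(Enum):
    ACCEPT_VTE = "accept as VTE"
    ACCEPT_NO_VTE = "accept as no VTE"
    CHART_REVIEW = "send to manual chart review"


def triage(icd_positive: bool, nlp_positive: bool) -> Triage:
    """Route a patient based on agreement between the ICD and NLP screens."""
    if icd_positive != nlp_positive:
        # Discordant screens: manual review is reserved for these cases.
        return Triage.CHART_REVIEW
    # Concordant screens are accepted as-is (assumption for this sketch).
    return Triage.ACCEPT_VTE if icd_positive else Triage.ACCEPT_NO_VTE
```

Under this rule, reviewer effort is concentrated on the ICD+/NLP- and ICD-/NLP+ groups, where the reported PPVs (27% and 64%) are lowest.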

Disclosures: Lee: Amgen: Research Funding; AstraZeneca: Research Funding; Equillium: Consultancy, Honoraria; Incyte: Research Funding; Kadmon: Consultancy, Honoraria, Research Funding; Mallinckrodt: Consultancy, Honoraria; National Marrow Donor Program: Membership on an entity's Board of Directors or advisory committees; Novartis: Membership on an entity's Board of Directors or advisory committees; Pfizer: Research Funding; Syndax: Research Funding. Rojas Hernandez: ANTHOS Therapeutics: Research Funding; ASPEN Pharmaceuticals: Research Funding; Daiichi Sankyo: Research Funding.
