Type: Oral
Session: 901. Health Services and Quality Improvement: Non-Malignant Conditions Excluding Hemoglobinopathies: Optimizing Classical Hematology Care
Hematology Disease Topics & Pathways:
Research, Bleeding and Clotting, Artificial intelligence (AI), Adult, Translational Research, Epidemiology, Clinical Research, Thromboembolism, Diseases, Technology and Procedures, Study Population, Human, Natural language processing
Methods: In derivation/internal validation (Harris Health System [HHS]) and external validation (Veterans Affairs [VA]) cohorts, we identified clinical progress notes, discharge summaries, and radiology reports from patients with active cancer receiving systemic therapy. We then preprocessed the notes to identify sections with high clinical value (e.g., history of present illness, assessment, plan, impression, hospital course) and isolated sentences containing venous thromboembolism (VTE) keywords. VTE was defined as newly diagnosed, acute pulmonary embolism (PE), lower-extremity deep vein thrombosis (LE-DVT), or upper-extremity DVT (UE-DVT). Unusual thromboses such as splanchnic vein thrombosis (mostly tumor thrombi) and ambiguous events (chronic VTE or clinically suspected VTE without radiologic confirmation) were classified as negative. To establish the gold standard, multiple medically trained, blinded annotators reviewed patient notes in our natural language processing (NLP) web interface.
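For illustration, a minimal sketch of this style of preprocessing is shown below; the section headers, keyword patterns, and function name are assumptions for readability, not the authors' exact rules.

```python
# Illustrative preprocessing sketch: keep high-value note sections and
# pull out sentences mentioning VTE keywords. Section headers and the
# keyword list are assumptions, not the study's exact specification.
import re

TARGET_SECTIONS = ["history of present illness", "assessment", "plan",
                   "impression", "hospital course"]
VTE_KEYWORDS = [r"pulmonary embolism", r"\bpe\b", r"\bdvt\b",
                r"deep vein thrombos", r"venous thromboembolism",
                r"\bthrombus\b", r"\bthrombosis\b"]

def extract_candidate_sentences(note_text: str) -> list[str]:
    """Return sentences from high-value sections that mention a VTE keyword."""
    candidates = []
    # Split the note into sections on lines that look like headers (e.g. "IMPRESSION:").
    sections = re.split(r"\n(?=[A-Z][A-Za-z /]+:)", note_text)
    for section in sections:
        header = section.split(":", 1)[0].strip().lower()
        if not any(target in header for target in TARGET_SECTIONS):
            continue
        # Naive sentence split; a clinical sentence segmenter would be preferable in practice.
        for sentence in re.split(r"(?<=[.!?])\s+", section):
            lowered = sentence.lower()
            if any(re.search(kw, lowered) for kw in VTE_KEYWORDS):
                candidates.append(sentence.strip())
    return candidates
```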
The derivation cohort consisted of 700 patients from HHS with gold standard multi-class VTE labels (~1,000 positive and ~15,000 negative sentences). We fine-tuned the Bio_ClinicalBERT transformer model with a learning rate of 2e-5 and a maximum of 30 epochs (tuned to positive-label training metrics) on an AWS EC2 g4dn.12xlarge instance. We then applied the fine-tuned model (hereafter VTE-BERT) to unstructured clinical notes from two previously labeled, independent datasets of 458 cancer patients at HHS (internal validation; 97 confirmed VTE) and 764 cancer patients at the VA (external validation; 489 confirmed VTE). No additional transfer learning or federated learning was performed. Accuracy, positive predictive value (PPV, or precision), and sensitivity (recall) were estimated.
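A minimal fine-tuning sketch in this spirit is shown below, using the Hugging Face transformers API. The learning rate (2e-5) and 30-epoch maximum come from the abstract; the model checkpoint ID, label scheme, batch size, dataset wrapper, and toy sentences are assumptions, not the authors' exact pipeline.

```python
# Hypothetical fine-tuning sketch for a multi-class sentence classifier.
# Hyperparameters reported in the abstract: lr = 2e-5, max 30 epochs.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

MODEL_ID = "emilyalsentzer/Bio_ClinicalBERT"      # public Bio_ClinicalBERT checkpoint
LABELS = ["negative", "PE", "LE-DVT", "UE-DVT"]   # illustrative multi-class scheme

class SentenceDataset(Dataset):
    """Wraps annotated sentences (text + integer label) for the Trainer."""
    def __init__(self, sentences, labels, tokenizer):
        self.enc = tokenizer(sentences, truncation=True, padding=True,
                             max_length=128, return_tensors="pt")
        self.labels = torch.tensor(labels)
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.enc.items()}
        item["labels"] = self.labels[idx]
        return item

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID, num_labels=len(LABELS))

# Toy data standing in for the ~16,000 annotated derivation sentences.
train_ds = SentenceDataset(
    ["CT angiogram shows acute segmental pulmonary embolism.",
     "No evidence of deep vein thrombosis on duplex ultrasound."],
    [1, 0], tokenizer)

args = TrainingArguments(
    output_dir="vte_bert",
    learning_rate=2e-5,           # as reported in the abstract
    num_train_epochs=30,          # maximum; selection guided by positive-label metrics
    per_device_train_batch_size=16,
)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```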
Results: In the HHS validation cohort, VTE-BERT achieved accuracy, precision, and recall of 96%, 90%, and 94%, respectively (97 true positives, 12 false positives, and 6 false negatives among 458 patients). In the VA external validation cohort, the same model achieved accuracy, precision, and recall of 91%, 92%, and 94% (462 true positives, 42 false positives, and 27 false negatives among 764 patients). The model correctly classified most potential confounders, such as tumor thrombi and septic thrombi, as negative. False positives were driven by a combination of historical VTE events mistakenly labeled as new and arterial thrombosis events. When patients with a recent VTE diagnosis (i.e., a positive VTE-BERT prediction before the index date) were excluded, precision improved further to 98% in HHS and 93% in the VA. Finally, the updated model accurately differentiated among the different types of VTE events.
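As a sanity check, the VA external-validation metrics can be reproduced from the reported counts; the short snippet below does that arithmetic (the true-negative count is inferred from the totals, not separately reported).

```python
# Worked check of the VA external-validation metrics from the reported counts
# (462 TP, 42 FP, 27 FN among 764 patients).
tp, fp, fn, n = 462, 42, 27, 764
tn = n - tp - fp - fn                 # patients correctly called VTE-negative (inferred)
accuracy = (tp + tn) / n              # 0.910 -> 91%
precision = tp / (tp + fp)            # PPV, 0.917 -> 92%
recall = tp / (tp + fn)               # sensitivity, 0.945 -> 94%
print(f"accuracy={accuracy:.0%} precision={precision:.0%} recall={recall:.0%}")
```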
Conclusion: The updated multi-class VTE-BERT LLM performed well in two healthcare systems with vastly different clinical note types and formatting, demonstrating the generalizability of a well-trained transformer. This represents one of the first efforts to apply a fine-tuned LLM-based NLP algorithm to raw clinical notes from an external, independent dataset. While model training required more than a year of human annotation and model tuning, its application was straightforward. Given the ease of access to unstructured clinical notes in most healthcare systems that use electronic health records, the updated VTE-BERT model, along with our annotation pipeline designed for physicians, can greatly reduce the cost and time associated with annotation and improve thrombosis research.
Disclosures: La: Merck: Research Funding.