-Author name in bold denotes the presenting author
-Asterisk * with author name denotes a Non-ASH member

2616 Leveraging Generative Artificial Intelligence in Diagnosis of Thrombotic Microangiopathies: Focus on Thrombotic Thrombocytopenic Purpura

Program: Oral and Poster Abstracts
Session: 331. Thrombotic Microangiopathies/Thrombocytopenias: Clinical and Epidemiological: Poster II
Hematology Disease Topics & Pathways:
Bleeding and Clotting, Artificial intelligence (AI), Adult, Clinical Practice (Health Services and Quality), Diseases, Thrombocytopenias, Technology and Procedures, Study Population, Human
Sunday, December 8, 2024, 6:00 PM-8:00 PM

Eunhee Choi, MD1, Jung-Hyun Lee, MD2*, Robert McDougal, PhD3* and William W Lytton, MD2*

1Lincoln Medical & Mental Health Center, New York, NY
2Department of Neurology, State University of New York Downstate Health Sciences University, New York, NY
3Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT

Introduction

Thrombotic microangiopathies (TMA), with etiologies ranging from benign to life-threatening, necessitate rapid and accurate diagnosis, particularly for thrombotic thrombocytopenic purpura (TTP), so that plasmapheresis can be initiated in time to prevent severe outcomes. Diagnosing TTP is challenging because its clinical features overlap with those of other causes of TMA, such as disseminated intravascular coagulation (DIC), immune thrombocytopenic purpura (ITP), and atypical hemolytic uremic syndrome (aHUS). The difficulty is compounded by the fact that specific diagnostic tests, such as biopsies or ADAMTS13 activity assays, do not return results immediately. This study explored GPT-4's capability to suggest differential diagnoses for TMA patients and to identify a provisional diagnosis of TTP based on clinical presentation and basic diagnostic workup, in order to determine the need for prompt plasmapheresis and to assess its potential as a diagnostic support tool.

Method

We used open-access case reports from PubMed Central that provided a comprehensive set of cases with a diagnosis of TMA. Exclusion criteria were lack of access, copyright permission issues, preprints, insufficient description, non-case reports, non-English language, and no established diagnosis of TMA. For each case, an input containing only the history and physical examination (H&P) and basic diagnostic workup, excluding the confirmatory diagnosis, was presented to GPT-4 in three separate trials. GPT-4 was prompted to provide clinical reasoning both for and against the diagnosis of TTP, to create a list of the top three differential diagnoses selected from a comprehensive list of TMA diagnoses, and to determine the necessity of plasmapheresis. The generated results were then compared with the confirmed diagnosis and management reported in the case report.
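The comparison step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a case counts as "predicted TTP" when TTP appears anywhere in the top three differential diagnoses; the abstract does not state the exact scoring rule or how the three trials were aggregated. The function names and the toy cases below are hypothetical and are not data from the study.

```python
from typing import List, Tuple

# Each case: (confirmed diagnosis, top-three differential list from one trial).
Case = Tuple[str, List[str]]

def ttp_confusion(cases: List[Case]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) for TTP versus non-TTP.

    Assumption: a case is 'predicted TTP' if TTP is in the top-three list.
    """
    tp = fp = tn = fn = 0
    for confirmed, top3 in cases:
        predicted_ttp = "TTP" in top3
        actual_ttp = confirmed == "TTP"
        if actual_ttp and predicted_ttp:
            tp += 1
        elif actual_ttp and not predicted_ttp:
            fn += 1
        elif not actual_ttp and predicted_ttp:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true TTP cases that were flagged as TTP."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn: int, fp: int) -> float:
    """Fraction of non-TTP cases that were not flagged as TTP."""
    return tn / (tn + fp) if (tn + fp) else 0.0

# Hypothetical toy cases, for illustration only (not study data).
toy_cases: List[Case] = [
    ("TTP", ["TTP", "aHUS", "DIC"]),
    ("TTP", ["aHUS", "DIC", "ITP"]),
    ("DIC", ["DIC", "TTP", "HELLP"]),
    ("ITP", ["ITP", "DIC", "aHUS"]),
    ("aHUS", ["aHUS", "HUS", "DIC"]),
]

tp, fp, tn, fn = ttp_confusion(toy_cases)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```

Under this rule, a broader top-three list tends to raise sensitivity at the cost of specificity, since any mention of TTP counts as a positive call.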

Result

An initial PubMed Central search identified 424 cases; 326 were excluded, resulting in 98 eligible cases. The top three differential diagnoses generated for each case in all three trials exhibited relatively higher F1-scores for ITP, TTP, HUS, and HELLP syndrome, with values of 0.58, 0.59, 0.53, and 0.7, respectively. Other causes of TMA scored below 0.5. Overall performance metrics indicated a specificity of 0.85, sensitivity of 0.80, precision of 0.28, and an F1-score of 0.42. When grouped into TTP versus non-TTP cases, the sensitivity was notably high at 0.98, showing that GPT-4 could adequately rule out TTP, although the specificity was 0.76. When comparing the case diagnosis with the primary diagnosis within the top three differential diagnoses, the overall specificity was 0.96, sensitivity was 0.56, precision was 0.58, and the F1-score was 0.57. The match rate of GPT-4 suggesting plasmapheresis compared to the case report was 76%. In cases confirmed as TTP, GPT-4 demonstrated 100% accuracy in recommending plasmapheresis. For non-TTP cases, GPT-4 showed a 66% match rate compared to the case report's decision to initiate plasmapheresis, indicating a 34% reduction in suggesting plasmapheresis for these cases. Error analysis revealed that errors were primarily due to GPT-4 ignoring pertinent findings, inaccurate knowledge, and confounding symptoms or findings within the case report itself.
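As a rough consistency check, the F1-score can be recomputed as the harmonic mean of precision and sensitivity. The sketch below uses only the rounded values reported above and assumes the overall F1-scores were derived from the overall precision and sensitivity; the abstract does not state how the metrics were aggregated across diagnoses, so this is an approximation rather than a reproduction of the authors' calculation.

```python
def f1_score(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)

# Reported (rounded) values for the top-three differential list.
f1_top3 = f1_score(precision=0.28, sensitivity=0.80)

# Reported (rounded) values for the primary diagnosis only.
f1_primary = f1_score(precision=0.58, sensitivity=0.56)

print(f"top-three F1 from rounded inputs: {f1_top3:.3f}")
print(f"primary-diagnosis F1 from rounded inputs: {f1_primary:.3f}")
```

The recomputed values, about 0.41 and 0.57, are close to the reported 0.42 and 0.57; a difference in the second decimal place is expected when the inputs are themselves rounded.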

Discussion

This study demonstrated that GPT-4 could adequately assist in the diagnosis of TMA and provide suggestions for early management of TTP based on clinical presentation and basic diagnostic workup. GPT-4 appropriately recommended plasmapheresis for TTP cases and showed performance comparable to that of the PLASMIC score, a tool commonly used clinically in these settings. However, in our study, GPT-4 made errors such as ignoring pertinent findings and demonstrating incomplete knowledge, highlighting the need for pretraining and areas for improvement in the diagnosis of TMA. The study suggested that GPT-4 could be integrated as a diagnostic support tool, especially for complex, time-sensitive conditions, while emphasizing that it should complement, not replace, clinical judgment.

Disclosures: No relevant conflicts of interest to declare.
