Session: 803. Emerging Tools, Techniques and Artificial Intelligence in Hematology: Poster III
Hematology Disease Topics & Pathways:
Research, Bleeding and Clotting, artificial intelligence (AI), Translational Research, thromboembolism, Diseases, Technology and Procedures, machine learning
Methods: A total of 2542 patients diagnosed with VTE were enrolled in a prospective cohort study over 8 years. In addition to the clinical information recorded at baseline, 6-month follow-up interviews were conducted using a standard script to monitor bleeding status and record updated clinical information. Major bleeding was defined according to the International Society on Thrombosis and Haemostasis criteria, and suspected bleeding events were classified by an independent adjudication committee. Overall, 118 patients had major bleeding, an incidence of 4.6%. Missing numerical and categorical values were imputed with the median and the mode of the corresponding clinical variable, respectively. For patients with no follow-up information, in whom bleeding occurred before the first follow-up, an artificial follow-up data point was generated from their baseline data. The data were then divided into two stratified sets: 70% for training and 30% for testing. Five supervised neural network-based machine learning models with different architectures were trained to predict major bleeding using the baseline dataset, the follow-up dataset, or both. After training, these models were evaluated on the testing set and compared with conventional clinical models that use only baseline information: the CHAP, the HAS-BLED, the VTE-BLEED, the RIETE, the ACCP, and the OBRI. Each clinical model was modified to make it compatible with the predictor variables available in our dataset.
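The preprocessing steps described above (median and mode imputation followed by a stratified 70/30 split) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the study's actual pipeline: the function names `impute` and `stratified_split`, the field names `age`, `sex`, and `bleed`, the random seed, and the 20-patient toy cohort are all hypothetical and were chosen only to make the example runnable.

```python
import random
from statistics import median, mode


def impute(rows, numeric_keys, categorical_keys):
    """Return copies of rows with None replaced by the column median or mode."""
    fills = {}
    for key in numeric_keys:
        fills[key] = median([r[key] for r in rows if r[key] is not None])
    for key in categorical_keys:
        fills[key] = mode([r[key] for r in rows if r[key] is not None])
    filled = [dict(r) for r in rows]  # copy so the input is left untouched
    for r in filled:
        for key, value in fills.items():
            if r[key] is None:
                r[key] = value
    return filled


def stratified_split(rows, label_key, train_frac=0.7, seed=0):
    """Split rows so each outcome class keeps roughly the same train fraction."""
    rng = random.Random(seed)
    by_label = {}
    for r in rows:
        by_label.setdefault(r[label_key], []).append(r)
    train, test = [], []
    for group in by_label.values():
        group = group[:]
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Synthetic toy cohort (not study data): 20 patients, 4 with a bleeding event,
# one missing age and one missing sex.
cohort = [
    {
        "age": None if i == 1 else 50 + i,
        "sex": None if i == 2 else ("F" if i % 3 else "M"),
        "bleed": int(i < 4),
    }
    for i in range(20)
]
clean = impute(cohort, ["age"], ["sex"])
train, test = stratified_split(clean, "bleed")
print(len(train), len(test))
```

Splitting within each outcome class matters here because major bleeding is rare (4.6% in this cohort); an unstratified split could leave the testing set with very few events.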
Results: Overall, the models that used the follow-up information had a higher area under the receiver operating characteristic curve (AUROC), or c-statistic, than the models that relied only on the baseline dataset. In particular, the LSTM RNN model achieved an AUROC of 81.3%, more than 10% higher than that of the best-performing clinical model. To predict bleeding from the follow-up dataset, the LSTM RNN model relied mostly on features such as the number of concomitant medications, years since the baseline visit, use of specific antibiotics or antiplatelet agents, and the presence of new hypertension. Furthermore, half of the bleeding events occurred within the first year after patients' baseline visits, a trend reflected in the predictions made by the LSTM RNN model. Finally, the models that used both the baseline and the follow-up datasets performed differently depending on their architectures: the simpler ensemble model achieved an AUROC of 82.5%, while the more complex model had an AUROC of 70.8% due to overfitting.
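For readers less familiar with the metric reported above, the AUROC (c-statistic) can be read as the probability that a randomly chosen patient who bled receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it directly from that definition; the function name `auroc` and the four-patient example are illustrative assumptions, not the evaluation code or data used in the study.

```python
def auroc(labels, scores):
    """Concordance statistic: P(score of an event > score of a non-event)."""
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    if not events or not non_events:
        raise ValueError("AUROC needs at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Toy example (not study data): 1 = major bleeding, 0 = no major bleeding.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))
```

In this toy example, three of the four event/non-event pairs are ranked correctly, so the function returns 0.75. The pairwise loop is quadratic in cohort size, which is acceptable for illustration; rank-based implementations are preferable for large datasets.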
Conclusion: We have shown that using time series follow-up data, rather than baseline data alone, can improve bleeding risk prediction in patients with VTE who are on extended anticoagulant therapy, and clinicians might benefit from such an approach. Furthermore, our results indicate that LSTM RNN is a suitable architecture for modeling routine clinical follow-up data. Finally, we believe that using time series data could improve the performance of the other clinical models, which are currently based on one-time baseline measurements.
Disclosures: No relevant conflicts of interest to declare.