Type: Oral
Session: 904. Outcomes Research – Non-Malignant Conditions: Blending the Old With the New: Traditional and Innovative Approaches to Determining Outcomes in Patients With Sickle Cell Disease
Hematology Disease Topics & Pathways:
Research, Clinical Practice (Health Services and Quality), health outcomes research, Clinical Research
Methods
We applied machine learning (ML) methods to clinical parameters and to both categorical and time-to-event outcomes in a de-identified CIBMTR dataset of patients undergoing HCT. A supervised random forest model was built with baseline covariates as independent variables. Model selection was performed jointly by the clinician and data scientists to produce a model built on prognostically relevant variables. Because negative outcomes after HCT for SCD are fewer, in number and percentage, than positive outcomes, the dataset is imbalanced and a model trained on it is biased towards predicting positive outcomes. To counter the imbalance, for each outcome variable of interest we constructed a training dataset that included the instances of that outcome together with randomly sampled positive outcomes, typically 1.5-3 times the total instances of the outcome of interest. We repeated this sampling in 20 trials, fitting a random forest in each trial and running the test dataset through it. To account for the effect of the undersampling, we propose a positive threshold δ, and assigned a final prediction of a negative outcome if the average across the trials for a given element was greater than δ. We performed 2-times repeated 10-fold cross-validation to demonstrate the model's versatility and its response to unseen data. The accuracy score alone may be misleading, because a high value can result from the model correctly predicting the numerically dominant positive outcomes and does not indicate the ability to detect negative outcomes. We therefore estimated balanced accuracy, the arithmetic mean of sensitivity and specificity, which is high only when both negative and positive outcomes are predicted with high accuracy. We also measured the area under the receiver operating characteristic curve (ROC AUC), a measure of the ability of a binary classifier to distinguish between classes. We define model confidence as the average probability across the 20 trials and combine it with our confidence in the model: the predictive probability percentage is model confidence × our confidence in the model × 100.
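As an illustration only, the undersampling, trial averaging, thresholding, and balanced accuracy steps described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the code used in this study: the label coding (1 = negative outcome of interest, 0 = positive outcome), the function names, and the default values `ratio=2.0` and `delta=0.5` are all hypothetical, and `fit_centroid` is a deliberately trivial stand-in learner so that the example runs with the Python standard library alone. A random forest would be passed as `fit` in its place.

```python
import random
from statistics import mean


def undersample(labels, ratio, rng):
    """Return indices of every event case (label 1) plus a random draw of
    non-event cases (label 0) sized ratio x the number of events."""
    events = [i for i, y in enumerate(labels) if y == 1]
    non_events = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(non_events), round(ratio * len(events)))
    return events + rng.sample(non_events, k)


def fit_centroid(xs, ys):
    """Hypothetical stand-in learner (NOT a random forest): a one-feature
    nearest-centroid rule. Returns a function mapping x to 0 or 1."""
    c1 = mean(x for x, y in zip(xs, ys) if y == 1)
    c0 = mean(x for x, y in zip(xs, ys) if y == 0)
    return lambda x: 1 if abs(x - c1) < abs(x - c0) else 0


def ensemble_predict(train_x, train_y, test_x, fit,
                     trials=20, ratio=2.0, delta=0.5, seed=0):
    """Fit one model per undersampled trial, average the 0/1 predictions per
    test element, and call an event when that average exceeds delta."""
    rng = random.Random(seed)
    votes = [0] * len(test_x)
    for _ in range(trials):
        idx = undersample(train_y, ratio, rng)
        model = fit([train_x[i] for i in idx], [train_y[i] for i in idx])
        for j, x in enumerate(test_x):
            votes[j] += model(x)
    return [1 if v / trials > delta else 0 for v in votes]


def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of sensitivity and specificity.
    Assumes y_true contains at least one case of each class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(t == 1 for t in y_true)
    neg = len(y_true) - pos
    return (tp / pos + tn / neg) / 2
```

The sketch also shows why plain accuracy can mislead on imbalanced data: a classifier that predicts the majority class for every patient scores well on accuracy, but its balanced accuracy stays at 0.5 because its sensitivity is zero.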
Results
We examined de-identified records of 1641 patients who underwent HCT and were reported to the CIBMTR. Patients were followed for a median of 47.8 months (range, 0.3-312.9). Age at HCT was <18 years in 73.4% of patients, and the Karnofsky-Lansky (KPS) score was ≥ 90 in 74.7%. Overall survival was 91.2%, event-free survival 75.5%, graft failure (GF) 17.9%, acute graft-versus-host disease (AGVHD) 18.3%, and chronic graft-versus-host disease (CGVHD) 22.3%. Predictive model performance is described in Table 1. The predictive variables that made a significant contribution, and the predicted outcomes in three hypothetical scenarios, are described in Figure 1. Overall, the predictive model provided acceptable AUC, balanced accuracy, positive predictive value, and sensitivity.
Conclusions
We report the development, testing, and validation of an ML model for individualized prediction of outcomes of HCT for SCD. The model provides acceptable AUC, accuracy, balanced accuracy, positive predictive value and sensitivity. This predictive model has the potential to aid clinicians in making shared decisions with their patients regarding HCT for SCD.
Disclosures: No relevant conflicts of interest to declare.