Session: 723. Clinical Allogeneic and Autologous Transplantation: Late Complications and Approaches to Disease Recurrence: Poster II
Hematology Disease Topics & Pathways:
Leukemia, ALL, Biological, Diseases, bone marrow, Therapies, Pediatric, Lymphoid Malignancies, Study Population, Clinically relevant, transplantation, stem cells
We use machine learning methods, specifically random forest classification (RF), to build a predictive model of post-transplant relapse and to analyze data from a cohort of 46 pediatric patients who received hematopoietic stem cell transplantation (HSCT) for acute lymphoblastic leukemia (ALL) and had serial lineage-specific chimerism testing post-transplant. Our model achieved 58% sensitivity and 98% specificity in predicting relapse in cross-validation, compared with a baseline model (24% sensitivity, 76% specificity). Consistent with previous reports, our model implicates both peripheral blood (PB) donor CD34 and CD3 chimerism as important predictors of relapse. More importantly, the RF showed how different variables interacted with each other, providing additional insight into how best to interpret post-transplant chimerism results. To our knowledge, this is the first study featuring RF machine learning methods in the clinical setting of relapse after HSCT.
We use a dataset of patients with ALL who underwent HSCT at Lucile Packard Children’s Hospital from 2012 to 2018. Variables collected are summarized in Table 1. The analytical sensitivity of STR-based chimerism testing is 1%. Chimerism results obtained on the day of relapse were excluded from the analysis. The RF model consists of 500 individual decision trees, each built on a bootstrapped sample of the patient data. Five-fold cross-validation was used to test predictive skill, with 20% of patients held out in each fold. We compared the results with a Monte Carlo baseline model in which relapse status was repeatedly assigned at random to each patient, with a probability equal to the prevalence of relapse in our cohort.
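A minimal sketch of such a Monte Carlo baseline, using only the Python standard library, may help make the comparison concrete. The function names, the number of draws, and the synthetic labels (100 patients, 24 relapses) are assumptions chosen for illustration and are not taken from the study data. A baseline of this kind has an expected sensitivity equal to the relapse prevalence and an expected specificity equal to one minus the prevalence.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = relapse)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)


def monte_carlo_baseline(y_true, n_draws=5000, seed=0):
    """Average sensitivity/specificity of random relapse assignment.

    In each draw, every patient is labeled as relapsed with probability
    equal to the observed prevalence, independent of any predictor.
    """
    rng = random.Random(seed)
    prevalence = sum(y_true) / len(y_true)
    sens_total = 0.0
    spec_total = 0.0
    for _ in range(n_draws):
        y_pred = [int(rng.random() < prevalence) for _ in y_true]
        sens, spec = sensitivity_specificity(y_true, y_pred)
        sens_total += sens
        spec_total += spec
    return sens_total / n_draws, spec_total / n_draws


# Synthetic labels (assumption): 24 relapses among 100 hypothetical patients.
y_true = [1] * 24 + [0] * 76
baseline_sens, baseline_spec = monte_carlo_baseline(y_true)
print(f"baseline sensitivity ~ {baseline_sens:.2f}, specificity ~ {baseline_spec:.2f}")
```

Because the random assignment ignores all patient variables, any model that uses real predictive signal should exceed these values on at least one of the two metrics.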
Patient, transplantation, and relapse characteristics are summarized in Table 2. Chimerism data are summarized in Table 3. The cross-validation results show robust skill in predicting relapse within 2 years post-transplant. Our RF achieved 58% sensitivity and 98% specificity, a marked improvement over the baseline model (Table 4).
Variable importance, the ability of a variable to decrease the error of the prediction model, was calculated for all variables used in our RF (Figure 1). Our analysis shows that age at the time of transplant has the highest importance, followed by PB donor CD34 chimerism. Bone marrow chimerism generally has lower importance, suggesting that PB monitoring alone may be adequate in the clinical setting.
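One common way to quantify how much a variable reduces prediction error is permutation importance: shuffle one variable across patients and measure how much the error rate rises. The abstract does not state which importance measure was used, so the sketch below is an illustration of the general idea rather than the study's method. The toy model, the variable names `pb_cd34` and `bm_chimerism`, and the synthetic rows are all hypothetical.

```python
import random


def error_rate(model, rows, labels):
    """Fraction of patients whose predicted label differs from the true one."""
    wrong = sum(1 for row, y in zip(rows, labels) if model(row) != y)
    return wrong / len(rows)


def permutation_importance(model, rows, labels, feature, n_repeats=50, seed=0):
    """Mean increase in error rate after shuffling one feature across rows."""
    rng = random.Random(seed)
    base_error = error_rate(model, rows, labels)
    total_increase = 0.0
    for _ in range(n_repeats):
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        # Copy each row with only the chosen feature replaced.
        permuted = [dict(row, **{feature: v}) for row, v in zip(rows, shuffled)]
        total_increase += error_rate(model, permuted, labels) - base_error
    return total_increase / n_repeats


def toy_model(row):
    """Hypothetical classifier: predicts relapse when donor PB CD34 is below 95%."""
    return int(row["pb_cd34"] < 95)


# Synthetic data (assumption): labels follow the toy rule exactly.
data_rng = random.Random(1)
rows = [
    {"pb_cd34": data_rng.uniform(80, 100), "bm_chimerism": data_rng.uniform(80, 100)}
    for _ in range(200)
]
labels = [toy_model(row) for row in rows]

imp_cd34 = permutation_importance(toy_model, rows, labels, "pb_cd34")
imp_bm = permutation_importance(toy_model, rows, labels, "bm_chimerism")
print(f"pb_cd34 importance: {imp_cd34:.3f}, bm_chimerism importance: {imp_bm:.3f}")
```

In this toy setup the variable the model ignores receives an importance of zero, while shuffling the variable it relies on raises the error rate substantially.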
We illustrate the relationships of 1) age at transplant, 2) donor PB CD34 chimerism, and 3) donor PB CD3 chimerism to the odds of relapse using partial dependence plots. Younger patients relapse less often. Donor PB CD34 chimerism exhibits a threshold effect, in which the odds of relapse decrease dramatically when it is above 95%, whereas donor PB CD3 chimerism has a more gradual, linear profile (Figure 2). A 2D dependence plot of donor PB CD34 and PB CD3 chimerism, treated as continuous variables, shows the interaction of the two variables (Figure 3): relapse risk remains low even if donor PB CD3 chimerism is as low as 50%, as long as donor PB CD34 chimerism is > 95%.
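The partial dependence calculation itself is simple: fix one variable at a grid value for every patient, average the model's predicted risk, and repeat across the grid. The sketch below shows that computation under stated assumptions. The risk function `toy_risk` is invented purely to mimic the qualitative shapes described above (a CD34 threshold at 95% and a gradual CD3 trend); its coefficients, the variable names, and the synthetic rows are hypothetical and do not come from the fitted RF.

```python
def partial_dependence(model, rows, feature, grid):
    """Average predicted risk with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = [model(dict(row, **{feature: value})) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_risk(row):
    """Hypothetical relapse risk: low above the CD34 threshold, else CD3-dependent."""
    if row["pb_cd34"] > 95:
        return 0.05
    return 0.35 + 0.005 * (100 - row["pb_cd3"])


# Synthetic patients (assumption): every combination of a few chimerism values.
rows = [
    {"pb_cd34": cd34, "pb_cd3": cd3}
    for cd34 in (90, 98)
    for cd3 in (50, 70, 90, 100)
]

cd34_grid = [90, 94, 96, 99]
cd3_grid = [50, 75, 100]
pd_cd34 = partial_dependence(toy_risk, rows, "pb_cd34", cd34_grid)
pd_cd3 = partial_dependence(toy_risk, rows, "pb_cd3", cd3_grid)

for value, risk in zip(cd34_grid, pd_cd34):
    print(f"PB CD34 = {value}%: mean predicted risk {risk:.3f}")
for value, risk in zip(cd3_grid, pd_cd3):
    print(f"PB CD3 = {value}%: mean predicted risk {risk:.3f}")
```

A 2D version follows the same pattern, forcing two variables at once over a grid of value pairs, which is how an interaction such as the one between CD34 and CD3 chimerism becomes visible.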
Our study shows that machine learning methods such as RF can be very useful for building accurate predictive models of post-HSCT complications that incorporate multiple variables, allowing more granular differentiation between patients. Such analyses can enable more effective deployment of risk-adapted, personalized treatment. By building hundreds of independent decision trees, the RF is also able to provide useful insights into the interactions between variables in a clinically relevant manner.
Disclosures: No relevant conflicts of interest to declare.