Using Machine Learning to Predict the Risk of Evolution to Paroxysmal Nocturnal Hemoglobinuria in Patients with Aplastic Anemia

Ren, Erika

Introduction: Paroxysmal nocturnal hemoglobinuria (PNH) is a rare disorder resulting from PIG-A gene mutations, which cause loss of expression of CD55 and CD59 and lead to complement-mediated destruction of erythrocytes. Because 10% of idiopathic aplastic anemia (AA) patients develop PNH, AA seemingly provides a suitable environment for unregulated expansion of the PNH clone. However, clonal expansion dynamics are diverse, ranging from primary PNH without an anamnestic phase of AA, to smoldering clonal persistence, to various trajectories of clonal expansion.

PNH diagnosis with flow cytometry (FC) cannot predict the risk or degree of clonal expansion. This study aims to develop a machine learning (ML) algorithm that can predict risk of clonal expansion using laboratory values measured at initial presentation.

Methods: The medical records of all AA patients with an initial PNH clone <20% (n=104) at the participating institutions were examined. Variables collected at the time of initial PNH diagnosis were sex (51 male and 53 female), age at diagnosis (median age of 45.6 years), LDH, Hgb, WBC, ANC, severity of AA, PLT, MCV, haptoglobin, reticulocyte %, PNH granulocyte clone size, PNH monocyte clone size, Type II/III RBC clone size, thrombosis history, D-dimer, total bilirubin, and AST/ALT. Secondary variables collected were type and duration of therapy received (eculizumab [n=10], ravulizumab [n=11], pegcetacoplan [n=5]) and exposure to hATG, cyclosporine, or eltrombopag. FC data from up to 15 years after the initial measurement were assessed, and patients were assigned to the expander group (n=21) if their PNH clone increased to greater than 20%, or to the non-expander group (n=83) if it remained <20%.
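The group assignment described above can be sketched as a simple threshold rule. This is a minimal illustration, not the authors' code: the function name, the patient identifiers, and the follow-up values are hypothetical, and it assumes a patient counts as an expander if any follow-up measurement exceeds 20% (the abstract does not say whether the rule applies to any measurement or only to the last one, nor which cell lineage defines the clone).

```python
from typing import Dict, Sequence

EXPANSION_THRESHOLD = 20.0  # percent; the cutoff stated in the Methods


def assign_group(followup_clone_sizes: Sequence[float],
                 threshold: float = EXPANSION_THRESHOLD) -> str:
    """Label a patient from a series of follow-up PNH clone sizes (percent).

    Assumption: exceeding the threshold at any time point makes the
    patient an expander; otherwise the patient is a non-expander.
    """
    if any(size > threshold for size in followup_clone_sizes):
        return "expander"
    return "non-expander"


# Hypothetical follow-up series, invented purely for illustration.
patients: Dict[str, Sequence[float]] = {
    "patient_a": [1.0, 3.5, 12.0, 41.2],
    "patient_b": [0.6, 0.9, 1.4],
}

groups = {pid: assign_group(series) for pid, series in patients.items()}
```

Under this rule a clone of exactly 20% is treated as a non-expander, which is one possible reading of "greater than 20%".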

Missing data (15% of the dataset) were imputed with the MICE algorithm in scikit-learn. A multilayer perceptron (MLP) with 4 hidden layers (24, 12, 6, and 2 neurons per layer) was used for the classification experiments. The leaky rectified linear unit (leaky ReLU) activation function was employed with an alpha value of 0.1. The positive class (expander) was given a higher weight of 3.4 because of the imbalanced class distribution (21 vs. 83) in the dataset. Random Forest feature importance was calculated iteratively (N=10) to rank 23 features. We validated the model with Leave-One-Out (LOO) cross-validation, in which a single subject from the dataset is used as the validation set while the remaining cohort (N-1 = 103) forms the training set. This was repeated until every subject in the dataset had been used as the validation set exactly once. To determine the optimum set of features, we iteratively reduced the number of features in the classification model.
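Three of the ingredients named above (the leaky ReLU activation with alpha 0.1, the positive-class weight of 3.4, and the LOO split) can be written out in a few lines of plain Python. This is a sketch under stated assumptions rather than the study's implementation: the abstract does not name the framework used to train the network or the exact loss function, so the weighted binary cross-entropy here is one common way to apply a class weight, and all function names are hypothetical.

```python
import math
from typing import Iterator, List, Sequence, Tuple

LEAKY_ALPHA = 0.1      # negative-side slope reported in the abstract
POSITIVE_WEIGHT = 3.4  # weight on the expander class reported in the abstract


def leaky_relu(x: float, alpha: float = LEAKY_ALPHA) -> float:
    """Identity for positive inputs, a small slope alpha for the rest."""
    return x if x > 0 else alpha * x


def weighted_log_loss(y_true: Sequence[int], p_pred: Sequence[float],
                      pos_weight: float = POSITIVE_WEIGHT) -> float:
    """Mean binary cross-entropy with the positive class up-weighted.

    Assumption: the class weight multiplies the per-sample loss of
    expanders (label 1); non-expanders (label 0) keep weight 1.
    """
    eps = 1e-12  # clip probabilities to avoid log(0)
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(y_true)


def leave_one_out(n: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (training indices, held-out index) once per subject."""
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        yield train, held_out


# With the cohort size from the abstract: 104 folds of 103 training subjects.
folds = list(leave_one_out(104))
```

With this loss, misclassifying an expander at a given predicted probability costs 3.4 times as much as the equivalent error on a non-expander, which pushes the classifier toward higher sensitivity on the minority class.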

Results: The best preliminary model was achieved with 5 variables (initial PNH granulocyte, monocyte, Type II RBC, and Type III RBC clone sizes, as well as haptoglobin), obtaining a sensitivity of 0.81, a specificity of 0.83, and an ROC-AUC of 0.82.
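For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and ROC-AUC are computed from pooled LOO predictions. The abstract does not report a confusion matrix; the counts used here (17 of 21 expanders and 69 of 83 non-expanders correctly classified) are an inference, being the only whole-number counts that round to the reported 0.81 and 0.83, and should be read as an assumption. The scores passed to the AUC function are toy values, and all function names are hypothetical.

```python
from typing import Sequence


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true expanders that the model flags as expanders."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true non-expanders that the model leaves unflagged."""
    return tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Assumed counts (not reported in the abstract), chosen to match the
# rounded sensitivity and specificity for 21 expanders and 83 non-expanders.
sens = round(sensitivity(tp=17, fn=4), 2)    # 0.81
spec = round(specificity(tn=69, fp=14), 2)   # 0.83

# Toy scores, invented for illustration only.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 0.75
```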

Initial clone sizes (median [min, max]) for the cohort were: granulocytes, 1% [0.01%, 18%]; monocytes, 1.5% [0%, 20%]; and total RBC clones, 0.21% [0%, 7.7%]. At the last follow-up, the cohort's clone sizes were: granulocytes, 1.37% [0%, 99.87%]; monocytes, 2.5% [0.16%, 91%]; and total RBC clones, 0.7% [0%, 96.34%]. For the expander group, initial clone sizes were: granulocytes, 6.31% [0.17%, 18%]; monocytes, 17% [4%, 20%]; and total RBC clones, 0.48% [0.03%, 2.82%]. For the non-expander group, initial clone sizes were: granulocytes, 0.595% [0.01%, 15%]; monocytes, 0.74% [0%, 14%]; and total RBC clones, 0.16% [0%, 4%]. The median time from diagnosis to the last follow-up was 8.23 years (expander) and 3.58 years (non-expander).

Conclusion: The preliminary model shows promise in distinguishing between high- and low-risk groups using variables consistent with the existing literature. The final ML model will draw on a larger cohort to improve predictive power, offering a valuable tool for early intervention and monitoring to prevent thromboembolic events in patients at high risk of clonal expansion.

1315 Using Machine Learning to Predict the Risk of Evolution to Paroxysmal Nocturnal Hemoglobinuria in Patients with Aplastic Anemia