Predictive Modeling of Clonal Hematopoiesis across Diverse Cohorts

Batchi-Bouyou, Armel

Background: Clonal hematopoiesis (CH) is associated with an increased risk of hematologic malignancy and numerous other adverse events. While there are no established therapeutic approaches to treat CH, clinical trials are underway, many of which target specific CH genes or genetic pathways. Because CH screening is generally not part of routine clinical testing, identifying CH-positive individuals is a challenge for trial recruitment. Age is the most important risk factor for CH, but CH is also associated with other demographic and clinical risk factors. Blood counts and other blood parameters are likewise influenced by CH, with gene-specific patterns. Here we sought to determine whether commonly available clinical predictors and blood counts could be used to develop a gene-specific CH risk prediction tool, with the goal of facilitating CH research screening strategies.

Methods: We constructed a prediction model for CH across three cohorts of participants using LASSO regression. Predictors included clinical/demographic characteristics (smoking history, gender, race) and blood count parameters. The UK Biobank, consisting of 452,547 participants, served as the development cohort. We used two separate cohorts for validation: the All of Us (AoU) Research database, consisting of 143,850 participants, and the MSK-IMPACT cohort, including 8,150 patients with non-hematologic cancers. We compared models with age alone to models including blood count and other clinical/demographic parameters. Predictive performance was assessed on two criteria: discrimination, measured by the area under the receiver operating characteristic (ROC) curve (AUC), and calibration, measured by the calibration slope (a slope of 1 indicates perfect calibration) and the calibration intercept.
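The two performance criteria named in the Methods can be illustrated with a minimal sketch. It assumes the calibration slope and intercept are estimated jointly by regressing the observed outcome on the logit of the predicted risk, which is one common convention; the abstract does not say which estimator was used. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math

def auc(y_true, y_score):
    """AUC as the fraction of (carrier, non-carrier) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def calibration_slope_intercept(y_true, p_pred, n_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(p_pred) by Newton's method.

    Returns (slope b, intercept a). Assumes 0 < p_pred < 1 and that the
    outcomes are not perfectly separated by the predictions.
    """
    x = [math.log(p / (1.0 - p)) for p in p_pred]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(n_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = mu * (1.0 - mu)
            g_a += yi - mu          # gradient w.r.t. intercept
            g_b += (yi - mu) * xi   # gradient w.r.t. slope
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        if det == 0.0:
            break
        # Newton step: solve the 2x2 system H * delta = gradient
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return b, a

# Toy data (hypothetical): three risk groups whose observed carrier
# frequency equals the predicted risk, i.e. perfectly calibrated.
toy_p = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
toy_y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2

slope, intercept = calibration_slope_intercept(toy_y, toy_p)
print(round(slope, 6), round(abs(intercept), 6))   # 1.0 0.0
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))    # 0.75
```

A slope below 1 would indicate predictions that are too extreme and a slope above 1 predictions that are too conservative. The pairwise AUC loop is quadratic in cohort size, so a rank-based implementation would be preferable at biobank scale.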

Results: A total of 604,547 participants were included in the study. We observed strong associations between clinical features and gene-specific CH, including platelet count with DNMT3A and JAK2 mutations, neutrophil count with IDH1/2 mutations, and age with spliceosome CH. Overall, our model showed excellent discrimination (AUC>0.8) for JAK2, ASXL1, PPM1D, SF3B1, SRSF2, and U2AF1 and modest discrimination (AUC>0.7) for DNMT3A, IDH1/2, TP53, and TET2. Compared to a model with age alone, the addition of blood count and clinical parameters improved the model's performance most notably for JAK2 (AUC = 0.72 vs 0.82) and IDH1/2 (AUC = 0.75 vs 0.78). The calibration slopes for gene-specific models ranged from 0.35 to 1.65 and were best for JAK2 (slope=0.9; intercept=0.02) and TP53 (slope=0.89; intercept=-0.02). To better determine how our risk prediction model could inform CH screening strategies, we estimated the number of patients who would need to be screened with the model, and the number who would need to be sequenced, to identify 100 CH-positive individuals across 10 CH genes. Using the model to select individuals at high risk of CH for screening reduced the number of samples needed to sequence by 4- to 19-fold.
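The arithmetic behind a number-needed-to-sequence comparison can be sketched as follows. It assumes the fold reduction is the ratio of samples sequenced without selection to samples sequenced within the model-defined high-risk group, which is one plausible reading of the abstract. The carrier rates used here are hypothetical placeholders, not values reported in the study.

```python
import math
from fractions import Fraction

def number_needed_to_sequence(target_carriers, carrier_rate):
    """Expected samples to sequence to find `target_carriers` carriers.

    `carrier_rate` is the fraction of sequenced samples that carry the
    mutation: the population prevalence for unselected sequencing, or the
    positive predictive value within a model-selected high-risk group.
    """
    return math.ceil(Fraction(target_carriers) / Fraction(carrier_rate))

# Hypothetical rates for illustration only (exact fractions avoid
# floating-point rounding in the ceiling).
prevalence = Fraction(1, 100)       # 1% carriers in the unselected cohort
ppv_high_risk = Fraction(5, 100)    # 5% carriers among model-flagged samples

unselected = number_needed_to_sequence(100, prevalence)
targeted = number_needed_to_sequence(100, ppv_high_risk)
fold_reduction = unselected / targeted

print(unselected, targeted, fold_reduction)  # 10000 2000 5.0
```

Under this reading, the fold reduction equals the enrichment of carriers in the high-risk group relative to the whole cohort, so genes with better discrimination would be expected to show larger reductions.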

Conclusion: We developed and validated a model for gene-specific CH prediction using blood count parameters and demographic factors with strong discriminative performance. These findings highlight the potential of commonly available clinical data to improve CH prediction, aiding in efficient identification of individuals with CH to facilitate clinical trial design.
