Session: 503. Clonal Hematopoiesis, Aging, and Inflammation: Poster I
Hematology Disease Topics & Pathways:
Research, Translational Research, Bone Marrow Failure Syndromes, Bioinformatics, Diseases, Technology and Procedures, Machine learning
Methods: We constructed a prediction model for CH using LASSO regression and evaluated it across three cohorts of participants. Predictors included clinical/demographic characteristics (smoking history, gender, race) and blood count parameters. The UK Biobank, consisting of 452,547 participants, served as the development cohort. We used two separate cohorts for validation: the All of Us (AoU) Research database, consisting of 143,850 participants, and the MSK-IMPACT cohort, including 8,150 patients with non-hematologic cancers. We compared models with age alone to models that also included blood count and other clinical/demographic parameters. Predictive performance was assessed on two criteria: discrimination, measured by the area under the receiver operating characteristic (ROC) curve (AUC), and calibration, measured by the calibration slope (a slope of 1 indicates perfect calibration) and the intercept.
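As an illustration of the two evaluation criteria described above, the following is a minimal sketch in plain Python, not the authors' code. The function names `auc` and `calibration` are hypothetical, the inputs are assumed to be 0/1 CH labels and predicted probabilities strictly between 0 and 1, and the LASSO model fitting itself is not shown. AUC is computed as the probability that a randomly chosen CH-positive individual receives a higher score than a randomly chosen CH-negative one; the calibration slope and intercept come from a logistic regression of the observed outcome on the logit of the predicted probability.

```python
import math

def auc(y_true, y_score):
    """ROC AUC via pairwise comparison of cases and controls (ties count 0.5).

    Quadratic in sample size; adequate for a sketch, not for biobank-scale data.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration(y_true, p_pred, iters=25):
    """Fit logit(P(y=1)) = a + b * logit(p_pred) by Newton's method.

    Returns (intercept a, slope b); a = 0 and b = 1 indicate perfect calibration.
    """
    x = [math.log(p / (1 - p)) for p in p_pred]
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_true):
            mu = 1 / (1 + math.exp(-(a + b * xi)))
            w = mu * (1 - mu)
            g0 += yi - mu             # gradient w.r.t. intercept
            g1 += (yi - mu) * xi      # gradient w.r.t. slope
            h00 += w                  # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b
```

A slope below 1 suggests predictions that are too extreme (overfitting), while a slope above 1 suggests predictions that are too conservative, which is one way to read the range of slopes reported in the Results.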
Results: A total of 604,547 participants were included in the study. We observed strong associations between clinical features and gene-specific CH, including platelet count with DNMT3A and JAK2 mutations, neutrophil count with IDH1/2 mutations, and age with spliceosome CH. Overall, our model showed excellent discrimination (AUC>0.8) for JAK2, ASXL1, PPM1D, SF3B1, SRSF2, and U2AF1 and modest discrimination (AUC>0.7) for DNMT3A, IDH1/2, TP53, and TET2. Compared to a model with age alone, the addition of blood count and clinical parameters improved the model’s performance most notably for JAK2 (AUC from 0.72 to 0.82) and IDH1/2 (AUC from 0.75 to 0.78). The calibration slopes for gene-specific models ranged from 0.35 to 1.65, with the best calibration for JAK2 (slope=0.9; intercept=0.02) and TP53 (slope=0.89; intercept=-0.02). To determine how our risk prediction model could inform CH screening strategies, we estimated the number of individuals who would need to be screened with the model and the number needed to sequence to identify 100 CH-positive individuals across 10 CH genes. Applying the model to select individuals at high risk of CH for screening reduced the number of samples needed to sequence by 4- to 19-fold.
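The abstract does not state how the number needed to sequence was calculated. One plausible reading, sketched below under that assumption, is that individuals whose predicted risk meets a threshold are sequenced, and the expected number of samples per 100 CH-positive findings is 100 divided by the positive rate in the group being sequenced. The function name `number_needed_to_sequence`, the `threshold` parameter, and the returned keys are all hypothetical.

```python
def number_needed_to_sequence(y_true, risk, threshold, target_positives=100):
    """Compare unselected sequencing with sequencing only model-flagged individuals.

    y_true: 0/1 CH status; risk: predicted probabilities (assumed inputs).
    """
    flagged = [y for y, r in zip(y_true, risk) if r >= threshold]
    if not flagged or sum(flagged) == 0:
        raise ValueError("no CH-positive individuals at or above the threshold")
    prevalence = sum(y_true) / len(y_true)   # positive rate if everyone is sequenced
    ppv = sum(flagged) / len(flagged)        # positive rate among flagged individuals
    flagged_fraction = len(flagged) / len(y_true)
    return {
        # samples sequenced per target_positives found, without the model
        "unselected": target_positives / prevalence,
        # samples sequenced per target_positives found, flagged individuals only
        "risk_guided": target_positives / ppv,
        # individuals who must be run through the model to flag enough positives
        "to_screen": target_positives / (flagged_fraction * ppv),
        "fold_reduction": ppv / prevalence,
    }
```

Under this reading, the fold reduction equals the enrichment of CH among flagged individuals relative to the whole cohort, so it depends on the chosen threshold as well as on the model's discrimination.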
Conclusion: We developed and validated a model for gene-specific CH prediction using blood count parameters and demographic factors with strong discriminative performance. These findings highlight the potential of commonly available clinical data to improve CH prediction, aiding in efficient identification of individuals with CH to facilitate clinical trial design.
Disclosures: No relevant conflicts of interest to declare.