-Asterisk * with author name denotes a Non-ASH member

1789 An Unsupervised Machine Learning Method Stratifies Chronic Lymphocytic Leukemia Patients in Novel Categories with Different Risk of Early Treatment

Program: Oral and Poster Abstracts
Session: 641. Chronic Lymphocytic Leukemias: Basic and Translational: Poster I
Hematology Disease Topics & Pathways:
Research, Lymphoid Leukemias, Translational Research, CLL, Diseases, Lymphoid Malignancies, Technology and Procedures, machine learning
Saturday, December 10, 2022, 5:30 PM-7:30 PM

Francesca Cuturello, PhD1*, Federico Pozzo, PhD2*, Edith Natalia Villegas Garcia, MSc1*, Francesca Maria Rossi, PhD2*, Massimo Degan, MD2*, Paola Nanni, MSc2*, Ilaria Cattarossi, MSc2*, Eva Zaina, BSc2*, Paola Varaschin, BSc2*, Alessandra Braida, MSc2*, Michele Berton, MSc2*, Laura Zannier, BSc2*, Filippo Vit, PhD2*, Erika Tissino, PhD2*, Tamara Bittolo, PhD2*, Roberta Laureana, MD3*, Giovanni D'Arena, MD4, Luca Laurenti, MD5*, Agostino Tafuri6, Jacopo Olivieri, MD7*, Francesco Zaja, MD8*, Annalisa Chiarenza9*, Maria Ilaria Del Principe, MD3*, Riccardo Bomben, PhD2*, Antonella Zucchetto, PhD2*, Stefano Cozzini1*, Alessio Ansuini, PhD1*, Alberto Cazzaniga, PhD1* and Valter Gattei, MD2

1Area Science Park, Trieste, Italy
2Clinical and Experimental Onco-Hematology Unit, Centro Di Riferimento Oncologico di Aviano (CRO) IRCCS, Aviano, Italy
3Ematologia, Dipartimento di Biomedicina e Prevenzione, Università degli studi di Roma Tor Vergata, Roma, Italy
4Hematology Service, San Luca Hospital, Vallo Della Lucania, Italy
5Istituto di Ematologia, Fondazione Policlinico Universitario A. Gemelli, Università Cattolica del Sacro Cuore, Rome, Italy
6University Hospital Sant'Andrea-Sapienza, Rome, Italy
7SOC Clinica Ematologica, Azienda Sanitaria Universitaria Friuli Centrale (ASU FC), Udine, Italy
8Institute of Hematology, University and Hospital of Trieste, Trieste, Italy
9Divisione di Ematologia, Ospedale Ferrarotto, A.O.U. Policlinico-OVE, Università di Catania, Catania, Italy

Novel scoring systems have been developed in recent years to improve on the prognostic accuracy of the historical clinical staging systems (Rai, Binet) for chronic lymphocytic leukemia (CLL). Most of them, however, rely on discretized or dichotomized biomarker values to infer prognosis. Here we analyzed the immunophenotypic and (immuno)genetic profiles of a large CLL cohort by applying unsupervised machine learning methods that treat prognostic factors as continuous variables, in order to identify novel relationships and interactions likely missed by conventional models.

The study included 989 CLL patients with Rai stage 0, I or II (50%, 36% and 14%, respectively), 420 (42.4%) of whom were treated, analyzed between 2003 and 2020. Treatment-free survival (TFS) was calculated from sampling (median TFS, 46 months). Median time from diagnosis to sampling was 7.2 months (~60% of cases within 12 months). The laboratory-based markers studied were: CD20, FMC7, CD49d, CD49c, CD38, CD23, CD43, CD22 and ZAP-70 expression by flow cytometry (reported as % of positive cells); del13, tris12, del11 and del17 cytogenetic abnormalities detected by FISH (reported as % of nuclei with abnormal signal); mutational status of TP53 by deep NGS (reported as % variant allele fraction, VAF); and IGHV gene mutational status (reported as % mutations).

By fitting a Cox proportional hazards model for TFS to this pool of features, we selected the features with a p-value <0.05, i.e. del11, tris12, TP53 mutations, % IGHV mutations, and expression of CD38 and CD49d.
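The selection step can be illustrated with a minimal sketch. It assumes each marker is tested one at a time in a univariate Cox model with a Wald test, which the abstract does not specify; the function names and the toy data are hypothetical and are not taken from the study.

```python
import math

def cox_terms(beta, times, events, x):
    """Log partial likelihood, gradient and Hessian for one covariate (Breslow ties)."""
    ll = grad = hess = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored observations only contribute through risk sets
        # Risk set: everyone still under observation at time t_i
        risk = [x[j] for j, t_j in enumerate(times) if t_j >= t_i]
        w = [math.exp(beta * v) for v in risk]
        s0 = sum(w)
        s1 = sum(wi * v for wi, v in zip(w, risk))
        s2 = sum(wi * v * v for wi, v in zip(w, risk))
        ll += beta * x[i] - math.log(s0)
        grad += x[i] - s1 / s0
        hess -= s2 / s0 - (s1 / s0) ** 2
    return ll, grad, hess

def fit_univariate_cox(times, events, x, tol=1e-9, max_iter=50):
    """Newton-Raphson fit; returns (beta, standard error, two-sided Wald p-value)."""
    beta = 0.0
    ll, grad, hess = cox_terms(beta, times, events, x)
    for _ in range(max_iter):
        if abs(grad) < tol:
            break
        if hess >= 0:
            raise ValueError("covariate carries no information")
        step = -grad / hess
        # Halve the step until the log-likelihood does not decrease
        while True:
            new = cox_terms(beta + step, times, events, x)
            if new[0] >= ll or abs(step) < 1e-12:
                break
            step /= 2
        beta += step
        ll, grad, hess = new
    if hess >= 0:
        raise ValueError("covariate carries no information")
    se = math.sqrt(-1.0 / hess)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p

# Toy data (hypothetical): months to treatment, event flag (1 = treated), marker level
times = [2, 3, 5, 7, 8, 10, 12, 15]
events = [1, 1, 1, 0, 1, 1, 0, 1]
marker = [0.9, 0.4, 0.8, 0.3, 0.6, 0.1, 0.5, 0.2]

beta, se, p_value = fit_univariate_cox(times, events, marker)
keep_marker = p_value < 0.05  # the selection threshold quoted in the abstract
```

In practice a survival library would normally be used for this step; the hand-written fit is only meant to show what the p-value threshold is applied to.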

We then grouped similar profiles with an unsupervised k-means algorithm, with the number of clusters chosen by the elbow method, to partition observations into 6 clusters (C1-C6). Clustering confidence for each patient was estimated through a leave-one-out procedure; the average normalized score was 0.83 (0.85, 0.86, 0.79, 0.91, 0.73 and 0.84 for C1 through C6, respectively). Centroid analysis was employed to evaluate which features most strongly defined each cluster, as detailed below.
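As a rough illustration of k-means with an elbow criterion, the sketch below clusters synthetic two-dimensional points and picks the number of clusters at the sharpest bend of the inertia curve. The farthest-first initialisation, the second-difference reading of the elbow, and the three-blob toy data are all assumptions made for this example; the abstract does not describe how the elbow was located or how the algorithm was initialised.

```python
import random

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]

def kmeans(points, k, n_iter=100):
    # Farthest-first initialisation keeps the toy example deterministic
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

def elbow(inertias):
    """inertias[i] belongs to k = i + 1; return the k with the largest second difference."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k

# Synthetic data (hypothetical): three well-separated groups of 30 points each
rng = random.Random(0)
centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in centres for _ in range(30)]

inertias = [kmeans(points, k)[2] for k in range(1, 7)]
chosen_k = elbow(inertias)
labels, centroids, _ = kmeans(points, chosen_k)
```

The returned centroids are what a centroid analysis would inspect: each coordinate shows how strongly the corresponding feature characterises a cluster.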

C1 (n=220): all cases with <4.1% IGHV mutations, mostly (70%) IGHV-unmutated (UM, <2% cut-off), low CD49d expression (<30% of positive cells) in 90% of cases, low representation of other features (tris12, del11, CD38, TP53 mutations);

C2 (n=210): high CD49d expression (99% of cases), equally balanced IGHV status (52% UM), low representation of other features;

C3 (n=147): tris12 cases (100%; >10% of nuclei), concurrent high expression of CD49d (95%) and CD38 (67%), slightly enriched in UM IGHV cases (60%);

C4 (n=303): cases heavily mutated in IGHV genes (mutation range 4.1-22.0%), low representation of all the other features;

C5 (n=52): TP53-mutated cases with high mutation burden (VAF 35-97%), skewed towards UM IGHV (70%), low representation of all the other features;

C6 (n=57): highly clonal del11q cases (range 50-98%) mutually exclusive with TP53 mutations (2 mutated cases only), mostly UM IGHV (89%), low representation of all other features.

Notably, cases bearing TP53 mutations were also present in C1-C4, although they represented a minority (5-10%) and mostly had a low VAF (median 6.7%, range 5-39%).

Kaplan-Meier analysis revealed heterogeneous behaviors: C2, C3, C5 and C6 had a median (50%) TFS of 59, 25, 5 and 9 months, respectively, whereas median TFS was not reached for C1 and C4 (Figure, A).
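For readers unfamiliar with how a 50% TFS value, or a "not reached" result, is read off a Kaplan-Meier curve, here is a minimal sketch. The two toy groups are hypothetical and unrelated to the study cohort; an event is taken to mean the start of treatment.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    curve, surv = [], 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - n_events / at_risk
        curve.append((t, surv))
    return curve

def median_time(curve):
    """First time at which S(t) drops to 0.5 or below; None means 'not reached'."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Toy group A (hypothetical): months of follow-up, 1 = treated, 0 = censored
curve_a = kaplan_meier([5, 9, 12, 20, 25, 30], [1, 1, 0, 1, 0, 0])
# Toy group B (hypothetical): a single event among four patients
curve_b = kaplan_meier([10, 20, 30, 40], [1, 0, 0, 0])

median_a = median_time(curve_a)  # curve crosses 0.5 at month 20
median_b = median_time(curve_b)  # curve stays at 0.75, so the median is not reached
```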

Hierarchical agglomerative clustering identified 3 major risk classes (Figure, B). The high-risk class (n=109) comprised C5 and C6 (TP53 mutations and del11q); the intermediate-risk class (n=577) comprised C1, C2 and C3; the low-risk class (n=303) consisted of C4 only (CLL with highly mutated IGHV). The median (50%) TFS for the high-, intermediate- and low-risk classes was 7 months, 50 months and not reached, respectively.
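The class sizes follow directly from the cluster sizes reported above, which a short check makes explicit. It only re-adds the published counts and does not reproduce the hierarchical clustering itself, whose distance metric and linkage are not given in the abstract.

```python
# Cluster sizes as reported for C1-C6
cluster_sizes = {"C1": 220, "C2": 210, "C3": 147, "C4": 303, "C5": 52, "C6": 57}

# Grouping of clusters into risk classes as described in the text
risk_classes = {
    "high": ["C5", "C6"],
    "intermediate": ["C1", "C2", "C3"],
    "low": ["C4"],
}

class_sizes = {name: sum(cluster_sizes[c] for c in members)
               for name, members in risk_classes.items()}
total = sum(class_sizes.values())
```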

In conclusion, we present a novel machine-learning-driven, laboratory-based classification for predicting the risk of early treatment in CLL. Our approach identifies clusters at different risk, with some novel findings: i) a high IGHV mutational burden (i.e. >4%) in the absence of other markers (e.g. CD49d) identifies patients with a particularly benign clinical course; ii) TP53 mutations and del11q associate with a high risk of early treatment only if present in the vast majority of the CLL clone; iii) an IGHV status with a low burden of mutations (i.e. <4%), along with CD49d expression or tris12, identifies patients at intermediate risk.

These novel stratifications should be incorporated in risk algorithms for treatment prediction of CLL patients. Validation in additional independent cohorts is needed.

Disclosures: No relevant conflicts of interest to declare.
