Type: Oral
Session: 803. Emerging Tools, Techniques and Artificial Intelligence in Hematology: Reading the Blood: Generative and Discriminative AI in Hematology
Hematology Disease Topics & Pathways:
Research, Acute Myeloid Malignancies, AML, artificial intelligence (AI), adult, Translational Research, APL, elderly, bioinformatics, pediatric, Diseases, neonatal, computational biology, young adult, Myeloid Malignancies, Technology and Procedures, Study Population, Human, machine learning, omics technologies, Pathology
In this study we describe the assembly of the largest publicly available methylation dataset of acute leukemias to date, which combines 11 high-quality clinical trials/studies: NOPHO ALL92-2000 (n=796), AAML0531 (n=628), AAML1031 (n=581), BeatAML (n=316), TCGA AML (n=194), French GRAALL 2003-2005 (n=153), TARGET ALL (n=131), CETLAM SMD-09 (n=83), AAML03P1 (n=72), Japanese AML05 (n=64), and CCG2961 (n=41), for a total of 3,059 subjects after preprocessing. Samples were obtained from either bone marrow or peripheral blood, and DNA methylation (meDNA) data were generated using the Illumina 450k or EPIC methylation array, which share 452,453 probes with the same chemistry and design. To independently validate the findings derived from the discovery cohort, we processed in parallel meDNA array data from diagnostic bone marrow specimens of AML patients treated on the multi-center clinical trials AML02 (n=159) and AML08 (n=42), led by St. Jude Children's Research Hospital. Raw data were processed following best practices from the literature using SeSAMe (PMID: 30085201, 27924034). To generate the atlas, we used a novel unsupervised dimensionality-reduction algorithm called Pairwise Controlled Manifold Approximation (PaCMAP), which compressed 319,738 processed CpG values into two components for visualization (Figure 1) and five components for downstream classification analysis. To empirically assess the classification accuracy of the PaCMAP results, we implemented a machine learning pipeline with hyperparameter tuning and 10-fold cross-validation, and assessed per-class accuracy in the discovery and validation cohorts. Not all samples had clinical diagnostic annotation available in the dataset, so only annotated samples were used in the supervised machine learning model (n=1,399 in discovery and n=110 in validation).
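The cross-validation step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the classifier, so a nearest-centroid rule stands in for it; the five-component embedding is simulated with synthetic Gaussian points rather than produced by PaCMAP; and the class labels and helper names (`stratified_folds`, `cross_val_predict`) are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds while keeping class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(k)]
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        # Deal each class round-robin across the folds.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds

def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest_centroid_predict(train_x, train_y, test_x):
    """Stand-in classifier (assumption): assign the closest class centroid."""
    groups = defaultdict(list)
    for x, y in zip(train_x, train_y):
        groups[y].append(x)
    cents = {y: centroid(rows) for y, rows in groups.items()}

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(cents, key=lambda y: dist2(x, cents[y])) for x in test_x]

def cross_val_predict(x, y, k=10, seed=0):
    """Return one held-out prediction per sample from k-fold CV."""
    preds = [None] * len(y)
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        fold_preds = nearest_centroid_predict(
            [x[i] for i in train_idx],
            [y[i] for i in train_idx],
            [x[i] for i in fold],
        )
        for i, p in zip(fold, fold_preds):
            preds[i] = p
    return preds

# Synthetic five-component "embedding": three well-separated toy classes.
rng = random.Random(42)
centers = {"class_A": 0.0, "class_B": 10.0, "class_C": -10.0}
features, labels = [], []
for name, center in centers.items():
    for _ in range(30):
        features.append([rng.gauss(center, 1.0) for _ in range(5)])
        labels.append(name)

predictions = cross_val_predict(features, labels, k=10)
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
print(f"10-fold CV accuracy on toy data: {accuracy:.3f}")
```

Because every sample is predicted only by a model that never saw it during fitting, the resulting accuracy estimates generalization rather than fit; stratifying the folds keeps rare classes represented in each training split.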
The resulting atlas reveals clusters of samples that define six hematopoietic lineages: AML, ALL, MDS-related or secondary myeloid neoplasms (MDS-like), acute promyelocytic leukemia (APL), mixed phenotype leukemia, and otherwise-normal controls. These are further subdivided into 30 subclasses overlapping with WHO 2022 and ELN 2022 clinical diagnostic annotations. Comparing methylation-based subtype predictions with clinically annotated subtypes yielded an overall concordance score of 0.936, with per-class 10-fold CV concordance ranging from 0.82 for MDS-like to 1.00 for APL and NUP214 fusions. Importantly, in the validation (test) cohort, per-class accuracy was 0.91 for AML with KMT2A-r (n=47) and 1.00 for AML with rare recurring translocations (n=10). These are subtypes with large genomic heterogeneity that may be better represented at the epigenomic level. Additionally, they are largely considered standard- to high-risk groups with poor prognosis, which invites further studies of the biological mechanisms behind the methylation patterns governing this classification. Finally, the resulting classifier allowed prediction of a WHO/ELN clinical diagnosis for 91 samples in the validation cohort that were previously categorized as “Normal Karyotype”, “Other” or blank. In conclusion, our study demonstrates the use of a methylome atlas to enhance the diagnosis of AML subtypes.
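As a small illustration of how per-class and overall figures like those above can be tabulated, the sketch below compares annotated and predicted labels. It assumes that "concordance" means the fraction of samples whose predicted subtype matches the clinical annotation, which the abstract does not define explicitly; the eight toy labels and the function names are hypothetical and are not study data.

```python
from collections import Counter

def per_class_accuracy(annotated, predicted):
    """Fraction of correctly predicted samples within each annotated class."""
    totals = Counter(annotated)
    hits = Counter(a for a, p in zip(annotated, predicted) if a == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}

def overall_concordance(annotated, predicted):
    """Fraction of all samples whose prediction matches the annotation."""
    matches = sum(a == p for a, p in zip(annotated, predicted))
    return matches / len(annotated)

# Toy labels for illustration only (not taken from the study).
annotated = ["APL", "APL", "KMT2A-r", "KMT2A-r",
             "KMT2A-r", "KMT2A-r", "MDS-like", "MDS-like"]
predicted = ["APL", "APL", "KMT2A-r", "KMT2A-r",
             "KMT2A-r", "MDS-like", "MDS-like", "KMT2A-r"]

by_class = per_class_accuracy(annotated, predicted)
overall = overall_concordance(annotated, predicted)
print(by_class)   # APL 1.0, KMT2A-r 0.75, MDS-like 0.5
print(overall)    # 0.75
```

Reporting both views matters because a high overall score can hide a weak class: here the toy overall value is 0.75 even though one class is only half correct.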
Disclosures: Rubnitz: Biomea, Inc: Consultancy.