Session: 617. Acute Myeloid Leukemias: Biomarkers, Molecular Markers and Minimal Residual Disease in Diagnosis and Prognosis: Poster I
Hematology Disease Topics & Pathways:
Research, Acute Myeloid Malignancies, AML, Clinical Research, Diseases, Myeloid Malignancies, Technology and Procedures, machine learning
A total of 1707 pAML gene expression profiles were mapped and analyzed from three distinct sources (St. Jude = 659; TARGET = 168; AAML1031 = 880). Raw read counts were normalized and scaled to relative expression values in transcripts per million (TPM), which served as input for feature selection, model training, and testing. Ground truth labels for all 1707 samples were obtained through multi-omics analysis, including whole-genome sequencing, to identify fusions and mutations. For validation, the data were split 70/30 into training (n=1187) and testing (n=520) sets, stratified by subtype.
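The two preprocessing steps described above, TPM scaling and a subtype-stratified 70/30 split, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the toy counts, the transcript lengths in kilobases, and the rounding rule for the per-subtype test size are all assumptions made for the example.

```python
import random
from collections import defaultdict


def counts_to_tpm(counts, lengths_kb):
    """Convert one sample's raw read counts to transcripts per million (TPM)."""
    # Length-normalize first, then scale so the sample sums to one million.
    rates = [c / length for c, length in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), splitting each subtype separately."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        idx = by_label[label]
        rng.shuffle(idx)
        # Assumption: per-subtype test size is rounded to the nearest integer.
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy data (hypothetical): three transcripts, one sample.
tpm = counts_to_tpm([100, 300, 600], [1.0, 1.5, 3.0])

# Toy labels (hypothetical class sizes, not the study's).
labels = ["KMT2Ar"] * 50 + ["NPM1"] * 30 + ["CEBPA"] * 20
train_idx, test_idx = stratified_split(labels)
```

Splitting within each subtype keeps rare subtypes represented in both sets, which matters when some classes have only a handful of samples. In practice a library routine such as scikit-learn's `train_test_split` with the `stratify` argument serves the same purpose.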
Three machine learning models were selected: random forest, XGBoost, and linear support vector machine (SVM). Each sample had gene expression TPM data for 60,754 transcripts, of which the 20,004 transcripts from protein-coding genes were considered for feature selection. Feature selection used the median absolute deviation (MAD) to select the 3000 most variable transcripts; this number was chosen to allow adequate tuning of the number of predictors. Each model was independently trained using stratified cross-validation and Monte Carlo search for hyperparameter tuning. The best model was selected based on the Matthews Correlation Coefficient (MCC). Each model was tested on the hold-out TPM set with z-score normalization.
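MAD-based feature selection followed by z-score normalization can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: the function names and toy matrix are hypothetical, and fitting the z-score parameters on the training set and reusing them on the hold-out set is an assumed convention that the abstract does not spell out.

```python
import statistics


def mad(values):
    """Median absolute deviation of a list of numbers."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def top_mad_features(matrix, k):
    """Return the column indices of the k features with the highest MAD."""
    n_features = len(matrix[0])
    scores = [mad([row[j] for row in matrix]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def fit_zscore(matrix):
    """Per-feature mean and standard deviation (assumed: training data only)."""
    columns = list(zip(*matrix))
    means = [statistics.fmean(col) for col in columns]
    # Guard against zero variance so the division below is always defined.
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, sds


def apply_zscore(matrix, means, sds):
    """Scale every row with previously fitted means and standard deviations."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in matrix]


# Toy training matrix (hypothetical): 4 samples x 3 transcripts.
train = [[1, 10, 5], [1, 50, 6], [1, 90, 7], [2, 130, 8]]
selected = top_mad_features(train, k=2)  # the study used k=3000
train_sel = [[row[j] for j in selected] for row in train]
means, sds = fit_zscore(train_sel)
train_z = apply_zscore(train_sel, means, sds)
```

MAD is less sensitive to a few extreme samples than variance, which makes it a common choice for ranking expression features by variability.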
On the hold-out testing set (n=520), the linear SVM model outperformed the random forest and XGBoost models on five performance metrics across all subtypes (sensitivity=0.9577; precision=0.9577; specificity=0.9978; F1=0.96; accuracy=0.9958). The random forest (sensitivity=0.9154; precision=0.9154; specificity=0.9955; F1=0.92; accuracy=0.9915) and XGBoost models (sensitivity=0.9231; precision=0.9231; specificity=0.9960; F1=0.92; accuracy=0.9923) also performed well across all subtypes. Although feature selection was shared across all three models, performance within each subtype varied between models. The linear SVM model demonstrated strong performance overall, driven by high specificity in classifying the KMT2Ar subgroup (n=127) and equal sensitivity across the GLIS-rearranged (n=14), GATA1 (n=10), BCL11B (n=7), CBFB::MYH11 (n=60), CEBPA (n=33), and RUNX1::RUNX1T1 (n=78) subtypes.
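The five metrics above, together with the MCC used for model selection, can all be derived from a multiclass confusion matrix. The sketch below pools one-vs-rest counts across classes (micro-averaging). That aggregation is an assumption: the abstract does not state how the metrics were averaged, although identical sensitivity and precision values are what micro-averaging produces for single-label classification. The confusion matrix and function names are hypothetical.

```python
import math


def micro_metrics(conf):
    """Micro-averaged one-vs-rest metrics; conf[i][j] = true i, predicted j."""
    k = len(conf)
    n = sum(sum(row) for row in conf)
    tp = fp = fn = tn = 0
    for c in range(k):
        tp_c = conf[c][c]
        fn_c = sum(conf[c]) - tp_c
        fp_c = sum(conf[r][c] for r in range(k)) - tp_c
        tp += tp_c
        fn += fn_c
        fp += fp_c
        tn += n - tp_c - fn_c - fp_c
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


def multiclass_mcc(conf):
    """Multiclass Matthews Correlation Coefficient from a confusion matrix."""
    k = len(conf)
    n = sum(sum(row) for row in conf)
    correct = sum(conf[c][c] for c in range(k))
    true_counts = [sum(conf[c]) for c in range(k)]
    pred_counts = [sum(conf[r][c] for r in range(k)) for c in range(k)]
    numerator = correct * n - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (n * n - sum(p * p for p in pred_counts))
        * (n * n - sum(t * t for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0


# Toy 3-class confusion matrix (hypothetical, not the study's results).
conf = [[8, 1, 1], [0, 9, 1], [1, 0, 9]]
metrics = micro_metrics(conf)
mcc = multiclass_mcc(conf)
```

Because every misclassified sample counts once as a false negative for its true class and once as a false positive for the predicted class, the pooled sensitivity and precision coincide, while specificity and accuracy are inflated by the many true negatives in the one-vs-rest view.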
The primary difference in performance between models is the high false positive rate for KMT2Ar and NPM1 (n=55) in the random forest and XGBoost models. A preliminary hypothesis is that this reflects the large representation of KMT2Ar and NPM1 in the training data (24.85% and 10.78%, respectively). Synthetic upsampling (SMOTE) of the training dataset (n=1369) counteracts bias towards the majority classes and increases performance for the random forest (sensitivity=0.9327; precision=0.9327; specificity=0.9965; F1=0.93; accuracy=0.9933), XGBoost (sensitivity=0.9404; precision=0.9404; specificity=0.9969; F1=0.94; accuracy=0.9940) and linear SVM (sensitivity=0.9615; precision=0.9615; specificity=0.9980; F1=0.96; accuracy=0.9962) models.
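The idea behind SMOTE can be sketched in a few lines: each synthetic sample is placed at a random point on the line segment between a real minority-class sample and one of its nearest same-class neighbours. The version below is a simplified illustration, not the authors' implementation. The function name, the `target_count` parameter, and the toy data are hypothetical, and the abstract does not report which subtypes were upsampled or to what size.

```python
import math
import random
from collections import Counter


def smote_like_oversample(X, y, target_count, k=5, seed=0):
    """Add synthetic samples until each class has at least target_count members."""
    rng = random.Random(seed)
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in Counter(y).items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # interpolation needs at least two real samples
        for _ in range(max(0, target_count - n)):
            base = rng.choice(members)
            # k nearest real neighbours of the same class (Euclidean distance).
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            mate = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> mate
            X_out.append([b + gap * (m - b) for b, m in zip(base, mate)])
            y_out.append(label)
    return X_out, y_out


# Toy data (hypothetical): class "A" has 4 samples, class "B" only 2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [5, 5], [6, 5]]
y = ["A"] * 4 + ["B"] * 2
X_res, y_res = smote_like_oversample(X, y, target_count=4)
```

Upsampling should be applied to the training data only, so that the hold-out set contains real samples exclusively. Library implementations, such as the `SMOTE` class in imbalanced-learn, handle neighbour search and sampling strategies more carefully than this sketch.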
Together, these models demonstrate the utility and effectiveness of a machine learning approach for classifying pAML samples from transcriptome sequencing data, which may have broad clinical and research utility, especially for fusion-negative subtypes.
Disclosures: No relevant conflicts of interest to declare.