-Author name in bold denotes the presenting author
-Asterisk * with author name denotes a Non-ASH member
Clinically Relevant Abstract denotes an abstract that is clinically relevant.

PhD Trainee denotes that this is a recommended PHD Trainee Session.

Ticketed Session denotes that this is a ticketed session.

2905 Accurate Prediction of Gene Mutations with Flow Cytometry Immune-Phenotyping By Machine Learning Algorithm

Program: Oral and Poster Abstracts
Session: 617. Acute Myeloid Leukemia: Biology, Cytogenetics, and Molecular Markers in Diagnosis and Prognosis: Poster III
Hematology Disease Topics & Pathways:
AML, Diseases, Technology and Procedures, Myeloid Malignancies, Clinically relevant, flow cytometry
Monday, December 7, 2020, 7:00 AM-3:30 PM

Bor-Sheng Ko, MD, PhD1,2*, Yu-Fen Wang, MS3,4*, Jeng-Lin Li, BS5*, Hsin-An Hou6*, Wen-Chien Chou, MD, PhD7, Hwei-Fang Tien, M.D., Ph.D.1, Ting-Yu Chang, M.S.3* and Chi-Chun Lee, Ph.D.5*

1Division of Hematology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
2Department of Hematological Oncology, National Taiwan University Cancer Center, Taipei, Taiwan
3AHEAD Medicine, Berkeley, CA
4Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
5Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan
6National Taiwan University Hospital, Taipei, TWN
7Department of Laboratory Medicine, National Taiwan University Hospital, Taipei, Taiwan


Identification of gene mutation status prior treatment has improved our capability in risk stratifying acute myeloid leukemia (AML) patients greatly, but it is really a labor-exhausting work to identify these mutations. In this this study, we hypothesize immunophenotyping could predict gene mutation and we aim to develop machine learning algorithms that could predict the AML gene mutation status with the immunophenotype by clinical flow cytometry.


Retrospective clinical data of patients with AML, including demographic (age & gender), molecular genetics, cytogenetics as well as flow cytometry (FC) data at National Taiwan University Hospital were collected. A total of 529 newly diagnosed de novo AML from 2009 to 2019 enrolled this study. The median age at diagnosis was 58 years (Table 1). In total, 428 NPM1, 415 FLT3, 331 CEBPA and 338 RUNX1 gene testing results and a total of 529 initial diagnostic FC data from these patients were used in developing the gene mutation prediction models. Each FC data sample contained 100,000 cells acquired on FACSCantoII machine 6 fluorescent channels with multiple fluorescent markers. The markers measured are listed in Table 2. There were 19 combinations of markers and fluorescent channels which were served as feature inputs of the machine learning framework.

Our proposed machine learning framework can be divided as a phenotype representation learning paradigm and a classification model. To derive the phenotype representation, we trained a multivariate Gaussian Mixture Model (GMM) on the 19-dimension FC data to capture the training data distribution and characteristics in a probabilistic unsupervised manner. Then, a Fisher-scoring method was used to vectorize each sample as a high dimensional representation via differential computation in terms of the learned GMM parameters. This Fisher vectorization method transformed samples to a high dimensional feature space as phenotype vectors. We performed analysis of variance (ANOVA)-based feature selection on these representations which were finally fed into the support vector machine (SVM) classifier. To alleviate the negative effects of imbalance classes in gene mutation identification tasks, we applied synthetic minority oversampling technique (SMOTE) algorithm which augmented the minority class by interpolating samples near support vectors. We train independent SVM models to detect the occurrences of the four gene mutation. The algorithm is evaluated by randomly divided 5-fold cross validation which separates 80% data for training and 20% for testing.


This gene mutation rate of this cohort for NPM1, FLT3, CEBPA and RUNX1 were 22.2% (95/428), 25.1% (104/415), 20.2% (67/331), and 13.6% (46/338), respectively. The average accuracies (ACC) of the prediction model performance for NPM1, FLT3, CEBPA and RUNX1 were 82.6%, 76.3%, 84.2% and 84.1%, respectively, whereas the area under the ROC curve (AUC) were 77.9%, 63.4%, 80.7% and 67.7%, respectively (Table. 3).


We demonstrated the potential of the correlation of recurrent AML gene mutation status with immunophenotype of AML through our preliminary gene mutation prediction model. Further study with larger cohorts followed by external validation are needed to further evaluate the feasibility of using machine learning based algorithm as one of triage tools to support physicians in aggressive AML clinical decision before receiving molecular genetic reports.

Disclosures: Ko: Roche: Honoraria.

*signifies non-member of ASH