1938 Smartcytoflow: A Machine Learning Decision Support System for Flow Cytometry Analysis in Multiple Myeloma Diagnosis and Monitoring

Program: Oral and Poster Abstracts
Session: 653. Multiple Myeloma: Clinical and Epidemiological: Poster I
Hematology Disease Topics & Pathways:
Research, Artificial intelligence (AI), Translational Research, Bioinformatics, Computational biology, Emerging technologies, Technology and Procedures, Measurable Residual Disease, Machine learning
Saturday, December 7, 2024, 5:30 PM-7:30 PM

Carlos Pérez Míguez1*, Jose Angel Diaz Arias2*, Davide Crucitti3,4*, Jesús Gómez Fernández1*, Manuel Piñeiro Fiel1*, Marta Sonia Gonzalez Perez5*, Maria-Victoria Mateos, MD, PhD6 and Adrian Mosquera Orgueira, MD, PhD7,8*

1Health Research Institute of Santiago de Compostela, Santiago de Compostela, Spain
2University Hospital of Santiago de Compostela, Department of Hematology, IDIS, A Coruña, Spain
3Pharmacology, University of Santiago de Compostela, Santiago de Compostela, Spain
4Group of Computational Hematology and Genomics, Health Research Institute of Santiago de Compostela, Santiago de Compostela, Spain
5University Hospital of Santiago de Compostela, Santiago de Compostela, Spain
6University of Salamanca, Department of Medicine, Salamanca, Spain
7University Hospital of Santiago de Compostela, Department of Hematology, IDIS, Santiago de Compostela, Spain
8Hematology Department, University Hospital of Santiago de Compostela, Group of Computational Hematology and Genomics, IDIS, Santiago de Compostela, Spain

Introduction

Monoclonal gammopathies, particularly multiple myeloma (MM), pose significant diagnostic challenges because of their complex cellular profiles. Flow cytometry performed according to EuroFlow standards provides detailed immunophenotypic analysis but remains labor-intensive and prone to inter-observer variability. Machine learning could streamline this process and provide consistent, accurate diagnostic support. This study explores the application of machine learning to flow cytometry data for diagnosis and minimal residual disease (MRD) detection in MM.

Objectives

The primary objective was to develop and validate a machine learning model for accurate diagnosis and MRD detection in MM using flow cytometry data. Secondary objectives were to balance the dataset, improve plasma cell enrichment, and evaluate the model's performance in a clinical setting.

Methodology

We collected over 800 samples studied according to EuroFlow standards for MM. The panel included two tubes, each comprising CD138, CD38, CD45, CD19, CD56, CD27, CD117, CD81, CD20, CD200, CD28, and cytoplasmic immunoglobulin kappa/lambda. Of these samples, 44% were obtained at diagnosis and 56% for MRD detection; expert review confirmed 90% of the diagnostic samples as pathological.

Preprocessing included quality control with the Bioconductor package flowAI and removal of doublets, margin events, and artifacts. A gating strategy was then applied to enrich the analysis for plasma cells, retaining only gate-positive events from each sample. To address class imbalance, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to the training set.
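SMOTE creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The abstract does not state the implementation, the number of neighbors k, or the feature space in which SMOTE was applied, so the following is a minimal pure-Python sketch under those assumptions. The `smote` helper and the toy vectors are hypothetical, not the study's code, and the sketch picks base samples at random rather than cycling over every minority sample as in the original SMOTE description. As in the abstract, oversampling is applied only to training data so that synthetic points never reach the test set.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class samples (minimal SMOTE variant).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbors
    (Euclidean distance). Assumption: features are plain numeric vectors.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by distance to x and keep the k closest.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic


# Toy training data (hypothetical two-feature vectors): 4 majority vs 2 minority.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]
minority = [[1.0, 1.0], [1.2, 0.9]]
synthetic = smote(minority, n_synthetic=len(majority) - len(minority), k=1)
balanced = majority + minority + synthetic
labels = [0] * len(majority) + [1] * (len(minority) + len(synthetic))
```

Because new points are interpolations rather than copies, the classifier sees a denser minority region instead of repeated identical samples, which is the usual motivation for SMOTE over simple duplication.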

We used FlowSOM for clustering, extracting clusters and metaclusters from each tube; these were then used as inputs to a random forest classifier. The model was trained and cross-validated on the training set and then validated independently on the test set.
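The abstract does not specify how FlowSOM clusters and metaclusters were turned into classifier inputs. One common representation, assumed here, is the fraction of each sample's gated events assigned to each metacluster, concatenated across tubes into one feature vector per sample. The helpers `metacluster_proportions` and `sample_features`, the tube names, and the toy labels are hypothetical; in practice the per-event metacluster assignments would come from FlowSOM itself.

```python
from collections import Counter


def metacluster_proportions(labels, n_metaclusters):
    """Fraction of a sample's events assigned to each metacluster (ids 0..n-1)."""
    if not labels:
        return [0.0] * n_metaclusters
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(m, 0) / total for m in range(n_metaclusters)]


def sample_features(tube_labels, n_metaclusters):
    """Concatenate per-tube proportion vectors into one feature vector.

    Assumption: tube_labels maps a tube name to the metacluster ids of that
    sample's gated events, and every tube uses the same number of metaclusters.
    Tubes are sorted by name so the feature order is stable across samples.
    """
    features = []
    for tube in sorted(tube_labels):
        features.extend(metacluster_proportions(tube_labels[tube], n_metaclusters))
    return features


# Hypothetical sample: 4 events per tube, 3 metaclusters per tube.
tube_labels = {"tube1": [0, 0, 1, 2], "tube2": [1, 1, 1, 0]}
features = sample_features(tube_labels, n_metaclusters=3)
# features == [0.5, 0.25, 0.25, 0.25, 0.75, 0.0]
```

Proportions rather than raw counts keep samples with different event totals comparable, which matters when MRD samples may contain very few plasma cells.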

Results

The training set, balanced with SMOTE, comprised 1,000 samples with equal class representation (500 positive, 500 negative). The random forest model performed robustly in training, achieving an out-of-bag (OOB) area under the ROC curve (AUC) of 0.993, a precision-recall (PR) AUC of 0.993, and a Brier score of 0.04, indicating accurate probability estimates. The OOB G-mean was 0.95, with a misclassification rate of 4.8%. The confusion matrix showed a class error of 1.2% for the negative class and 8.4% for the positive class.
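The reported OOB figures are internally consistent, which a short calculation confirms. This sketch assumes the standard definitions (class error is the share of a class that is misclassified; G-mean is the square root of sensitivity times specificity) and reconstructs the counts from the reported rates: 1.2% of 500 negatives is 6 false positives and 8.4% of 500 positives is 42 false negatives. The `confusion_metrics` helper is illustrative, not part of the study's pipeline.

```python
import math


def confusion_metrics(tn, fp, fn, tp):
    """Class errors, overall misclassification rate and G-mean from a 2x2 confusion matrix."""
    neg_error = fp / (tn + fp)  # 1 - specificity
    pos_error = fn / (fn + tp)  # 1 - sensitivity
    misclassification = (fp + fn) / (tn + fp + fn + tp)
    g_mean = math.sqrt((1 - neg_error) * (1 - pos_error))
    return neg_error, pos_error, misclassification, g_mean


# Counts implied by the reported OOB class errors on 500 negative / 500 positive samples.
neg_err, pos_err, mis, g = confusion_metrics(tn=494, fp=6, fn=42, tp=458)
print(f"{neg_err:.3f} {pos_err:.3f} {mis:.3f} {g:.2f}")  # prints: 0.012 0.084 0.048 0.95
```

With balanced classes the misclassification rate is simply the mean of the two class errors, (1.2% + 8.4%) / 2 = 4.8%, and the G-mean penalizes the weaker positive-class sensitivity more than overall accuracy does.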

The test set consisted of 138 samples. In validation, the model retained good discrimination, with an AUC of 0.916, although the PR-AUC was lower at 0.72; the Brier score was 0.10. The G-mean for the test set was 0.91, with a misclassification rate of 10.9%. The confusion matrix for the test set showed a class error of 2.3% for the negative class and 14.7% for the positive class. Of the test samples, 56 were diagnostic samples, with a misclassification rate of 5%, and 82 were MRD samples, for which the rate rose to 13.58%.
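The AUC and Brier score used in validation can be made concrete with minimal definitions. This sketch computes ROC AUC as the Mann-Whitney probability that a randomly chosen positive sample scores above a randomly chosen negative one (ties count one half), and the Brier score as the mean squared difference between predicted probability and the 0/1 outcome. The toy probabilities are invented for illustration and are not study data; PR-AUC is omitted because its estimators (trapezoidal area versus average precision) differ and the abstract does not say which was used.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Hypothetical predicted probabilities for four samples (1 = positive).
probs = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
auc = roc_auc(probs, labels)        # 3 of 4 positive/negative pairs ranked correctly: 0.75
brier = brier_score(probs, labels)  # (0.01 + 0.64 + 0.49 + 0.04) / 4, about 0.295
```

The pairwise loop is quadratic in sample count, which is acceptable for a test set of this size; AUC measures ranking only, whereas the Brier score also penalizes poorly calibrated probabilities, so the two can move independently.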

Beyond these performance metrics, implementing the model within the SmartCytoFlow platform streamlined the diagnostic workflow. SmartCytoFlow automates data preprocessing, gating, clustering, and classification, providing real-time diagnostic support. In our setting, this integration substantially reduced analysis time and improved diagnostic consistency.

Conclusion

Our study underscores the potential of integrating machine learning with flow cytometry for diagnosing and monitoring monoclonal gammopathies, particularly multiple myeloma. This approach offers a promising adjunct to traditional diagnostic methods that may improve consistency and accuracy in clinical settings. Future validation in diverse cohorts is warranted to establish its broader applicability and utility in routine diagnostic practice.

Disclosures: Mateos: Pfizer: Honoraria, Membership on an entity's Board of Directors or advisory committees; Amgen: Honoraria, Membership on an entity's Board of Directors or advisory committees; BMS: Honoraria, Membership on an entity's Board of Directors or advisory committees; Johnson and Johnson: Honoraria, Membership on an entity's Board of Directors or advisory committees; Sanofi: Honoraria; GSK: Honoraria, Membership on an entity's Board of Directors or advisory committees; F. Hoffmann-La Roche Ltd: Honoraria, Membership on an entity's Board of Directors or advisory committees; Abbvie: Honoraria, Membership on an entity's Board of Directors or advisory committees; Regeneron: Honoraria; Stemline: Honoraria, Membership on an entity's Board of Directors or advisory committees; Kite: Honoraria, Membership on an entity's Board of Directors or advisory committees; Oncopeptides: Honoraria; Salamanca University: Current Employment; Celgene: Honoraria. Mosquera Orgueira: GSK: Consultancy; Novartis: Other; Incyte: Other; Takeda: Speakers Bureau; Roche: Consultancy; Pfizer: Consultancy; Abbvie: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; AstraZeneca: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Biodigital THX: Current equity holder in private company; Janssen: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau.

*signifies non-member of ASH