Author name in bold denotes the presenting author
Asterisk * with author name denotes a Non-ASH member

3606 SmartCytoFlow: A Machine Learning Decision Support System for Flow Cytometry Analysis in Non-Hodgkin Lymphoma Diagnosis and Screening

Program: Oral and Poster Abstracts
Session: 803. Emerging Tools, Techniques, and Artificial Intelligence in Hematology: Poster II
Hematology Disease Topics & Pathways:
Clinical Practice (Health Services and Quality), Lymphomas, Non-Hodgkin lymphoma, Bioinformatics, Diseases, Lymphoid Malignancies, Emerging technologies, Technology and Procedures, Machine learning
Sunday, December 8, 2024, 6:00 PM-8:00 PM

Carlos Pérez Míguez1,2*, Jose Angel Diaz Arias1,3*, Jose Antonio Taibo Salorio4*, Davide Crucitti5,6*, Manuel Piñeiro Fiel5,6*, Jesús Gómez Fernández5,6*, Noelia Jorge Ríos1,7*, Rosanna Abal García1,7* and Adrian Mosquera Orgueira, MD, PhD1,8*

1Computational Hematology & Genomics Group (GrHeCo-Xen), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
2Health Research Institute of Santiago de Compostela, Santiago de Compostela, Spain
3University Hospital of Santiago de Compostela, Department of Hematology, IDIS, A Coruña, Spain
4EDISA, Ourense, Spain
5University Hospital of Santiago de Compostela, Department of Hematology, Santiago de Compostela, Spain
6Computational Hematology & Genomics Group (GrHeCo-Xen), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
7University Hospital of Santiago de Compostela, Department of Hematology, Santiago de Compostela, Spain
8University Hospital of Santiago de Compostela, Department of Hematology, IDIS, Santiago de Compostela, Spain

Introduction

Flow cytometry is essential for diagnosing lymphoid neoplasms, offering detailed cellular profiles via specific antibody panels. However, manual analysis is time-consuming and prone to variability. Machine learning (ML) methods, such as a self-organizing map (SOM) combined with a random forest classifier, can enhance diagnostic accuracy and efficiency. This study explores how ML can improve lymphoma diagnosis using flow cytometry data.

Objectives

The primary objective of this study was to develop and validate a machine learning (ML) model for accurately diagnosing non-Hodgkin lymphoma (NHL) from flow cytometry data. Secondary objectives included evaluating the model's performance, understanding the reasons for discordant diagnoses, and developing a cloud-based tool for automated lymphoma screening.

Methodology

We assembled a database of over 3,388 lymphoma screening tubes using the standard EuroFlow panel, which includes antibodies targeting CD45, CD3, CD4, CD8, CD56, CD5, TCR gamma delta, CD19, CD20, kappa and lambda surface light chains, and CD38. Two experts annotated the samples as lymphoma or no lymphoma.

Preprocessing involved removing doublets, margin events, and artifacts using the R package flowAI. We constructed a self-organizing map (SOM) with the FlowSOM package. The dataset was split into a training set (70%) and a test set (30%). To address the class imbalance in the training set (two-thirds normal, one-third pathological), we used the Synthetic Minority Over-sampling Technique (SMOTE) to generate synthetic NHL cases.
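The oversampling step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic case is placed at a random point on the segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the implementation used in the study. The function name, the toy feature vectors (imagined here as per-sample SOM cluster proportions), and the choice of k are assumptions made for illustration only.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating a randomly chosen
    minority sample toward one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical toy features; real inputs would come from the SOM step.
normal = [[0.70, 0.20, 0.10], [0.68, 0.22, 0.10],
          [0.72, 0.18, 0.10], [0.69, 0.21, 0.10]]
lymphoma = [[0.30, 0.10, 0.60], [0.25, 0.15, 0.60]]

# Oversample the minority class until both classes have the same size.
new_cases = smote(lymphoma, n_new=len(normal) - len(lymphoma), k=1)
balanced_lymphoma = lymphoma + new_cases
```

Oversampling of this kind is applied to the training set only, so that the test set keeps the class proportions seen in practice.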

A random forest classifier was trained on the augmented dataset, with cross-validated accuracy metrics used for model tuning. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC). We also reviewed discordant annotations to identify potential human errors.
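As a reminder of what the AUC measures, the sketch below computes it from its rank-based definition: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores are invented toy values, not study data, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs that are ranked
    correctly; tied pairs contribute 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 1 = lymphoma, 0 = no lymphoma.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of 9 pairs are ranked correctly
```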

Results

The model was trained on 2,560 samples, equally split between positive and negative cases. It achieved an out-of-bag (OOB) AUC of 96.6%, a precision-recall AUC (PR-AUC) of 95.8%, a Brier score of 0.07, an OOB G-mean of 0.91, and a misclassification rate of 9.2%. In the test set of 828 samples, the model maintained strong performance with an AUC of 91.1%, a PR-AUC of 89%, a Brier score of 0.10, a G-mean of 0.86, and a misclassification rate of 11.2%.
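For readers less familiar with these metrics, the following sketch shows how the Brier score, G-mean, and misclassification rate are commonly computed. The abstract does not spell out its definitions, so the usual ones are assumed here: the Brier score as the mean squared difference between predicted probability and outcome, and the G-mean as the geometric mean of sensitivity and specificity. The data and the 0.5 decision threshold are illustrative assumptions.

```python
import math

def brier_score(labels, probs):
    """Mean squared error between predicted probabilities and outcomes."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)

def g_mean(labels, preds):
    """Geometric mean of sensitivity and specificity."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

def misclassification_rate(labels, preds):
    """Fraction of samples whose predicted class differs from the label."""
    return sum(1 for y, p in zip(labels, preds) if y != p) / len(labels)

# Toy data: 1 = lymphoma, 0 = no lymphoma.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7]
preds = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold

brier = brier_score(labels, probs)
gm = g_mean(labels, preds)
error = misclassification_rate(labels, preds)
```

A lower Brier score indicates better calibrated probabilities, while the G-mean penalizes a model that performs well on one class at the expense of the other.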

A total of 92 discrepancies were identified in the test set. These included 34 undetected aberrant T or NK cell populations, 21 cases with minimal B-cell NHL infiltration below 0.5%, and 7 from fine-needle aspiration (FNA), pleural, or ascitic fluid samples. Some normal diagnoses were incorrectly predicted as pathological, including 4 from FNA, pleural, or ascitic samples, and 6 affected by artifacts like cryoglobulins and IgM peaks. Additionally, 10 cases initially diagnosed as pathological involved other conditions unrelated to NHL. There were also 3 database misentries where the model was correct, 6 unidentified discrepancies, and one unreported aberrant B-cell population.
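The breakdown above can be checked with a short tally. The category labels are paraphrased from the text and the counts are those reported; the test set size of 828 is taken from the Results.

```python
# Discordant cases in the test set, by category as reported above.
discrepancies = {
    "undetected aberrant T or NK cell populations": 34,
    "minimal B-cell NHL infiltration below 0.5%": 21,
    "missed cases in FNA, pleural, or ascitic samples": 7,
    "false positives in FNA, pleural, or ascitic samples": 4,
    "false positives due to artifacts (cryoglobulins, IgM peaks)": 6,
    "pathological diagnoses unrelated to NHL": 10,
    "database misentries where the model was correct": 3,
    "unidentified discrepancies": 6,
    "unreported aberrant B-cell population": 1,
}

total = sum(discrepancies.values())
test_set_size = 828
share = total / test_set_size  # fraction of the test set that was discordant
```

The categories sum to the 92 discrepancies stated, which is about 11.1% of the 828 test samples and close to the reported misclassification rate.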

We developed SmartCytoFlow, a cloud-based tool to automate the diagnostic process. It monitors folders, identifies new lymphoma screening tubes, pseudonymizes samples, runs the predictive algorithm, and provides pathogenicity predictions and uncertainty metrics. SmartCytoFlow is operational in our hospital, supporting lymphoma diagnosis.
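The abstract does not describe how SmartCytoFlow is implemented. Purely as an illustration of the workflow it outlines (watch a folder, pick up new tubes, pseudonymize, predict), a minimal sketch might look like the following. Every name here is hypothetical, including the `.fcs` file naming, the keyed-hash pseudonymization scheme, and the placeholder `predict` function, which stands in for the trained classifier and returns a dummy value.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

SECRET_KEY = b"replace-with-a-site-secret"  # hypothetical key, kept on site

def pseudonymize(sample_id, key=SECRET_KEY):
    """Replace an identifier with a keyed hash so the same sample always
    maps to the same pseudonym without exposing the original ID."""
    digest = hmac.new(key, sample_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

def find_new_tubes(folder, seen):
    """Return FCS files not processed yet and mark them as seen."""
    new = sorted(p for p in Path(folder).glob("*.fcs") if p.name not in seen)
    seen.update(p.name for p in new)
    return new

def predict(path):
    """Placeholder for the trained model; returns a dummy probability."""
    return 0.5

# Demonstration on a temporary folder with one empty stand-in file.
with tempfile.TemporaryDirectory() as folder:
    (Path(folder) / "patient123_LST.fcs").write_bytes(b"")
    seen = set()
    first_pass = [(pseudonymize(p.stem), predict(p))
                  for p in find_new_tubes(folder, seen)]
    second_pass = find_new_tubes(folder, seen)  # nothing new on a rescan
```

Tracking processed file names keeps a repeated scan from analysing the same tube twice. A production system would also need persistent state, error handling, and the uncertainty metrics mentioned above, none of which are shown here.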

Conclusion

Integrating ML with flow cytometry data significantly improves NHL diagnosis accuracy and efficiency. Our model demonstrated high predictive accuracy and identified areas prone to human error, highlighting its potential as a robust diagnostic aid. The cloud-based tool offers a scalable, reliable solution for routine clinical use. Further validation in diverse clinical settings is needed to confirm its generalizability and utility.

Disclosures: Mosquera Orgueira: GSK: Consultancy; Novartis: Other; Incyte: Other; Takeda: Speakers Bureau; Roche: Consultancy; Pfizer: Consultancy; Abbvie: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; AstraZeneca: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Biodigital THX: Current equity holder in private company; Janssen: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau.
