Smartcytoflow: A Machine Learning Decision Support System for Flow Cytometry Analysis in Non-Hodgkin Lymphoma Diagnosis and Screening

Mosquera Orgueira, Adrian

Oral and Poster Abstracts
803. Emerging Tools, Techniques, and Artificial Intelligence in Hematology: Poster II

Clinical Practice (Health Services and Quality), Lymphomas, Non-Hodgkin lymphoma, Bioinformatics, Diseases, Lymphoid Malignancies, Emerging technologies, Technology and Procedures, Machine learning

Carlos Pérez Míguez^1,2^*, Jose Angel Diaz Arias^1,3^*, Jose Antonio Taibo Salorio^4^*, Davide Crucitti^5,6^*, Manuel Piñeiro Fiel^5,6^*, Jesús Gómez Fernández^5,6^*, Noelia Jorge Ríos^1,7^*, Rosanna Abal García^1,7^* and Adrian Mosquera Orgueira, MD, PhD^1,8^*

¹Computational Hematology & Genomics Group (GrHeCo-Xen), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
²Health Research Institute of Santiago de Compostela, Santiago de Compostela, Spain
³University Hospital of Santiago de Compostela, Department of Hematology, IDIS, A Coruña, Spain
⁴EDISA, Ourense, Spain
⁵University Hospital of Santiago de Compostela, Department of Hematology, Santiago de Compostela, Spain
⁶Computational Hematology & Genomics Group (GrHeCo-Xen), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain
⁷University Hospital of Santiago de Compostela, Department of Hematology, Santiago de Compostela, Spain
⁸University Hospital of Santiago de Compostela, Department of Hematology, IDIS, Santiago de Compostela, Spain

Introduction

Flow cytometry is essential for diagnosing lymphoid neoplasms, offering detailed cellular profiles via specific antibody panels. However, manual analysis is time-consuming and prone to variability. Machine learning (ML), here a self-organizing map (SOM) combined with a random forest classifier, can improve diagnostic accuracy and efficiency. This study explores how ML can improve lymphoma diagnosis from flow cytometry data.

Objectives

The primary objective of this study was to develop and validate a machine learning (ML) model for accurately diagnosing non-Hodgkin lymphoma (NHL) from flow cytometry data. Secondary objectives included evaluating the model's performance, understanding the reasons for discordant diagnoses, and developing a cloud-based tool for automated lymphoma screening.

Methodology

We assembled a database of over 3,388 lymphoma screening tubes using the standard EuroFlow panel, which includes antibodies targeting CD45, CD3, CD4, CD8, CD56, CD5, TCR gamma delta, CD19, CD20, kappa and lambda surface light chains, and CD38. Two experts annotated the samples as lymphoma or no lymphoma.

Preprocessing involved removing doublets, margin events, and artifacts using the R package flowAI. We constructed a self-organizing map (SOM) with the FlowSOM package. The dataset was split into a training set (70%) and a test set (30%). To address the class imbalance in the training set (two-thirds normal, one-third pathological), we used the Synthetic Minority Over-sampling Technique (SMOTE) to generate synthetic NHL cases.
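The oversampling step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic case is placed at a random position on the segment joining a minority-class sample and one of its k nearest minority-class neighbours. This is not the implementation used in the study; the function name `smote`, the two-dimensional toy features, and the parameter values are assumptions chosen only for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the base point (itself excluded).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy training set (hypothetical values): two-thirds normal, one-third pathological.
normal = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.3, 0.3], [0.2, 0.4], [0.1, 0.0]]
pathological = [[0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]

new_cases = smote(pathological, n_new=len(normal) - len(pathological), k=2)
balanced_pathological = pathological + new_cases
```

In this sketch only the training portion is oversampled, so the held-out test set keeps its original class proportions.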

A random forest classifier was trained on the augmented dataset, with cross-validated accuracy metrics used to tune the model. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC). We also reviewed discordant annotations to identify potential human errors.
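As a reminder of what the AUC measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores are invented toy values, and `roc_auc` is a hypothetical helper, not part of the study's code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = lymphoma, 0 = no lymphoma; scores are predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)  # 11 of 12 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of samples; it is written for clarity rather than speed.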

Results

The model was trained on 2,560 samples, equally split between positive and negative cases. It achieved an out-of-bag (OOB) AUC of 96.6%, a precision-recall AUC (PR-AUC) of 95.8%, a Brier score of 0.07, an OOB G-mean of 0.91, and a misclassification rate of 9.2%. In the test set of 828 samples, the model maintained strong performance with an AUC of 91.1%, a PR-AUC of 89%, a Brier score of 0.10, a G-mean of 0.86, and a misclassification rate of 11.2%.
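The remaining metrics can be sketched as follows, assuming the usual definitions: the Brier score as the mean squared difference between predicted probability and outcome, the G-mean as the geometric mean of sensitivity and specificity, and the misclassification rate as the fraction of wrong calls. The 0.5 decision threshold and the toy data are assumptions; the abstract does not state which threshold was used.

```python
import math


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and true label."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def confusion(labels, probs, threshold=0.5):
    """Return (tp, fp, tn, fn) for predictions thresholded at `threshold`."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def g_mean(labels, probs, threshold=0.5):
    """Geometric mean of sensitivity and specificity."""
    tp, fp, tn, fn = confusion(labels, probs, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def misclassification_rate(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction differs from the label."""
    tp, fp, tn, fn = confusion(labels, probs, threshold)
    return (fp + fn) / (tp + fp + tn + fn)


# Toy example (hypothetical values): 4 lymphoma cases, 6 normal cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.6, 0.3, 0.1, 0.2, 0.4, 0.7, 0.05, 0.3]

brier = brier_score(labels, probs)
gm = g_mean(labels, probs)
error = misclassification_rate(labels, probs)
```

Unlike accuracy, the G-mean drops sharply when either class is predicted poorly, which makes it a common companion metric for imbalanced screening data.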

A total of 92 discrepancies were identified in the test set. These included 34 undetected aberrant T or NK cell populations, 21 cases with minimal B-cell NHL infiltration below 0.5%, and 7 from fine-needle aspiration (FNA), pleural, or ascitic fluid samples. Some normal diagnoses were incorrectly predicted as pathological, including 4 from FNA, pleural, or ascitic samples, and 6 affected by artifacts like cryoglobulins and IgM peaks. Additionally, 10 cases initially diagnosed as pathological involved other conditions unrelated to NHL. There were also 3 database misentries where the model was correct, 6 unidentified discrepancies, and one unreported aberrant B-cell population.
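The breakdown above can be checked with a short tally. The category labels are paraphrased from the text; the counts are those reported in the abstract.

```python
# Discordant test-set cases by category, as reported in the abstract.
discrepancies = {
    "undetected aberrant T or NK cell population": 34,
    "minimal B-cell NHL infiltration below 0.5%": 21,
    "missed case from FNA, pleural, or ascitic fluid": 7,
    "false positive from FNA, pleural, or ascitic sample": 4,
    "false positive due to artifacts (cryoglobulins, IgM peaks)": 6,
    "pathological diagnosis unrelated to NHL": 10,
    "database misentry (model correct)": 3,
    "unidentified discrepancy": 6,
    "unreported aberrant B-cell population": 1,
}

total = sum(discrepancies.values())
test_set_size = 828
share_percent = 100 * total / test_set_size

for category, count in sorted(discrepancies.items(), key=lambda kv: -kv[1]):
    print(f"{count:3d}  {category}")
print(f"{total:3d}  total ({share_percent:.1f}% of {test_set_size} test samples)")
```

The categories sum to the 92 discrepancies stated in the text, which is about 11.1% of the 828 test samples, close to the reported test-set misclassification rate of 11.2%.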

We developed SmartCytoFlow, a cloud-based tool to automate the diagnostic process. It monitors folders, identifies new lymphoma screening tubes, pseudonymizes samples, runs the predictive algorithm, and provides pathogenicity predictions and uncertainty metrics. SmartCytoFlow is operational in our hospital, supporting lymphoma diagnosis.
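Two of the steps described above, detecting new files in a monitored folder and pseudonymizing sample identifiers, might look roughly like the sketch below. It is not SmartCytoFlow's code: the `.fcs` file pattern, the keyed-hash scheme, the secret key, and the function names `find_new_tubes` and `pseudonymize` are all assumptions made for illustration.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path

# Hypothetical site-specific secret; a real deployment would store it securely.
SECRET_KEY = b"replace-with-a-site-specific-secret"


def pseudonymize(sample_id: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def find_new_tubes(folder: Path, seen: set) -> list:
    """Return .fcs files in `folder` not seen before, and mark them as seen."""
    new_files = sorted(p for p in folder.glob("*.fcs") if p.name not in seen)
    seen.update(p.name for p in new_files)
    return new_files


# Demonstration in a temporary folder standing in for the monitored directory.
with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp)
    (inbox / "patient123_LST.fcs").write_bytes(b"")  # empty placeholder file
    (inbox / "notes.txt").write_text("ignored")      # not a cytometry file
    seen = set()
    first_scan = [p.name for p in find_new_tubes(inbox, seen)]
    second_scan = [p.name for p in find_new_tubes(inbox, seen)]

pseudo_id = pseudonymize("patient123")
```

A keyed hash gives the same pseudonym for the same identifier on every run, so repeated samples from one patient can still be linked, while the original identifier cannot be recovered without the key.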

Conclusion

Integrating ML with flow cytometry data significantly improves NHL diagnosis accuracy and efficiency. Our model demonstrated high predictive accuracy and identified areas prone to human error, highlighting its potential as a robust diagnostic aid. The cloud-based tool offers a scalable, reliable solution for routine clinical use. Further validation in diverse clinical settings is needed to confirm its generalizability and utility.

Disclosures: Mosquera Orgueira: GSK: Consultancy; Novartis: Other; Incyte: Other; Takeda: Speakers Bureau; Roche: Consultancy; Pfizer: Consultancy; Abbvie: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; AstraZeneca: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Biodigital THX: Current equity holder in private company; Janssen: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau.

^*signifies non-member of ASH

Abstract 3606