Session: 803. Emerging Tools, Techniques, and Artificial Intelligence in Hematology: Poster II
Hematology Disease Topics & Pathways:
Clinical Practice (Health Services and Quality), Lymphomas, Non-Hodgkin lymphoma, Bioinformatics, Diseases, Lymphoid Malignancies, Emerging technologies, Technology and Procedures, Machine learning
Introduction
Flow cytometry is essential for diagnosing lymphoid neoplasms, providing detailed cellular profiles through specific antibody panels. However, manual analysis is time-consuming and subject to inter-observer variability. Machine learning (ML), combining a self-organizing map (SOM) with a random forest classifier, can enhance diagnostic accuracy and efficiency. This study explores how ML can improve lymphoma diagnosis from flow cytometry data.
Objectives
The primary objective of this study was to develop and validate an ML model for accurately diagnosing non-Hodgkin lymphoma (NHL) from flow cytometry data. Secondary objectives were to evaluate the model's performance, to identify the causes of discordance between model predictions and expert annotations, and to develop a cloud-based tool for automated lymphoma screening.
Methodology
We assembled a database of 3,388 lymphoma screening tubes acquired with the standard EuroFlow Lymphoid Screening Tube (LST) panel, which includes antibodies against CD45, CD3, CD4, CD8, CD56, CD5, TCRγδ, CD19, CD20, surface immunoglobulin kappa and lambda light chains, and CD38. Two experts annotated the samples as lymphoma or non-lymphoma.
Preprocessing removed doublets, margin events, and artifacts using the R package flowAI. We then built a self-organizing map (SOM) with the FlowSOM package. The dataset was split into a training set (70%) and a test set (30%). Because the training set was imbalanced (two-thirds normal, one-third pathological), we applied the Synthetic Minority Over-sampling Technique (SMOTE) to generate synthetic NHL cases.
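SMOTE creates each synthetic case by interpolating between a minority-class sample and one of its k nearest minority-class neighbours. The pure-Python sketch below illustrates the technique under stated assumptions: each sample is taken to be already reduced to a fixed-length numeric feature vector (for example, the fraction of events falling into each SOM node, which is an assumption rather than the authors' stated feature set), and the function name `smote`, the default k = 5, and the Euclidean distance are illustrative choices. In practice a maintained implementation, such as the `SMOTE` class in imbalanced-learn, would normally be used.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new samples, each interpolated between a minority-class
    sample and one of its k nearest minority-class neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # k nearest minority-class neighbours of every minority sample (brute force).
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random seed sample
        j = rng.choice(neighbours[i])      # one of its k neighbours
        gap = rng.random()                 # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


# Toy example (hypothetical data): 6 "normal" vs 3 "NHL" vectors -> add 3 synthetic NHL cases.
normal = [[0.9, 0.1]] * 6
nhl = [[0.2, 0.8], [0.3, 0.7], [0.25, 0.9]]
extra = smote(nhl, n_synthetic=len(normal) - len(nhl), k=2)
print(len(nhl) + len(extra), len(normal))  # 6 6
```

Setting `n_synthetic` to the majority count minus the minority count yields a balanced set, consistent with the equal positive/negative split reported in the Results. Synthetic cases should be generated from the training split only, so that no information from the test set leaks into training.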
A random forest classifier was trained on the augmented dataset, using cross-validated accuracy metrics to tune the model. Performance was evaluated with the area under the receiver operating characteristic curve (AUC). We also reviewed cases in which model predictions and expert annotations disagreed, to identify potential human errors.
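The AUC can be read as the probability that a randomly chosen NHL sample receives a higher score than a randomly chosen normal sample, with ties counting one half. For a random forest, the score is typically the fraction of trees voting for NHL, and an out-of-bag (OOB) estimate scores each training sample using only the trees whose bootstrap sample did not include it. The sketch below computes AUC directly from the pairwise definition; the function name and toy data are hypothetical, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive (label 1) scores higher
    than a random negative (label 0); ties count 0.5. O(P * N) comparisons,
    which is acceptable for a few thousand samples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 0.75
```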
Results
The model was trained on 2,560 samples, equally split between positive and negative cases. It achieved an out-of-bag (OOB) AUC of 96.6%, a precision-recall AUC (PR-AUC) of 95.8%, a Brier score of 0.07, an OOB G-mean of 0.91, and a misclassification rate of 9.2%. On the test set of 828 samples, performance remained strong, with an AUC of 91.1%, a PR-AUC of 89%, a Brier score of 0.10, a G-mean of 0.86, and a misclassification rate of 11.2%.
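The Brier score, G-mean, and misclassification rate complement the threshold-free AUC values. The sketch below shows their standard definitions: the Brier score is the mean squared difference between the predicted probability of NHL and the 0/1 label, the G-mean is the geometric mean of sensitivity and specificity, and the misclassification rate is the fraction of incorrect calls. The 0.5 decision threshold and the function names are assumptions for illustration; the abstract does not state the cut-off used.

```python
import math


def brier_score(labels, probs):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def threshold_metrics(labels, probs, threshold=0.5):
    """Sensitivity, specificity, G-mean and misclassification rate at a cut-off
    (threshold=0.5 is an assumed default, not taken from the study)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for y, q in zip(labels, preds) if y == 1 and q == 1)
    tn = sum(1 for y, q in zip(labels, preds) if y == 0 and q == 0)
    fp = sum(1 for y, q in zip(labels, preds) if y == 0 and q == 1)
    fn = sum(1 for y, q in zip(labels, preds) if y == 1 and q == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
        "misclassification": (fp + fn) / len(labels),
    }


# Toy example (hypothetical data): one false negative and one false positive.
labels = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
print(round(brier_score(labels, probs), 4))  # 0.1583
m = threshold_metrics(labels, probs)
print(round(m["g_mean"], 4), round(m["misclassification"], 4))  # 0.6667 0.3333
```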
A total of 92 discrepancies between model predictions and expert annotations were identified in the test set. These included 34 undetected aberrant T- or NK-cell populations, 21 cases of minimal B-cell NHL infiltration (below 0.5%), and 7 fine-needle aspiration (FNA), pleural fluid, or ascitic fluid samples. Conversely, some samples diagnosed as normal were incorrectly predicted as pathological: 4 FNA, pleural, or ascitic fluid samples and 6 samples affected by artifacts such as cryoglobulins or IgM peaks. In addition, 10 cases initially diagnosed as pathological involved conditions other than NHL. Finally, there were 3 database entry errors in which the model's prediction was correct, 6 unexplained discrepancies, and 1 aberrant B-cell population that had not been reported.
We developed SmartCytoFlow, a cloud-based tool that automates the diagnostic workflow: it monitors folders, identifies new lymphoma screening tubes, pseudonymizes the samples, runs the predictive algorithm, and reports pathogenicity predictions together with uncertainty metrics. SmartCytoFlow is in operation at our hospital, where it supports lymphoma diagnosis.
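The monitoring workflow described above can be sketched as a polling step that processes only files it has not seen before. The sketch is hypothetical throughout: the `_LST.fcs` naming convention used to recognize screening tubes, the HMAC-SHA256 pseudonymization scheme, the `predict_proba` stand-in for the trained classifier, and binary entropy as the uncertainty metric are illustrative assumptions, not a description of how SmartCytoFlow is implemented.

```python
import hashlib
import hmac
import math
from pathlib import Path

# Hypothetical site secret; in practice it would be loaded from secure storage.
SECRET_KEY = b"replace-with-a-site-secret"


def pseudonymize(sample_id: str, key: bytes = SECRET_KEY) -> str:
    """Stable pseudonym: the same ID always maps to the same token, and the
    mapping cannot be recomputed without the key."""
    return hmac.new(key, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def is_screening_tube(path: Path) -> bool:
    # Hypothetical naming convention: screening tubes are saved as "<id>_LST.fcs".
    return path.suffix.lower() == ".fcs" and path.stem.upper().endswith("_LST")


def binary_entropy(p: float) -> float:
    """Uncertainty in bits: 0 when p is 0 or 1, maximal (1 bit) at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def scan_once(folder: Path, seen: set, predict_proba) -> list:
    """Process screening tubes that appeared since the previous scan.
    `predict_proba` stands in for the trained classifier: it takes a file
    path and returns the predicted probability of NHL."""
    results = []
    for path in sorted(folder.iterdir()):
        if path in seen or not path.is_file() or not is_screening_tube(path):
            continue
        seen.add(path)
        sample_id = path.stem[: -len("_LST")]  # drop the tube suffix
        p = predict_proba(path)
        results.append({
            "pseudonym": pseudonymize(sample_id),  # no raw ID leaves this step
            "p_nhl": p,
            "uncertainty_bits": binary_entropy(p),
        })
    return results
```

A long-running service would call `scan_once` repeatedly (for example, every few minutes with `time.sleep`), keeping the `seen` set between calls. A keyed hash is used instead of a plain hash so that pseudonyms cannot be reversed simply by hashing candidate sample IDs.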
Conclusion
Integrating ML with flow cytometry data can improve the accuracy and efficiency of NHL diagnosis. Our model showed high predictive accuracy and helped identify areas prone to human error, highlighting its potential as a robust diagnostic aid. The cloud-based tool offers a scalable solution for routine clinical use. Further validation in diverse clinical settings is needed to confirm its generalizability and utility.
Disclosures: Mosquera Orgueira: GSK: Consultancy; Novartis: Other; Incyte: Other; Takeda: Speakers Bureau; Roche: Consultancy; Pfizer: Consultancy; Abbvie: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; AstraZeneca: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Biodigital THX: Current equity holder in private company; Janssen: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau.