-An asterisk (*) after an author's name denotes a non-ASH member.

1057 Synthetic Bone Marrow Smears Are a Privacy-Preserving Substitute for Developing Accurate Leukemia Classification Models in Hematological Microscopy

Program: Oral and Poster Abstracts
Type: Oral
Session: 803. Emerging Tools, Techniques, and Artificial Intelligence in Hematology: Pioneering Tools for Tomorrow's Breakthroughs
Hematology Disease Topics & Pathways:
Acute Myeloid Malignancies, AML, Artificial intelligence (AI), APL, Diseases, Myeloid Malignancies, Technology and Procedures, Imaging, Machine learning
Monday, December 9, 2024: 4:30 PM

Jan-Niklas Eckardt1,2*, Ishan Srivastava, MSc3,4*, Zizhe Wang, MSc5*, Susann Winter, PhD2*, Tim Schmittmann5*, Sebastian Riechert, MSc3*, Miriam Eva Helena Gediga, MD4*, Anas Shekh Sulaiman, MD4*, Martin M. K. Schneider, MD4*, Freya Schulze, MD4*, Christian Thiede, MD4, Katja Sockel, MD6*, Frank P. Kroschinsky, MD, MBA4, Christoph Röllig, MD, MSc7*, Martin Bornhäuser, MD4,8,9*, Karsten Wendt, PhD3,5* and Jan Moritz Middeke, MD1,2*

1Else Kroener Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
2Department of Internal Medicine I, University Hospital Carl Gustav Carus, TUD Dresden University of Technology, Dresden, Germany
3Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
4Department of Internal Medicine I, University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany
5Chair of Software Technology, Technical University Dresden, Dresden, Germany
6University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany
7Department of Internal Medicine 1, University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany
8National Center for Tumor Diseases Dresden (NCT/UCC), Technical University Dresden, Dresden, Germany
9German Cancer Consortium (DKTK), Partner Site Dresden, and German Cancer Research Center (DKFZ), Heidelberg, Germany

The term ‘big data’ has become a buzzword in the medical literature, yet medical data remains largely inaccessible because of insufficient digitization, proprietary restrictions, and privacy concerns. This inaccessibility particularly hampers the development and validation of deep learning models that detect rare malignancies such as acute myeloid leukemia (AML) and acute promyelocytic leukemia (APL) from bone marrow smears. Conventional augmentation methods, such as geometric or photometric transformations, can enlarge training sets, but they only produce altered copies of existing patient images. We hypothesized that training models on synthetically generated bone marrow smear images can enhance performance while preserving patient privacy, thereby enabling unrestricted sharing of image data.

We digitized bone marrow smears from 1251 AML patients, 51 APL patients, and 236 healthy bone marrow donors by capturing field-of-view images at a resolution of 2560 × 1920 pixels, each covering an area of 171 × 128 µm. StyleGAN2-ADA, a generative adversarial network, was used with initialized features for shape and color generation to generate synthetic bone marrow smear images for AML, APL, and healthy donors. Real and synthetic images were then combined at varying proportions to train a convolutional neural network classifier for disease detection, in order to determine the ratio of real to synthetic images needed to train an accurate model. Imbalances in data set size were addressed with standard image augmentation techniques such as rotation, mirroring, or linear transformations. Hyperparameter search was performed using the Optuna framework.
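The abstract names rotation and mirroring as the augmentations used to offset class imbalance (for example, 51 APL versus 1251 AML patients) but gives no further detail. Below is a minimal Python/NumPy sketch of one way to do this, assuming images are NumPy arrays of shape H × W (× C) and that a minority class is oversampled with randomly mirrored or rotated copies until it reaches a target count. The names `shape_preserving_variants` and `oversample_to` are hypothetical, and the transform set is an illustrative choice, not the authors' pipeline.

```python
import numpy as np


def shape_preserving_variants(image: np.ndarray) -> list:
    """Return the four mirror/rotation variants that keep an H x W (x C)
    image's shape: identity, left-right mirror, top-bottom mirror and a
    180-degree rotation."""
    return [
        image,
        np.flip(image, axis=1),             # mirror left-right
        np.flip(image, axis=0),             # mirror top-bottom
        np.rot90(image, k=2, axes=(0, 1)),  # rotate by 180 degrees
    ]


def oversample_to(images: list, target: int, seed: int = 0) -> list:
    """Hypothetical helper: grow a minority class to `target` images by
    appending randomly mirrored/rotated copies of its existing images.
    If the class already has `target` or more images, a copy of the
    list is returned unchanged."""
    if not images:
        raise ValueError("cannot oversample an empty class")
    rng = np.random.default_rng(seed)
    out = list(images)
    while len(out) < target:
        source = images[int(rng.integers(len(images)))]
        k = int(rng.integers(1, 4))  # 1..3: skip the identity variant
        out.append(shape_preserving_variants(source)[k])
    return out
```

A call such as `oversample_to(apl_images, target=len(aml_images))` (variable names hypothetical) would balance the two classes. Only 180° rotations are used because a 90° rotation would turn a 2560 × 1920 field of view into a 1920 × 2560 one, which a fixed-size classifier input would then have to crop or resize. The linear transformations also mentioned in the abstract are omitted from this sketch.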

To evaluate the quality of the synthetic images, 14 hematologists took part in a visual Turing test. A web application displayed one image at a time (either real or synthetic), and participants were asked to classify each image as real or synthetic. The resulting area under the curve (AUC) of 0.63 indicated that experts could not reliably distinguish synthetic images from real bone marrow smears. Next, classification performance for binary decisions (AML vs. donors, APL vs. donors, AML vs. APL) was assessed, starting with all available real samples (100%; 1251 AML, 51 APL, 236 donors) and no synthetic samples (0%). The proportion of synthetic images was then increased in steps of 10% while the proportion of real images was reduced accordingly, until training was performed solely on synthetic samples. With real samples only, we obtained a baseline classification AUC of 0.99 for each of the three comparisons (AML vs. donors, APL vs. donors, and AML vs. APL). As the proportion of synthetic images increased, classification performance remained stable, with AUCs above 0.95 for most real-to-synthetic combinations across all comparisons. Finally, with 0% real and 100% synthetic images, we obtained AUCs of 0.97, 0.99, and 0.96 for AML vs. donors, APL vs. donors, and AML vs. APL, respectively.
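The stepwise replacement of real by synthetic images can be sketched as follows. This is a hypothetical Python illustration, not the authors' code. It assumes that the per-class training set size stays equal to the number of available real images at every step (the abstract does not say whether the total was held constant), that a sufficiently large synthetic pool exists for each class, and that images are drawn at random without replacement. `mixing_schedule` and `mix_class` are illustrative names.

```python
import random


def mixing_schedule(step: float = 0.1) -> list:
    """Synthetic fractions 0.0, 0.1, ..., 1.0; rounding removes
    floating-point drift such as 0.30000000000000004."""
    n_steps = round(1 / step)
    return [round(i * step, 10) for i in range(n_steps + 1)]


def mix_class(real: list, synthetic: list, synthetic_fraction: float,
              seed: int = 0) -> list:
    """Hypothetical helper: build one class's training set in which
    `synthetic_fraction` of the images are synthetic.

    Assumption: the set size is held constant at len(real), so every
    synthetic image added replaces one real image."""
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must lie in [0, 1]")
    n_total = len(real)
    n_synth = round(synthetic_fraction * n_total)
    n_real = n_total - n_synth
    if n_synth > len(synthetic):
        raise ValueError("synthetic pool too small for this fraction")
    rng = random.Random(seed)
    # Draw without replacement from each pool.
    return rng.sample(real, n_real) + rng.sample(synthetic, n_synth)
```

For each fraction in `mixing_schedule()`, `mix_class` would be called once per class (AML, APL, donors) and the classifier retrained on the result. The abstract does not specify how the evaluation data were composed, so that step is left out of the sketch.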

Our study highlights the feasibility of generating synthetic bone marrow smear images and their applicability for training accurate image classification models in hematological microscopy across varying proportions of real and synthetic samples. Notably, model performance in our use cases remained high even when models were trained exclusively on synthetic images. This opens up the possibility of generating and freely sharing bone marrow image data that can be used to train and validate deep learning models at adequate performance levels while maintaining patient privacy and overcoming barriers to data sharing.

Disclosures: Eckardt: Novartis Oncology: Honoraria, Research Funding; Cancilico GmbH: Current Employment, Current equity holder in private company; Janssen: Consultancy, Honoraria; AstraZeneca: Honoraria; Amgen: Honoraria. Schmittmann: Cancilico: Current equity holder in private company. Riechert: Cancilico: Current equity holder in private company. Schulze: Janssen: Honoraria. Wendt: Cancilico GmbH: Consultancy, Current equity holder in private company. Middeke: Cancilico: Current equity holder in private company; Novartis Oncology: Research Funding.
