-Asterisk * with author name denotes a Non-ASH member

2268 Mimicking Clinical Trials with Synthetic Acute Myeloid Leukemia Patients Using Generative Artificial Intelligence

Program: Oral and Poster Abstracts
Session: 803. Emerging Tools, Techniques and Artificial Intelligence in Hematology: Poster I
Hematology Disease Topics & Pathways:
Research, artificial intelligence (AI), Acute Myeloid Malignancies, AML, adult, Translational Research, Diseases, Myeloid Malignancies, emerging technologies, Technology and Procedures, Study Population, Human, machine learning
Saturday, December 9, 2023, 5:30 PM-7:30 PM

Jan-Niklas Eckardt1,2*, Waldemar Hahn, MSc3,4*, Christoph Röllig, MD, MSc5*, Sebastian Stasik, PhD5*, Uwe Platzbecker, MD6, Carsten Müller-Tidow, MD7*, Hubert Serve, MD8, Claudia D Baldus, MD9*, Christoph Schliemann, MD10*, Kerstin Schäfer-Eckart, MD11*, Maher Hanoun, MD, PhD12*, Martin Kaufmann, MD13, Andreas Burchert, MD14, Christian Thiede, MD5, Johannes Schetelig, MD, MSc5, Martin Bornhäuser, MD5,15*, Markus Wolfien, PhD4* and Jan Moritz Middeke, MD2*

1 Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
2 Department of Internal Medicine I, University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany
3 Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Dresden, Germany
4 Institute for Medical Informatics and Biometry, Technical University Dresden, Dresden, Germany
5 Department of Internal Medicine 1, University Hospital Carl Gustav Carus, Technical University Dresden, Dresden, Germany
6 Department of Hematology, Cell Hematology and Hemostaseology, Leipzig University Hospital, Leipzig, Germany
7 Heidelberg University Hospital, Heidelberg, Germany
8 Department of Medicine II, Hematology/Oncology, Goethe University Hospital, Frankfurt, Germany
9 Department of Internal Medicine II (Hematology/Oncology), University Hospital Schleswig-Holstein Campus Kiel, Kiel, Germany
10 Department of Medicine A, Hematology, Oncology and Pneumology, University Hospital Muenster, Muenster, Germany
11 Department of Internal Medicine V, Paracelsus University Hospital Nuremberg, Nuremberg, Germany
12 Department of Hematology, University Hospital Essen, Essen, Germany
13 Department of Hematology, Oncology and Palliative Care, Robert Bosch Hospital, Stuttgart, Germany
14 Department of Hematology, Oncology and Immunology, University Hospital Marburg, Marburg, Germany
15 National Center for Tumor Diseases (NCT/UCC) Dresden, Technical University Dresden, Dresden, Germany

Data sharing is often hindered by concerns about patient privacy, regulatory requirements, and proprietary interests. This impedes scientific progress and creates a gatekeeping mechanism in clinical medicine, since obtaining large data sets is costly and time-consuming. We employed two different generative artificial intelligence (AI) technologies, CTAB-GAN+ and Normalizing Flows (NFlow), to synthesize clinical trial data. Both were trained on pooled patient data from four previous multicenter clinical trials of the German Study Alliance Leukemia (AML96, AML2003, AML60+, SORAML), which enrolled adult patients (n=1606) with acute myeloid leukemia (AML) who received intensive induction therapy.

As a generative adversarial network (GAN), CTAB-GAN+ consists of two adversarial networks. A generator produces synthetic samples from random noise, and a discriminator tries to distinguish real samples from synthetic ones. Training converges when the discriminator can no longer reliably tell real data from synthetic data. In contrast, NFlow consists of a sequence of invertible transformations (flows). These start from a simple base distribution and progressively add complexity to better mirror the training data.
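A minimal one-dimensional Python/NumPy sketch can make the normalizing-flow description concrete. It is not the NFlow model used in the study. The layer types (an affine map and a sinh nonlinearity), their fixed untrained parameters, and the class names `Affine`, `Sinh` and `Flow` are illustrative assumptions. The sketch shows the two operations a flow provides. First, it samples by pushing base-distribution noise through the invertible transforms. Second, it evaluates the exact density by inverting the transforms and adding the log-Jacobian of each step (change of variables).

```python
import numpy as np


class Affine:
    """Invertible elementwise map x = scale * z + shift (requires scale > 0)."""

    def __init__(self, scale, shift):
        self.scale, self.shift = float(scale), float(shift)

    def forward(self, z):
        return self.scale * z + self.shift

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def log_abs_det_inverse(self, x):
        # dz/dx = 1 / scale, constant for every x
        return np.full_like(x, -np.log(self.scale), dtype=float)


class Sinh:
    """Invertible nonlinearity x = sinh(z); gives heavier tails than a Gaussian."""

    def forward(self, z):
        return np.sinh(z)

    def inverse(self, x):
        return np.arcsinh(x)

    def log_abs_det_inverse(self, x):
        # d/dx arcsinh(x) = 1 / sqrt(1 + x^2)
        return -0.5 * np.log1p(x ** 2)


class Flow:
    """Stack of invertible layers applied, in order, to standard-normal noise."""

    def __init__(self, layers):
        self.layers = layers

    def sample(self, n, rng):
        x = rng.standard_normal(n)  # draw from the simple base distribution
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def log_prob(self, x):
        # Change of variables: walk back to base space, summing log|dz/dx| per layer.
        x = np.asarray(x, dtype=float)
        log_det = np.zeros_like(x)
        for layer in reversed(self.layers):
            log_det += layer.log_abs_det_inverse(x)
            x = layer.inverse(x)
        log_base = -0.5 * (x ** 2 + np.log(2.0 * np.pi))  # standard normal log-density
        return log_base + log_det


# Illustrative, untrained parameters (hypothetical).
flow = Flow([Affine(0.8, 0.0), Sinh(), Affine(2.0, 1.0)])
samples = flow.sample(5, np.random.default_rng(0))
print("synthetic draws:", np.round(samples, 3))

# Sanity check: the flow density should integrate to about 1.
grid = np.linspace(-200.0, 200.0, 400_001)
dx = grid[1] - grid[0]
mass = np.exp(flow.log_prob(grid)).sum() * dx
print(f"total probability mass: {mass:.6f}")
```

In a trained flow, the layer parameters would be learned by maximizing the average `log_prob` of the training records. Flows for tabular data typically use learnable multivariate transforms rather than the fixed scalar maps shown here. The GAN half of the paragraph has a standard counterpart result. For a fixed generator, the optimal discriminator is D*(x) = p_data(x) / (p_data(x) + p_g(x)). This equals 1/2 everywhere once the generator's distribution matches the data, which is the formal version of "can no longer reliably tell real from synthetic."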

Both models were trained on tabular data that included demographic, laboratory, molecular genetic, and cytogenetic patient variables. In the original cohort, molecular alterations were detected by next-generation sequencing (NGS) using the TruSight Myeloid Sequencing Panel (Illumina, San Diego, CA, USA), with a variant allele frequency (VAF) cut-off of 5% for mutation calling. Cytogenetic analysis used standard chromosome banding and fluorescence in situ hybridization (FISH). Hyperparameters of the generative models were tuned with the Optuna framework. For each model, 70 optimization trials were run to optimize a custom score inspired by TabSynDex, which assesses both the resemblance of the synthetic data to the real training data and the utility of the synthetic data. Pairwise analyses compared the original data set with each of the two synthetic data sets. All tests were two-sided with a significance level α of 0.05.
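The 5% VAF cut-off determines which sequencing calls become the binary mutation variables that a generative model sees. The following minimal Python sketch illustrates that step under several assumptions. It assumes calls arrive as (patient_id, gene, vaf) records with VAF as a fraction and that the cut-off is inclusive (>=). The three-gene list and the call records are made up. `mutation_matrix` is a hypothetical helper, not part of the study's pipeline.

```python
# Hypothetical cut-off handling: the abstract states a 5% VAF calling cut-off;
# whether the boundary is inclusive (>=) is an assumption here.
VAF_CUTOFF = 0.05


def mutation_matrix(calls, patients, genes, cutoff=VAF_CUTOFF):
    """Convert per-variant NGS calls into one 0/1 column per gene for every patient.

    calls: iterable of (patient_id, gene, vaf) with vaf as a fraction (0.42 = 42%).
    Patients without any call still get a row of zeros.
    """
    mutated = {patient: set() for patient in patients}
    for patient_id, gene, vaf in calls:
        if gene in genes and vaf >= cutoff:
            mutated[patient_id].add(gene)
    return {
        patient: {gene: int(gene in mutated[patient]) for gene in genes}
        for patient in patients
    }


# Toy, made-up calls for illustration only.
calls = [
    ("P1", "NPM1", 0.42),
    ("P1", "FLT3", 0.03),    # below the cut-off, not called
    ("P2", "DNMT3A", 0.05),  # exactly at the cut-off, called under the >= assumption
    ("P3", "FLT3", 0.18),
]
patients = ["P1", "P2", "P3", "P4"]
genes = ["NPM1", "FLT3", "DNMT3A"]

for patient, row in mutation_matrix(calls, patients, genes).items():
    print(patient, row)
```

The sketch keeps all-zero rows for patients without detected variants (P4 here). Dropping those rows would inflate the apparent mutation frequencies that the synthetic data are meant to reproduce.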

Table 1 summarizes baseline patient characteristics and outcomes for both synthetic cohorts compared with the original cohort. First, both models adequately represented patient features, although some individual variables deviated statistically significantly from the original cohort. With a sample size this large (n=1606 per cohort), even minuscule differences can reach statistical significance in the absence of any meaningful clinical difference. Interestingly, the variables that deviated from the original distribution differed between the two models, indicating that model architecture plays a vital role in sample representation. CTAB-GAN+ showed significant deviations for both age and sex, whereas NFlow showed significant deviations for AML status. The complete remission (CR) rate was similar in the original cohort (70.7%, odds ratio [OR]: 2.41), the CTAB-GAN+ cohort (73.7%, OR: 2.81, p=0.059), and the NFlow cohort (69.1%, OR: 2.24, p=0.356). Event-free survival (EFS) was not included as a target in hyperparameter tuning. For EFS, both synthetic cohorts deviated significantly from the original cohort (original: median 7.2 months, HR: 1.36; CTAB-GAN+: median 12.8 months, HR: 0.74, p<0.001; NFlow: median 9.0 months, HR: 0.87, p=0.001). NFlow represented overall survival (OS) well, whereas CTAB-GAN+ deviated significantly from the original cohort (original: median 17.5 months, HR: 1.14; CTAB-GAN+: median 19.5 months, HR: 0.88, p<0.001; NFlow: median 16.2 months, HR: 1.00, p=0.055). In Kaplan-Meier analysis, the survival curves of both synthetic cohorts showed an adequate visual match to the original cohort (Figure 1).
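The complete-remission comparison can be sketched in plain Python with a pooled two-proportion z-test. This test is an assumption: the abstract does not say which test produced its p-values, and the sketch works from the rounded percentages rather than patient counts. The sketch shows three things:

- The reported OR values coincide, within rounding, with the odds of CR, p / (1 - p).
- The pooled z-test gives p of about 0.058 for CTAB-GAN+, close to the reported 0.059. For NFlow it gives about 0.32 rather than the reported 0.356, so the sketch should not be read as a reproduction of the authors' analysis.
- The last loop illustrates the sample-size remark: the same 3-percentage-point gap yields ever smaller p-values as the hypothetical cohort size grows.

```python
import math


def odds(p):
    """Odds of an event that occurs with probability p."""
    return p / (1.0 - p)


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided pooled z-test for the difference between two proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


N = 1606  # patients per cohort, as reported
cr_rate = {"original": 0.707, "CTAB-GAN+": 0.737, "NFlow": 0.691}

for name, rate in cr_rate.items():
    print(f"{name}: CR {rate:.1%}, odds {odds(rate):.2f}")

for name in ("CTAB-GAN+", "NFlow"):
    z, p = two_proportion_z_test(cr_rate["original"], N, cr_rate[name], N)
    print(f"original vs {name}: z = {z:+.2f}, two-sided p = {p:.3f}")

# Same 3-point gap (70.7% vs 73.7%) at growing, hypothetical cohort sizes.
for n in (1606, 5000, 20000):
    _, p = two_proportion_z_test(0.707, n, 0.737, n)
    print(f"n = {n} per arm: p = {p:.2e}")
```

For a fixed gap, the z statistic grows with the square root of n. Large synthetic cohorts are therefore better judged by effect sizes, such as the difference in CR rate or the HR, than by p-values alone.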

Using two different generative AI technologies, we demonstrate that synthetic data generation offers an attractive way to circumvent the limitations of current standards of data collection and sharing. It can bypass logistical, organizational, and financial burdens as well as regulatory and ethical concerns. Ultimately, this enables exploratory research on previously inaccessible data sets and opens the prospect of fully synthetic control arms in prospective clinical trials.

Disclosures: Röllig: Pfizer: Consultancy, Honoraria, Research Funding; Novartis: Consultancy, Honoraria, Research Funding; Janssen: Consultancy, Honoraria; BMS: Consultancy, Honoraria; Astellas: Consultancy, Honoraria; AbbVie: Consultancy, Honoraria, Research Funding; Servier: Consultancy, Honoraria. Platzbecker: Merck: Research Funding; Jazz: Consultancy, Honoraria, Research Funding; Amgen: Consultancy, Research Funding; Bristol Myers Squibb: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees, Other: travel support; medical writing support, Research Funding; Novartis: Consultancy, Honoraria, Research Funding; Geron: Consultancy, Research Funding; Janssen Biotech: Consultancy, Research Funding; AbbVie: Consultancy; Curis: Consultancy, Research Funding; MDS Foundation: Membership on an entity's Board of Directors or advisory committees; Fibrogen: Research Funding; Roche: Research Funding; Celgene: Honoraria; Syros: Consultancy, Honoraria, Research Funding; Takeda: Consultancy, Honoraria, Research Funding; Servier: Consultancy, Honoraria, Research Funding; Silence Therapeutics: Consultancy, Honoraria, Research Funding; BeiGene: Research Funding; BMS: Research Funding. Baldus: Amgen: Consultancy; AstraZeneca: Consultancy; BMS: Consultancy; Jazz Pharmaceuticals: Consultancy; Astellas: Consultancy; Gilead: Consultancy; Janssen: Consultancy. Schliemann: Boehringer Ingelheim: Research Funding; AngioBiomed: Research Funding; Pfizer: Honoraria, Other; Roche: Honoraria; Novartis AG: Honoraria; Jazz Pharmaceuticals: Honoraria, Other, Research Funding; Bristol Myers Squibb: Honoraria, Other; AstraZeneca: Honoraria; Astellas Pharma Inc.: Honoraria; Laboratoires Delbert: Honoraria; AbbVie Inc.: Honoraria, Other. Burchert: MSD: Research Funding; Incyte: Honoraria; Novartis: Honoraria, Research Funding.
Schetelig: Abbvie: Consultancy, Honoraria; Janssen: Consultancy, Honoraria; BMS: Consultancy, Honoraria; BeiGene: Consultancy, Honoraria; Eurocept: Honoraria; AstraZeneca: Consultancy, Honoraria; Novartis: Honoraria. Middeke: Novartis Oncology: Research Funding.
