4981 Generation of Multimodal Longitudinal Synthetic Data By Artificial Intelligence to Improve Personalized Medicine in Hematology

Program: Oral and Poster Abstracts
Session: 803. Emerging Tools, Techniques, and Artificial Intelligence in Hematology: Poster III
Hematology Disease Topics & Pathways:
Research, Acute Myeloid Malignancies, AML, MDS, Artificial intelligence (AI), Adult, Translational Research, MPN, Elderly, Bioinformatics, Chronic Myeloid Malignancies, CMML, Diseases, Myeloid Malignancies, Emerging technologies, Technology and Procedures, Study Population, Human, Imaging, Machine learning, Natural language processing, Omics technologies, Pathology
Monday, December 9, 2024, 6:00 PM-8:00 PM

Saverio D'Amico, MSc1,2*, Mattia Delleani2*, Elisabetta Sauta, PhD2*, Gianluca Asti, MSc2*, Elena Zazzetti, MSc2*, Alessia Campagna2*, Luca Lanino, MD3, Giulia Maggioni, MD2*, Marta Ubezio, MD2*, Gabriele Todisco, MD4*, Antonio Russo, MD2*, Cristina Astrid Tentori, MD2*, Alessandro Buizza5*, Marilena Bicchieri, PhD2*, Matteo Zampini, PhD3*, Matteo Brindisi3*, Francesca Ficara, PhD2*, Elena Riva, PhD3*, Denise Ventura, MSc3*, Laura Crisafulli, PhD3*, Nicole Pinocchio, MSc3*, Flavia Jacobs, MD2*, Alberto Zambelli, MD2*, Victor Savevski, MEng2*, Armando Santoro, MD5*, Tiziana Sanavia, PhD6*, Cesare Rollo, PhD6*, Flavio Sartori, MSc6*, Piero Fariselli, PhD6*, Guillermo Sanz, MD, PhD7, Valeria Santini, MD8, Francesc Sole, PhD9, Uwe Platzbecker, MD10, Pierre Fenaux, MD11, Maria Diez-Campelo, MD, PhD12*, Shahram Kordasti, MD, PhD13,14, Rami S. Komrokji, MD15, Guillermo Garcia-Manero, MD16, Torsten Haferlach, MD17, Amer M. Zeidan, MD18, Gastone Castellani, PhD19* and Matteo Giovanni Della Porta, MD2,20*

1Train s.r.l., Milan, Italy
2IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
3IRCCS Humanitas Research Hospital, Rozzano, Milano, Italy
4IRCCS Humanitas Research Hospital, Houston, TX
5IRCCS Humanitas Research Hospital, Rozzano, Italy
6Computational Biomedicine Unit, Department of Medical Sciences, University of Torino, Turin, Italy
7Hospital Universitario y Politécnico e Instituto de Investigación Sanitaria La Fe, Valencia, Spain
8MDS Unit, Hematology, AOUC, University of Florence, Florence, Italy
9Myelodysplastic Syndromes Research Group, Institut De Recerca Josep Carreras, Badalona, Barcelona, Spain
10Department for Hematology, Cell Therapy, Hemostaseology and Infectious Diseases, University of Leipzig Medical Center, Leipzig, Germany
11Hôpital Saint-Louis, Université de Paris 7, Paris, France
12Hospital Clínico Universitario de Salamanca, Salamanca, Spain
13Hematology Unit, Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Ancona, Italy
14King's College London, London, United Kingdom
15Department of Malignant Hematology, Moffitt Cancer Center, Tampa, FL
16Department of Leukemia, MD Anderson Cancer Center, Houston, TX
17MLL Munich Leukemia Laboratory, Munich, Germany
18Yale School of Medicine and Yale Comprehensive Cancer Center, Yale University, New Haven, CT
19Dipartimento di Scienze Mediche e Chirurgiche, Università di Bologna, Bologna, Italy
20Department of Biomedical Sciences, Humanitas University, Milan, Italy

Background. In hematology, leveraging real-world multimodal data at large scale is crucial for developing personalized medicine to address unmet clinical needs, particularly for rare diseases. Generative AI in healthcare shows great promise by generating multimodal synthetic data (SD) to improve patients’ diagnosis and prognosis while accelerating clinical research (PMID: 34131324). The challenges in generating SD include accessing complete real-world datasets for model training, maintaining intrinsic relationships among different data layers, and ensuring clinical accuracy and privacy protection.

Aims. This project, conducted by the GenoMed4All and Synthema consortia, aimed to: 1) implement an innovative approach for generating high-fidelity multimodal SD from patients with myeloid neoplasms (MN); 2) develop a comprehensive multimodal Synthetic Validation Framework (SVF) to assess the clinical and statistical fidelity and the privacy preservation of SD; 3) verify the capability of SD technology to accelerate research and enhance predictive models through multimodal data integration.

Methods. We developed an SD generation pipeline with conditional GAN, Tabular-VAE and Tabular-GPT architectures to generate tabular data including clinical information, cytogenetics, somatic mutations and transcriptomics (bulk RNA-seq of CD34+ bone marrow (BM) cells). Longitudinal information was generated by a Large Language Model fine-tuned for hematology. Starting from clinical and genomic features, BM images stained with Hematoxylin and Eosin and with May-Grünwald Giemsa were generated by a Stable Diffusion model with a hematology-trained CLIP module. Privacy preservation and the statistical and clinical fidelity of SD were assessed with the SVF. The MOSAIC framework (PMID: 38875514) was used to perform disease classification and personalized prognostic assessment, with predictions explained by SHAP. A deep learning-based framework for multimodal analysis in hematology (based on PMID: 35944502) was implemented for survival analysis.

Results. Our pipeline, trained on 605 MDS and 877 AML patients, generated 1,210 and 2,631 synthetic patients, respectively. Fidelity was assessed by comparing real data and SD using the SVF. Feature distributions and correlations of clinical information and BM morphological features were comparable (91% and 87% fidelity, respectively). The distribution of genomic alterations and pairwise gene associations showed 88% fidelity.

We assessed the quality and biological fidelity of real vs. synthetic RNA-seq data. Descriptive statistics, read coverage distribution, gene-wise dispersion estimates and PCA were comparable in both sets (90% fidelity). Differentially expressed genes and enriched biological pathways overlapped as well. Transcriptomic signatures were compared and clinically validated using unsupervised clustering and survival analysis.

We then compared longitudinal outcomes in real and synthetic patients, finding overlapping overall survival (OS) and leukemia-free survival (LFS), with 96% and 92% fidelity and log-rank p-values of 0.77 and 0.52, respectively. In terms of privacy preservation, no real patients were copied in the SD, and the nearest neighbor distance ratio (NNDR) was 0.84, indicating low privacy risk.
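To make the privacy figures concrete, the following is a minimal sketch of one common way to compute an NNDR and to count exact copies. The abstract does not describe how the SVF computes these values, so the definition used here (distance from each synthetic record to its nearest real record, divided by the distance to its second-nearest real record, averaged over synthetic records), the Euclidean metric, the function names and the toy records are all assumptions for illustration. Under this definition, values close to 1 suggest that synthetic records are not unusually close to any single real patient, while values close to 0 suggest near-copies.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nndr(synthetic, real):
    """Mean nearest-neighbor distance ratio of synthetic vs. real records.

    Assumed definition (not taken from the abstract): for each synthetic
    record, divide the distance to the closest real record by the distance
    to the second-closest real record, then average the ratios.
    Requires at least two real records.
    """
    ratios = []
    for s in synthetic:
        distances = sorted(euclidean(s, r) for r in real)
        nearest, second = distances[0], distances[1]
        # If the two closest real records are both at distance 0, treat the
        # record as maximally risky (ratio 0) instead of dividing by zero.
        ratios.append(nearest / second if second > 0 else 0.0)
    return sum(ratios) / len(ratios)


def exact_copies(synthetic, real):
    """Count synthetic records that are identical to some real record."""
    real_set = {tuple(r) for r in real}
    return sum(1 for s in synthetic if tuple(s) in real_set)


# Toy records, invented for illustration only (not study data).
real = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
synthetic = [(2.0, 0.0), (0.0, 0.0)]

print(nndr(synthetic, real))          # 0.5
print(exact_copies(synthetic, real))  # 1
```

In this toy case the first synthetic record lies midway between two real records (ratio 1) and the second is an exact copy of a real record (ratio 0), which gives a mean of 0.5 and one detected copy.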

As clinical validation, we showed that SD augmentation improved performance on disease classification based on clinical, genomic, cytogenetic and BM morphological features. Two XGBoost classification models, one trained on real data and one on SD, and both tested on a separate real set, showed comparable performance (F1-score 76% vs 81%). All features were included in a multimodal deep learning-based framework with OS as the primary endpoint. Results showed similar concordance for the models trained on real data and on SD (0.85 vs 0.84). Preliminary analysis showed that training on a hybrid dataset (real data and SD) improved the performance of classification and prognostic models. We implemented the JUNO platform (https://juno-xkb3corsxq-ew.a.run.app/) to enable clinicians to generate multimodal SD from an existing biobank of real patients.
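The comparison of models trained on real data and on SD, both tested on held-out real patients, can be sketched as follows. This is an illustrative outline only: a dependency-free nearest-centroid classifier stands in for XGBoost, the F1-score is macro-averaged (the abstract does not state which averaging was used), and the two features and all values are invented toy numbers rather than study data.

```python
def f1_macro(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fit_centroids(X, y):
    """Nearest-centroid 'training': mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(centroids, X):
    """Assign each row to the class with the closest centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(centroids, key=lambda c: dist2(row, centroids[c])) for row in X]


# Toy data, invented for illustration (two hypothetical numeric features).
labels = ["MDS", "MDS", "AML", "AML"]
real_train = [(2.0, 10.0), (4.0, 9.0), (30.0, 8.0), (40.0, 7.0)]
synth_train = [(3.0, 10.5), (5.0, 9.5), (28.0, 8.5), (45.0, 7.5)]
real_test = [(1.0, 11.0), (6.0, 9.0), (35.0, 8.0), (25.0, 9.0)]

# Train one model per data source, then score both on the same real test set.
f1_real = f1_macro(labels, predict(fit_centroids(real_train, labels), real_test))
f1_synth = f1_macro(labels, predict(fit_centroids(synth_train, labels), real_test))
print(f1_real, f1_synth)
```

The key design point is that the test set contains only real patients and is shared by both models, so any difference in F1-score reflects the training data source rather than the evaluation set.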

Conclusion. AI-generated SD accurately replicate the statistical properties and complexity of multimodal features in MN. They provide reliable, privacy-compliant and clinically accurate information that can be customized to test scientific hypotheses, validate models, and potentially accelerate clinical trials, thereby improving personalized medicine in hematology.

Disclosures: Santoro: Beigene: Speakers Bureau; Sandoz: Speakers Bureau; Lilly: Speakers Bureau; Arqule: Speakers Bureau; Astrazeneca: Speakers Bureau; Celgene: Speakers Bureau; Amgen: Speakers Bureau; Abb-vie: Speakers Bureau; Roche: Speakers Bureau; Takeda: Speakers Bureau; MSD: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Bayer: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; EISAI: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Pfizer: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Gilead: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Servier: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; BMS: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Incyte: Consultancy; Sanofi: Consultancy; Novartis: Speakers Bureau. Sanz: AstraZeneca, GSK: Consultancy, Honoraria; Novartis, ExCellera: Speakers Bureau; BMS: Research Funding; Novartis, BMS, J&J, Takeda, Amgen, Menarini, Bayer, Pfizer: Other. Santini: Ascentage, AbbVie, Bristol Myers Squibb, CTI BioPharma, Geron, Gilead, Novartis, Servier, Syros Pharmaceuticals: Other: Advisory Board. Platzbecker: Amgen: Consultancy, Research Funding; BMS: Consultancy, Membership on an entity's Board of Directors or advisory committees, Other: Travel support, Research Funding; MDS Foundation: Membership on an entity's Board of Directors or advisory committees; Abbvie: Consultancy, Research Funding; Curis: Consultancy, Honoraria, Research Funding; Geron: Consultancy; Janssen: Consultancy, Honoraria, Research Funding; Merck: Research Funding; Novartis: Consultancy, Research Funding. 
Fenaux: Astex: Research Funding; Agios: Research Funding; Servier: Research Funding; AbbVie: Honoraria, Research Funding; BMS: Honoraria, Research Funding; Janssen: Research Funding; Novartis: Research Funding; Jazz Pharmaceuticals: Honoraria, Research Funding. Diez-Campelo: AGIOS: Consultancy, Membership on an entity's Board of Directors or advisory committees; SYROS: Membership on an entity's Board of Directors or advisory committees; HEMAVAN: Membership on an entity's Board of Directors or advisory committees; ASTEX/OTSUKA: Membership on an entity's Board of Directors or advisory committees, Other: TRAVEL TO MEETINGS; BLUEPRINT MEDICINES: Consultancy, Membership on an entity's Board of Directors or advisory committees; KEROS: Honoraria, Membership on an entity's Board of Directors or advisory committees; Novartis: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees; GSK: Consultancy, Membership on an entity's Board of Directors or advisory committees; BMS/Celgene: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees, Other: Advisory board fees; Gilead: Other: Travel reimbursement; CURIS: Membership on an entity's Board of Directors or advisory committees. Kordasti: Novartis: Consultancy, Honoraria, Research Funding, Speakers Bureau; Celgene: Research Funding; Boston Biomed: Consultancy; API: Consultancy; Alexion: Consultancy; Beckman Coulter: Speakers Bureau; MorphoSys: Research Funding; Pfizer: Consultancy, Speakers Bureau. 
Komrokji: Celgene/BMS: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; BMS: Honoraria, Membership on an entity's Board of Directors or advisory committees; Sobi: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Sumitomo Pharma: Consultancy, Membership on an entity's Board of Directors or advisory committees; Janssen: Consultancy; BMS: Research Funding; Taiho: Membership on an entity's Board of Directors or advisory committees; DSI: Honoraria, Membership on an entity's Board of Directors or advisory committees; Servio: Membership on an entity's Board of Directors or advisory committees; Servio: Honoraria; Novartis: Membership on an entity's Board of Directors or advisory committees; Rigel: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Servier: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Geron: Consultancy, Membership on an entity's Board of Directors or advisory committees; Genentech: Consultancy; AbbVie: Consultancy, Membership on an entity's Board of Directors or advisory committees; Keros: Membership on an entity's Board of Directors or advisory committees; CTI biopharma: Membership on an entity's Board of Directors or advisory committees; PharmaEssentia: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Jazz Pharmaceuticals: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; DSI: Consultancy, Membership on an entity's Board of Directors or advisory committees. 
Garcia-Manero: Merck: Research Funding; AbbVie: Research Funding; Amphivena: Research Funding; Curis: Research Funding; Janssen: Research Funding; Helsinn: Research Funding; Astex: Other: Personal fees; Helsinn: Other: Personal fees; Novartis: Research Funding; Genentech: Research Funding; Genentech: Other: Personal fees; Astex: Research Funding; Aprea: Research Funding; Onconova: Research Funding; Forty Seven: Research Funding; Bristol Myers Squibb: Other: Personal fees, Research Funding; H3 Biomedicine: Research Funding. Della Porta: Bristol Myers Squibb: Consultancy.

*signifies non-member of ASH