-Asterisk * with author name denotes a Non-ASH member

3597 The “David Vs Goliath” Study: Application of Large Language Models (LLM) for Automatic Medical Information Retrieval from Multiple Data Sources to Accelerate Clinical and Translational Research in Hematology

Program: Oral and Poster Abstracts
Session: 803. Emerging Tools, Techniques, and Artificial Intelligence in Hematology: Poster II
Hematology Disease Topics & Pathways:
Research, Acute Myeloid Malignancies, AML, MDS, Artificial intelligence (AI), Adult, Translational Research, MPN, Elderly, Clinical Research, Genomics, Bioinformatics, Chronic Myeloid Malignancies, Diseases, Real-world evidence, Myeloid Malignancies, Biological Processes, Emerging technologies, Technology and Procedures, Multi-systemic interactions, Study Population, Human, Machine learning, Natural language processing, Omics technologies
Sunday, December 8, 2024, 6:00 PM-8:00 PM

Mattia Delleani1*, Saverio D'Amico, MSc1,2*, Elisabetta Sauta, PhD1*, Gianluca Asti, MSc1*, Elena Zazzetti, MSc1*, Alessia Campagna1*, Luca Lanino, MD3, Giulia Maggioni, MD1*, Maria Chiara Grondelli, BSc4*, Alessandro Forcina Barrero, BSc4*, Pierandrea Morandini, MEng1*, Marta Ubezio, MD1*, Gabriele Todisco, MD4,5*, Antonio Russo, MD1*, Cristina Astrid Tentori, MD1*, Alessandro Buizza6*, Arturo Bonometti, MD1,4*, Cesare Lancellotti, MD7*, Luca Di Tommaso, MD1,4*, Daoud Rahal, MD1*, Marilena Bicchieri, PhD1*, Victor Savevski, MEng1*, Armando Santoro, MD6*, Valeria Santini8*, Francesc Sole, PhD9, Uwe Platzbecker, MD10, Pierre Fenaux, MD11, Maria Diez-Campelo, MD, PhD12*, Rami S. Komrokji, MD13, Guillermo Garcia-Manero, MD14, Torsten Haferlach, MD15, Shahram Kordasti, MD, PhD16,17, Amer M. Zeidan, MD18, Gastone Castellani, PhD19* and Matteo Giovanni Della Porta, MD1,4*

1IRCCS Humanitas Research Hospital, Rozzano, Milan, Italy
2Train s.r.l., Rozzano, Milan, Italy
3IRCCS Humanitas Research Hospital, Rozzano, Milano, Italy
4Department of Biomedical Sciences, Humanitas University, Pieve Emanuele, Milan, Italy
5IRCCS Humanitas Research Hospital, Houston, TX
6IRCCS Humanitas Research Hospital, Rozzano, Italy
7Struttura Complessa di Anatomia Patologica, Policlinico di Modena, Modena, Italy
8MDS UNIT, DMSC, Azienda Ospedaliero-Universitaria Careggi & University of Florence, Florence, Italy
9Myelodysplastic Syndromes Research Group, Institut De Recerca Josep Carreras, Badalona, Barcelona, Spain
10Medical Clinic and Policlinic 1, Hematology and Cellular Therapy, University Hospital Leipzig, Leipzig, Germany
11Département (DMU) d'hématologie et immunologie, Service d'hématologie séniors, AP-HP Hospital Saint-Louis, Paris, France
12Department of Hematology, Salamanca-IBSAL University Hospital, Salamanca, Spain
13Department of Malignant Hematology, Moffitt Cancer Center, Tampa, FL
14Department of Leukemia, The University of Texas MD Anderson Cancer Center, Houston, TX
15MLL Munich Leukemia Laboratory, Munich, Germany
16Hematology Unit, Department of Clinical and Molecular Sciences, Università Politecnica delle Marche, Ancona, Ancona, Italy
17Department of Clinical Haematology, Guy's and St Thomas' NHS Foundation Trust, London, United Kingdom
18Yale School of Medicine, Smilow Cancer Hospital at Yale New Haven, New Haven, CT
19Department of Medical and Surgical Sciences, University of Bologna, Bologna, Italy

Background. The innovation process in hematology requires access to a large amount of healthcare data. However, 97% of patient data produced by hospitals remains unused (Source: Deloitte, Health Data, 2023), primarily due to privacy limitations, lack of data harmonization from different sources, and the unstructured and dispersed nature of the information.

Large Language Models (LLMs) are computational models capable of general-purpose language generation and other natural language processing tasks. They acquire these abilities by learning statistical relationships from vast amounts of text through a computationally intensive self-supervised and semi-supervised training process. LLMs are increasingly used in healthcare to enhance diagnostics, streamline patient interactions, and improve overall clinical workflows. In this project, we analyze the potential of Artificial Intelligence (AI) solutions based on LLMs for data retrieval, extraction and generation, with the goal of creating standardized datasets that accelerate clinical and translational research in hematology.

Aims. The “David vs Goliath” study was conducted by the Synthema EU consortium with the following aims: 1) to develop an AI solution leveraging LLMs for information retrieval, extraction and generation of research-ready datasets from multiple medical sources; 2) to evaluate the clinical and statistical fidelity of the AI-retrieved dataset through a specific Validation Framework (VF); 3) to validate the reliability of the AI-retrieved dataset for building personalized prognostic models.

Methods. We proposed ARISTOTELES, an Automatic Retrieval Information System TO acceleraTE clinical and translationaL research in hEmatological malignancieS. The solution has three main components: a Retrieval-Augmented Generation (RAG) system for information retrieval, an LLM for data extraction, and a Generative Pretrained Transformer (GPT) for missing data inference. The RAG component, which also leverages a dedicated LLM, was implemented to search for information across multiple data sources in the Humanitas Research Hospital DataLake (a fully privacy-compliant environment) and to enhance the quality of the retrieved information. The information provided by the RAG component was then extracted by a second, hematology-tuned LLM into a structured dataset in a common data model format. Finally, the GPT model, trained on hematological data, was used to generate complete data, conditioned on the partially extracted patient information.
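The abstract does not describe how ARISTOTELES is implemented, so the following is only a toy sketch of the retrieve-then-extract flow under loud assumptions: bag-of-words cosine similarity stands in for the RAG retriever, regular expressions and a hypothetical `GENE_PANEL` stand in for the LLM extractor, and `missing_fields` merely lists what a generative model would be asked to infer. All function names, field names and example notes are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical gene panel used only for this illustration.
GENE_PANEL = ("SF3B1", "TET2", "ASXL1")


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Stage 1 (stand-in for RAG retrieval): rank documents against the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def extract(passages):
    """Stage 2 (stand-in for LLM extraction): fill a structured record."""
    record = {"hemoglobin_g_dl": None, "mutated_genes": None}
    for passage in passages:
        match = re.search(r"hemoglobin\s+(\d+(?:\.\d+)?)\s*g/dL", passage, re.I)
        if match and record["hemoglobin_g_dl"] is None:
            record["hemoglobin_g_dl"] = float(match.group(1))
        genes = [g for g in GENE_PANEL if g in passage]
        if genes and record["mutated_genes"] is None:
            record["mutated_genes"] = genes
    return record


def missing_fields(record):
    """Stage 3 placeholder: fields a generative model would have to infer."""
    return [name for name, value in record.items() if value is None]


notes = [
    "Blood count: hemoglobin 9.2 g/dL, platelets 110.",
    "Patient attended follow-up visit; no new complaints.",
    "Molecular report: mutations detected in SF3B1 and TET2.",
]
passages = retrieve("hemoglobin blood count", notes)
record = extract(passages)
print(record, missing_fields(record))
```

In this toy run only the blood-count note is retrieved, so the gene field stays empty and is flagged for inference; extracting from all three notes fills both fields.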

Results. The original dataset (Goliath) comprises 1167 patients with myeloid neoplasms (MN) from Humanitas Research Hospital and includes multiple layers of information: comprehensive demographic, clinical and genomic data (cytogenetics and mutational screening), alongside treatment and outcome data. The AI-retrieved dataset (David) was generated by applying ARISTOTELES to medical records from the same MN patients.

The two datasets were compared using a specific validation framework (based on PMID:38875514). Distributions and correlations of clinical, demographic, genomic and cytogenetic variables were comparable between the two datasets, with 91% fidelity. Mutation distributions and pairwise associations among genes and/or cytogenetic abnormalities showed 90.1% fidelity.
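The abstract does not define how the fidelity percentages are computed. As one possible illustration, and not the metric of the cited validation framework, the sketch below scores agreement between two categorical distributions as one minus the total variation distance. The function names and the example diagnosis labels are assumptions made for this example.

```python
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in a list of labels."""
    if not values:
        raise ValueError("cannot compute frequencies of an empty list")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def distribution_fidelity(original, retrieved):
    """Assumed fidelity score: 1 - total variation distance, in [0, 1]."""
    p = category_frequencies(original)
    q = category_frequencies(retrieved)
    categories = set(p) | set(q)
    tvd = 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)
    return 1.0 - tvd


# Invented example: diagnosis labels in the original and AI-retrieved data.
goliath = ["MDS"] * 6 + ["AML"] * 4
david = ["MDS"] * 5 + ["AML"] * 5
print(round(distribution_fidelity(goliath, david), 3))
```

A score of 1 means the two category distributions are identical, and 0 means they share no categories. In the invented example the frequencies differ by 10 percentage points per category, so the score is about 0.9.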

No statistically significant difference between the two datasets was observed when comparing Kaplan-Meier survival curves with a log-rank test in patients stratified according to clinical labels. Finally, we fitted Cox proportional hazards (Cox-PH) models including clinical and genomic information from the David and Goliath datasets and compared their performance using the concordance index (CI). With overall survival as the clinical endpoint, the CI of the Cox-PH models was 0.75 and 0.74, respectively.
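For readers unfamiliar with the concordance index, the sketch below computes a simplified Harrell's C in plain Python. It is not the authors' code: the abstract reports only the resulting values, and the survival times, event flags and risk scores here are invented. A pair of patients is treated as comparable when the one with the shorter follow-up had an observed event, and pairs with tied times are ignored.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C: fraction of comparable pairs ranked correctly.

    times       -- follow-up time per patient
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output, where higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Invented toy data: four patients, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.8, 0.1]
print(concordance_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In the toy data, 4 of the 5 comparable pairs are ordered correctly.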

Conclusion. The ARISTOTELES solution enables automatic retrieval, extraction and structuring of complex multimodal healthcare data. The AI-retrieved data (David) showed high clinical and statistical fidelity with respect to the original dataset (Goliath). Overall, this increases access to healthcare data and reduces the human effort required for data collection, thereby accelerating clinical research in hematology.

Disclosures: Santoro: Celgene: Speakers Bureau; Amgen: Speakers Bureau; AbbVie: Speakers Bureau; Roche: Speakers Bureau; Takeda: Speakers Bureau; AstraZeneca: Speakers Bureau; Arqule: Speakers Bureau; Lilly: Speakers Bureau; Sandoz: Speakers Bureau; Novartis: Speakers Bureau; Beigene: Speakers Bureau; MSD: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Bayer: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; EISAI: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Pfizer: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Gilead: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Servier: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; BMS: Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Incyte: Consultancy; Sanofi: Consultancy. Santini: BMS/Celgene: Membership on an entity's Board of Directors or advisory committees; AbbVie: Membership on an entity's Board of Directors or advisory committees; CTI: Membership on an entity's Board of Directors or advisory committees; Geron: Membership on an entity's Board of Directors or advisory committees; Keros: Membership on an entity's Board of Directors or advisory committees; Jazz: Membership on an entity's Board of Directors or advisory committees; Novartis: Membership on an entity's Board of Directors or advisory committees; Servier: Membership on an entity's Board of Directors or advisory committees; Syros: Membership on an entity's Board of Directors or advisory committees.
Platzbecker: Curis: Consultancy, Honoraria, Research Funding; Geron: Consultancy; Amgen: Consultancy, Research Funding; Abbvie: Consultancy, Research Funding; Janssen: Consultancy, Honoraria, Research Funding; Merck: Research Funding; MDS Foundation: Membership on an entity's Board of Directors or advisory committees; BMS: Consultancy, Membership on an entity's Board of Directors or advisory committees, Other: Travel support, Research Funding; Novartis: Consultancy, Research Funding. Fenaux: Astex: Research Funding; Servier: Research Funding; Agios: Research Funding; Novartis: Research Funding; Jazz Pharmaceuticals: Honoraria, Research Funding; Janssen: Research Funding; AbbVie: Honoraria, Research Funding; BMS: Honoraria, Research Funding. Diez-Campelo: ASTEX/OTSUKA: Membership on an entity's Board of Directors or advisory committees, Other: TRAVEL TO MEETINGS; CURIS: Membership on an entity's Board of Directors or advisory committees; SYROS: Membership on an entity's Board of Directors or advisory committees; HEMAVAN: Membership on an entity's Board of Directors or advisory committees; AGIOS: Consultancy, Membership on an entity's Board of Directors or advisory committees; BLUEPRINT MEDICINES: Consultancy, Membership on an entity's Board of Directors or advisory committees; KEROS: Honoraria, Membership on an entity's Board of Directors or advisory committees; Novartis: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees; GSK: Consultancy, Membership on an entity's Board of Directors or advisory committees; Gilead: Other: Travel reimbursement; BMS/Celgene: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees, Other: Advisory board fees. 
Komrokji: Celgene/BMS: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding; Servier: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Servio: Membership on an entity's Board of Directors or advisory committees; Servio: Honoraria; Genentech: Consultancy; Keros: Membership on an entity's Board of Directors or advisory committees; BMS: Research Funding; Novartis: Membership on an entity's Board of Directors or advisory committees; Geron: Consultancy, Membership on an entity's Board of Directors or advisory committees; Janssen: Consultancy; AbbVie: Consultancy, Membership on an entity's Board of Directors or advisory committees; DSI: Consultancy, Membership on an entity's Board of Directors or advisory committees; Jazz Pharmaceuticals: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Rigel: Consultancy, Honoraria, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Sobi: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Sumitomo Pharma: Consultancy, Membership on an entity's Board of Directors or advisory committees; Taiho: Membership on an entity's Board of Directors or advisory committees; CTI biopharma: Membership on an entity's Board of Directors or advisory committees; DSI: Honoraria, Membership on an entity's Board of Directors or advisory committees; BMS: Honoraria, Membership on an entity's Board of Directors or advisory committees; PharmaEssentia: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau. 
Garcia-Manero: Onconova: Research Funding; H3 Biomedicine: Research Funding; Astex: Other: Personal fees; Bristol Myers Squibb: Other: Personal fees, Research Funding; Genentech: Research Funding; AbbVie: Research Funding; Novartis: Research Funding; Helsinn: Research Funding; Forty Seven: Research Funding; Aprea: Research Funding; Janssen: Research Funding; Curis: Research Funding; Merck: Research Funding; Helsinn: Other: Personal fees; Astex: Research Funding; Amphivena: Research Funding; Genentech: Other: Personal fees. Kordasti: API: Consultancy; Boston Biomed: Consultancy; Novartis: Consultancy, Honoraria, Research Funding, Speakers Bureau; Celgene: Research Funding; Alexion: Consultancy; Beckman Coulter: Speakers Bureau; MorphoSys: Research Funding; Pfizer: Consultancy, Speakers Bureau. Della Porta: Bristol Myers Squibb: Consultancy.