-Asterisk * with author name denotes a Non-ASH member

4811 Validation of ARGO (Automatic record generator for Onco-Hematology), a New App Supporting the Automatic Conversion of Paper-Based Pathology Reports into Standardized eCRFs

Program: Oral and Poster Abstracts
Session: 803. Emerging Tools, Techniques and Artificial Intelligence in Hematology: Poster III
Hematology Disease Topics & Pathways:
Lymphomas, Diseases, Lymphoid Malignancies, Technology and Procedures, machine learning, natural language processing
Monday, December 12, 2022, 6:00 PM-8:00 PM

Gian Maria Zaccaria, PhD1*, Francesco Berloco, MSc2*, Felice Clemente, MSc3*, Anita Susanna Pappagallo, MSc1*, Maria Carmela Vegliante, PhD1*, Grazia Gargano, MSc1*, Paolo Mondelli, MSc1*, Giacomo Volpe, PhD1*, Antonella Bucci, MSc1*, Tetiana Skrypets, MD, PhD1*, Carla Minoia, PhD1*, Angela Maria Quinto, MD1*, Giacomo Loseto, MD3*, Bernardo Rossini, MD1*, Fabio Pavone, PhD1*, Anna Scattone, MD4*, Giuseppe Carella, MBA5*, Vito Angiulli, PhD5*, Chiara Pagani, MD6*, Alice Di Rocco, MD7*, Francesca Maria Quaglia8*, Valentina Tabanelli, PhD9*, Angelo Fama, MD10*, Benedetta Puccini, MD11*, Riccardo Moia, MD12*, Simone Ferrero, MD13,14, Luigi Alfredo Grieco, PhD2*, Simona Colucci, PhD2*, Attilio Guarini, MD1* and Sabino Ciavarella, PhD1*

1Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
2Department of Electrical and Information Engineering, Polytechnic of Bari, Bari, Italy
3Hematology Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
4Pathology Department, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
5Technology Transfer Office, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
6Department of Hematology, ASST Spedali Civili, Brescia, Italy
7Unit of Hematology, Azienda Ospedaliero-Universitaria Policlinico Umberto I, Roma, Italy
8Department of Medicine, Section of Hematology, University of Verona, Verona, Italy
9Division of Diagnostic Haematopathology, IEO Istituto Europeo di Oncologia IRCCS, Milano, Italy
10Hematology, AUSL/IRCCS di Reggio Emilia, Reggio Emilia, Italy
11Careggi University Hospital, Firenze, Italy
12Division of Hematology, Azienda Ospedaliero-Universitaria Maggiore della Carità di Novara, Novara, Italy
13Division of Hematology, AOU Città della Salute e della Scienza di Torino, Torino, Italy
14Department of Molecular Biotechnologies and Health Sciences, Division of Hematology, University of Torino, Torino, Italy

Background and aims. Limited access to integrated systems restricts the use of real-world medical data for translational research. ARGO (Automatic Record Generator for Onco-hematology) converts paper-based pathology reports into electronic Case Report Forms (eCRFs) using Optical Character Recognition and Natural Language Processing technologies [Zaccaria et al., Sci. Rep., 2021].

Here, we present for the first time the App version of ARGO, designed to support physicians and data-entry staff in rapidly generating eCRFs in a standardized and filterable fashion. To assess scalability, we tested the ARGO App on n = 501 pathology reports from eight Italian Centers, including Hodgkin (HL), Diffuse Large B-Cell (DLBCL), Follicular (FCL), Mantle Cell (MCL), and T-Cell (TCL) lymphoma diagnoses.

Methods. Validation involved six expert hematologists who generated eCRFs by photographing each paper-based report with commercially available camera-equipped smartphones (Apple® iPhones, iOS version 15). The set included n = 347 and n = 154 reports from IRCCS Istituto Tumori ‘Giovanni Paolo II’ (internal series, IS) and seven Italian cooperative Centers (external series, ES), respectively. Overall, they comprised n = 139 HL, n = 154 DLBCL, n = 109 FCL, n = 76 MCL, n = 6 TCL, and n = 17 unclassified reports, describing major immunohistochemistry markers (IMs) such as MYC, BCL2, BCL6, CD10, CD20, Cyclin D1, CD79a, CD15, CD30, PAX5, CD5, CD3, and the Ki-67 proliferation index. The automatic diagnosis assignment by the ARGO algorithm (developed in Python) was set to depend on the highest matching rate between the detected IMs and the corresponding classification from the National Institute of Health, in accordance with the International Classification of Diseases, 10th revision (ICD-10), and the International Classification of Diseases for Oncology (ICD-O). To reduce the risk of misdiagnosis, a Random Forest (RF) model was trained on the IMs set of the IS, tested on the ES, and combined with the ARGO algorithm. The performance of the App was assessed in terms of accuracy and F1-score; the latter, as the harmonic mean of precision and recall, is more sensitive to missed and spurious detections.
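
The idea of assigning the diagnosis with the highest matching rate can be illustrated with a minimal Python sketch. It is not the ARGO implementation: the marker signatures in `SIGNATURES`, the function names, and the `min_rate` threshold are hypothetical and serve only to show the principle.

```python
# Hypothetical marker signatures (True = expected positive, False = expected
# negative). Illustrative only; these are NOT the rules used by ARGO.
SIGNATURES = {
    "HL": {"CD30": True, "CD15": True, "PAX5": True, "CD20": False},
    "DLBCL": {"CD20": True, "CD79a": True, "BCL6": True, "Cyclin D1": False},
    "FCL": {"CD20": True, "CD10": True, "BCL2": True, "BCL6": True},
    "MCL": {"CD20": True, "CD5": True, "Cyclin D1": True, "CD10": False},
    "TCL": {"CD3": True, "CD20": False},
}


def matching_rate(detected, signature):
    """Fraction of signature markers whose detected status equals the expected one.

    Markers missing from `detected` count as non-matching.
    """
    hits = sum(1 for marker, expected in signature.items()
               if detected.get(marker) == expected)
    return hits / len(signature)


def assign_diagnosis(detected, signatures=SIGNATURES, min_rate=0.5):
    """Return (diagnosis, rates); diagnosis is None below the assumed threshold."""
    rates = {dx: matching_rate(detected, sig) for dx, sig in signatures.items()}
    best = max(rates, key=rates.get)  # ties resolve to the first entry
    if rates[best] < min_rate:
        return None, rates
    return best, rates


# Markers extracted from one (invented) report.
report = {"CD20": True, "CD5": True, "Cyclin D1": True, "CD10": False}
diagnosis, rates = assign_diagnosis(report)
print(diagnosis, rates)
```

In this sketch a report with too few matching markers stays unclassified, which is the kind of case where a second model, such as the RF described above, could be consulted.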

Results. The ARGO App supports two use cases: prospective acquisition and retrospective reading of reports by users (physicians and/or data managers). The first use case allows users to acquire pathology reports via the mobile phone’s camera. The second lets users search patients’ data, filtering by “Report ID”, “Name”, “Surname” and “Type of diagnosis”. For each new record, ARGO converts information about patient demographics, diagnosis, tissue of origin of samples (lymph node, extranodal, bone marrow, and peripheral blood), and IM expression.

ARGO successfully converted 490 (97.8%) reports into structured eCRFs (overall, n = 18,816 data points). In terms of accuracy (Fig. 1A), detection of MYC, Cyclin D1, CD79a, CD15, EMA, BCL2 (by fluorescence in situ hybridization), IgD, IgM, EBV, and Ki-67 ranged between 85.7% and 99.4% in both series. BCL2, BCL6, CD10, CD20, CD30, PAX5, CD23, CD5, and CD45/LCA ranged between 65.1% and 85.6% for the IS and between 71.0% and 90.9% for the ES. MUM1 and CD3 achieved 56.2% and 71.4%, and 72.0% and 79.9% for IS and ES, respectively. Concerning the F1-score (Fig. 1B), although no significant differences were observed between the two series, biomarkers scored on average 20.2% lower for the IS and 35.3% lower for the ES than for accuracy. Interestingly, the Ki-67 proliferation index, MYC, CD10, CD20, Cyclin D1, CD79a, CD15, CD23, and CD5 achieved F1-scores between 73.5% and 85.9% for the IS and between 72.4% and 85.5% for the ES.
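
Why the F1-score can fall well below accuracy for the same marker is shown by the short Python sketch below. The counts are invented toy numbers, not study data, and the helper `accuracy_and_f1` is a hypothetical name; it applies the standard definitions of accuracy and of F1 as the harmonic mean of precision and recall.

```python
def accuracy_and_f1(y_true, y_pred, positive=True):
    """Compute accuracy and F1-score for one binary marker."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 = 2*TP / (2*TP + FP + FN), equivalent to the harmonic mean of
    # precision and recall; defined as 0 when there are no positives at all.
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Toy example: a marker reported as positive in 10 of 100 reports.
# The extractor finds 5 of the 10 positives and adds 2 false positives.
y_true = [True] * 10 + [False] * 90
y_pred = [True] * 5 + [False] * 5 + [True] * 2 + [False] * 88

acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, F1 = {f1:.3f}")
```

Because true negatives dominate when a marker is rarely positive, accuracy stays high even though half of the positives are missed, whereas F1 ignores true negatives and drops accordingly.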

Diagnosis capture achieved an accuracy of 87.3% and 82.5%, and an F1-score of 87.3% and 83.0%, for the IS and ES, respectively. Focusing on individual diagnoses in the ES, HL, MCL, DLBCL, FCL, and TCL reached 90.0%, 88.5%, 84.9%, 76.5%, and 33.3%, respectively. Of these n = 154 reports, n = 52 (34%) were classified by the ARGO algorithm alone, n = 28 (18%) by RF, n = 67 (43%) by a combination of both, while n = 7 (5%) remained unclassified.

Conclusions. We validated the ARGO App, which robustly converts paper-based pathology reports of major lymphoma subtypes into structured eCRFs. ARGO is feasible and easily transferable into daily practice to generate standardized patient records for clinical and translational research purposes. Ongoing efforts aim to enlarge the TCL cohort of pathology reports and to develop a multilanguage version supporting languages other than Italian.

Disclosures: Puccini: Beigene: Membership on an entity's Board of Directors or advisory committees; Takeda: Membership on an entity's Board of Directors or advisory committees. Ferrero: Gentili: Speakers Bureau; Gilead: Research Funding; Morphosys: Research Funding; Incyte: Membership on an entity's Board of Directors or advisory committees; EUSA Pharma: Consultancy, Membership on an entity's Board of Directors or advisory committees, Speakers Bureau; Janssen: Consultancy, Membership on an entity's Board of Directors or advisory committees, Research Funding, Speakers Bureau; Clinigen: Membership on an entity's Board of Directors or advisory committees; Servier: Honoraria, Speakers Bureau.