Independent, International Validation and Refinement of a Machine Learning Algorithm to Classify Acute Leukemia Using Routine Laboratory Features

Turki, Amin

Oral and Poster Abstracts
Oral
903. Health Services and Quality Improvement: Myeloid Malignancies: Innovative Approaches to Improve Quality of Care, Affordability, and Outcomes

Lymphoid Leukemias, ALL, Acute Myeloid Malignancies, AML, Artificial intelligence (AI), Research, Clinical Practice (Health Services and Quality), Clinical Research, Diversity, Equity, and Inclusion (DEI), Diseases, Lymphoid Malignancies, Myeloid Malignancies, Technology and Procedures, Machine learning

Amin T. Turki, MD¹,²^*, Alberto Hernández Sánchez, MD³^*, Wellington Silva⁴, Magdalena Karasek, MD⁵^*, Luca Guarnera, MD⁶, Koray Yalçin, MD⁷^*, Amir Enshaei⁸^*, Marta Sobas, MD⁹^*, Dirk Reinhardt, MD¹⁰, Maria M Rivas, MD¹¹^*, Deepak Kumar Mishra, MD¹²^*, Eduardo Rego¹³^*, Ahmet Koc¹⁴^*, Paola Núñez Medina¹⁵^*, Maria Teresa Voso, MD¹⁶, Anthony Moorman¹⁷^*, Felix Nensa, MD¹⁸^* and Merlin Engelke¹⁹^*

¹Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany
²Department of Hematology and Oncology, Marienhospital, Ruhr-University Bochum, Bochum, Germany
³Hematology Department, Hospital Universitario de Salamanca (CAUSA/IBSAL), Salamanca, Spain
⁴University of Sao Paulo, Faculdade De Medicina USP, Sao Paulo, Brazil
⁵Department of Hematology, Blood Neoplasms and Bone Marrow Transplantation, Wroclaw Medical University, Wroclaw, Poland
⁶Tor Vergata University, Rome, Italy
⁷Bahcesehir University Medical Park Göztepe Hospital, Istanbul, Turkey
⁸Wolfson Childhood Cancer Research Centre, Newcastle University, Newcastle, United Kingdom
⁹Department of Hematology, Blood Neoplasms and Bone Marrow Transplantation, Medical University of Wroclaw, Wroclaw, Poland
¹⁰University Children’s Hospital Essen, Department of Pediatric Hematology and Oncology, Essen, Germany
¹¹Hospital Universitario Austral, Buenos Aires, Argentina
¹²Laboratory Hematology, Cytogenetics & Molecular Pathology, Tata Medical Center, Kolkata, West Bengal, India
¹³Hospital das Clinicas da Faculdade de Medicina da Universidade de Sao Paulo, Sao Paulo, Brazil
¹⁴Department of Pediatric Hematology and Oncology, Marmara University Faculty of Medicine, Istanbul, Turkey
¹⁵Department of Hematology, University of Salamanca, Salamanca, Spain
¹⁶Department of Biomedicine and Prevention, University of Rome Tor Vergata, Rome, Italy
¹⁷Leukaemia Research Cytogenetics Group, Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
¹⁸Institute for AI in Medicine, University Hospital Essen, Essen, Germany
¹⁹Department of AI in Medicine, University Hospital Essen, Essen, Germany

Timely diagnosis of acute leukemias (AL) can be challenging in resource-constrained settings. Patients, particularly in low- and middle-income countries, face various barriers to accessing specialized diagnostics. Delays in diagnosis and referral, especially for patients with acute promyelocytic leukemia (APL), increase early mortality (Rego, Blood 2013; Odetola and Tallman, ASH Educ Program 2023). Recently, routine laboratory features were used to develop and test machine learning (ML) classification algorithms that predict AL types in multicenter French cohorts (Alcazer, Lancet Digital Health 2024). However, the global generalizability of these algorithms has not been extensively tested.

Methods:

To test these algorithms, we assembled a multicenter retrospective cohort of patients with diagnosed AL from 9 countries. Their laboratory features (total leukocyte, monocyte, and lymphocyte counts, platelets, MCV, MCHC, LDH, fibrinogen, prothrombin activity in %, and age) were obtained at the earliest timepoint of leukemia diagnosis at hospital contact. The cohort was diverse in ethnicity, social background, and age (range 0.08-97 years) and included both sexes (female 42.7%) as well as adult (≥18 years, n=1025) and pediatric patients (n=1771). The top-performing model in the development cohort, an extreme gradient boosting (XGB) model, was used for testing. We developed a Python package that handles data preparation from HL7/FHIR or CSV tables, prediction using an embedded R script, and evaluation using Weights & Biases. The model was run separately for each site to account for cohort heterogeneity. The cutoff for missing features was 20%. Feature importance was analyzed using SHapley Additive exPlanations (SHAP) values. Misclassified patients were further analyzed with regard to the clinical significance of their features and by statistical, machine-learning, and dimensionality-reduction methods. This study was approved by the ethics committee of the University of Duisburg-Essen (N°24-11882-BO).
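
The missing-feature cutoff and the per-site grouping described in this section can be illustrated with a minimal sketch. It is not the study's package: the feature keys, the record layout, the helper names, and the reading of the 20% cutoff as a per-patient, inclusive limit are all assumptions made for illustration, and the example values are placeholders.

```python
from collections import defaultdict

# The ten input features named in the Methods; the key spellings are hypothetical.
FEATURES = (
    "leukocytes", "monocytes", "lymphocytes", "platelets", "mcv",
    "mchc", "ldh", "fibrinogen", "prothrombin_activity_pct", "age",
)
# Assumption: the 20% cutoff applies per patient and is inclusive.
MAX_MISSING_FRACTION = 0.20


def missing_fraction(patient):
    """Share of the ten features that are absent or None for one patient."""
    missing = sum(1 for name in FEATURES if patient.get(name) is None)
    return missing / len(FEATURES)


def eligible_by_site(patients):
    """Group the IDs of patients who pass the cutoff by their site."""
    groups = defaultdict(list)
    for patient in patients:
        if missing_fraction(patient) <= MAX_MISSING_FRACTION:
            groups[patient["site"]].append(patient["id"])
    return dict(groups)


def make_patient(patient_id, site, missing=()):
    """Build a placeholder record; 1.0 stands in for any measured value."""
    record = {name: 1.0 for name in FEATURES}
    for name in missing:
        record[name] = None
    record.update(id=patient_id, site=site)
    return record


patients = [
    make_patient("p1", "site_a"),                                       # 0% missing
    make_patient("p2", "site_a", missing=("ldh", "fibrinogen")),        # 20% missing
    make_patient("p3", "site_b", missing=("ldh", "fibrinogen", "mcv")), # 30% missing
]
eligible = eligible_by_site(patients)
print(eligible)
```

In this sketch, each site's list would then be passed to the model on its own, mirroring the per-site evaluation described above.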

Results:

In 2796 patients with diagnosed AL, the previously published “confident” predictions of the algorithm reached peak median AUROC values of up to 0.997 for APL, 0.988 for acute myeloid leukemia (AML), and 0.988 for acute lymphoblastic leukemia (ALL). High scores with “confident” predictions were obtained in cohorts from Europe (e.g., AML F1 score 0.97 [95% CI, 0.972-0.973]), Asia (e.g., ALL F1 score 0.94 [95% CI, 0.937-0.943]), and Latin America (e.g., AML F1 score 0.98 [95% CI, 0.976-0.978]). However, “confident” predictions were available for only between 41% and 5% of patients, depending on the cohort. The accuracy of the “base” prediction of AL varied across sites and countries. The model predicted APL with median AUROC values between 0.98 and 0.79 and other types of AML with median AUROC values between 0.87 and 0.60. The best “base” performance was recorded for AML and APL with the data from Salamanca, indicating some feature dependencies of the algorithm.
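
As an illustration of how a per-site, one-vs-rest AUROC and its median across sites can be computed, the following minimal sketch uses the rank-based (Mann-Whitney) definition of AUROC. The site names, scores, and labels are made-up placeholders, not study data, and the function names are hypothetical.

```python
from statistics import median


def auroc(scores, is_positive):
    """One-vs-rest AUROC: probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, is_positive) if y]
    negatives = [s for s, y in zip(scores, is_positive) if not y]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Placeholder predicted APL probabilities and true APL status for three sites.
site_predictions = {
    "site_a": ([0.9, 0.8, 0.3, 0.1], [True, True, False, False]),
    "site_b": ([0.9, 0.4, 0.6, 0.2], [True, True, False, False]),
    "site_c": ([0.7, 0.2, 0.5, 0.6], [True, False, False, True]),
}

site_auroc = {
    site: auroc(scores, labels)
    for site, (scores, labels) in site_predictions.items()
}
median_auroc = median(site_auroc.values())
print(site_auroc, median_auroc)
```

The pairwise loop is quadratic in cohort size, which is acceptable for a sketch; a sorted-rank implementation would be preferable for large cohorts.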

In the pediatric subsets, ALL was the most frequently diagnosed leukemia, and cohorts reached a median AUROC of 0.78 (range 0.65-0.78), similar to adult ALL. However, the algorithm, originally developed on adult cohorts, did not generalize well to pediatric AML: its F1 scores (range 0.40-0.32) were lower than those for pediatric ALL (range 0.72-0.68). We examined potential limitations of the algorithm, e.g., misclassified patients, to identify sources of bias. Higher proportions of missing values reduced the precision of the predictions, which is why we refined the missing-value cutoff. Across predictions, the most important features in the SHAP analysis were prothrombin activity and monocyte count; additional important features were LDH for ALL, MCV and age for AML, and fibrinogen and MCHC for APL. Misclassified AML patients were predicted as ALL when they had low monocyte counts or when this feature was missing. A few AML patients with impaired coagulation (e.g., PT <60) and normal leukocyte counts were misclassified as APL. Misclassified ALL patients with high monocyte counts, higher MCV, and lower LDH were predicted as AML. We adjusted the scripts to account for these limitations and for statistical outliers to improve the algorithm’s applicability in clinical practice.
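
For readers less familiar with the F1 scores quoted above, the following minimal sketch computes a per-class F1 score from true and predicted leukemia types. The labels are invented for illustration and do not come from the study; the function name is hypothetical.

```python
def f1_per_class(y_true, y_pred, classes=("ALL", "AML", "APL")):
    """Per-class F1 = 2*TP / (2*TP + FP + FN); 0.0 when a class never occurs."""
    scores = {}
    for label in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denominator = 2 * tp + fp + fn
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


# Invented example: one ALL case predicted as AML and one AML case predicted as ALL.
y_true = ["ALL", "ALL", "ALL", "AML", "AML", "APL"]
y_pred = ["ALL", "ALL", "AML", "ALL", "AML", "APL"]

f1_scores = f1_per_class(y_true, y_pred)
print(f1_scores)
```

Because F1 combines precision and recall for a single class, a class that is rare in a cohort (such as AML in a mostly-ALL pediatric subset) can show a low F1 even when overall accuracy looks acceptable.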

Conclusion:

Inclusive ML tools can reduce access barriers in hematology. This first international validation of an ML tool to support the diagnosis of AL provides important insight into its validity and practical use. Validating the model on more patients and countries will further inform its generalizability.

Disclosures: Turki: Biomarin, AMGEN: Speakers Bureau; Onkowissen.tv: Speakers Bureau; CSL Behring: Consultancy; Pfizer: Consultancy; Janssen: Other: Travel reimbursements; Neovii: Other: Travel reimbursements; Maat Pharma: Consultancy; Novartis: Other: Travel reimbursements. Reinhardt: Medac, BMS, Immedica: Research Funding. Voso: Novartis: Other: Research support, Speakers Bureau; Celgene/BMS: Other: Research support, Advisory Board, Speakers Bureau; Syros: Other: Advisory Board; Astra Zeneca: Speakers Bureau; Abbvie: Speakers Bureau; Jazz: Other: Advisory Board, Speakers Bureau; Astellas: Speakers Bureau. Nensa: Siemens Healthineers: Research Funding.


^*signifies non-member of ASH

790 Independent, International Validation and Refinement of a Machine Learning Algorithm to Classify Acute Leukemia Using Routine Laboratory Features