Machine Learning Can Outperform Ann Arbor Staging in Predicting Survival in Patient with Diffuse Large B-Cell Lymphoma: Analysis of a Large National Cancer Database

Kumar, Madhan Srinivasan

Oral and Poster Abstracts
627. Aggressive Lymphomas: Clinical and Epidemiological: Poster III

artificial intelligence (AI), Technology and Procedures, machine learning

Madhan Srinivasan Kumar, MD¹, Veena Gujju, MD²^*, Ji Hwan Park, PhD³^*, Debra Hogue, M.S.³^*, Abdul Rafeh Naqash, MD⁴^* and Taha Mahdi Salih Al-Juhaishi, MD⁵

¹Internal Medicine, Saint Vincent Hospital, Worcester, MA
²Department of Medicine – Section of Hematology and Medical Oncology, Baylor College of Medicine, Houston, TX
³School of Computer Science, University of Oklahoma, Norman, OK
⁴Hematology and Medical Oncology, TSET Phase 1 Program, University of Oklahoma Health Sciences Center - Stephenson Cancer Center, Oklahoma City, OK
⁵Hematology and Medical Oncology, Stem Cell Transplantation and Cellular Therapy Program, University of Oklahoma Health Sciences Center - Stephenson Cancer Center, Oklahoma City, OK

Introduction:

Diffuse Large B-cell Lymphoma (DLBCL) is the most common lymphoma worldwide and usually follows an aggressive clinical course. The Ann Arbor staging system and the International Prognostic Index (IPI), both commonly used in clinical practice for risk stratification, have known limitations. Machine learning (ML) has emerged as a promising tool for more comprehensive and deeper data analysis. We sought to evaluate the ability of ML to predict survival in DLBCL, compared with the Ann Arbor staging system, using a large national database.

Methodology:

We applied the ML algorithm XGBoost to the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) database to predict overall survival (OS) and lymphoma-specific survival (LSS). For the prediction analysis, we transformed the survival labels into a simple Boolean format: “alive” was coded as 0, and both “dead” and “dead (attributable to this cancer diagnosis)” were coded as 1. We used one-hot encoding to convert categorical variables into binary vectors. The data set was divided into a training set (80%) and a test set (20%). We further split the training set into training and validation subsets using stratified 5-fold cross-validation. Hyperparameter optimization was performed on the validation subsets. The model used a broad range of attributes for prediction. To understand how each attribute contributed to the predictions, we calculated its importance score in XGBoost.
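The preprocessing steps described above (Boolean labels, one-hot encoding, an 80/20 split, and stratified 5-fold assignment) can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the column names, status strings, helper functions, and toy records are all hypothetical, and the XGBoost fitting and hyperparameter search are omitted.

```python
import random
from collections import defaultdict

# Hypothetical status strings; the actual SEER field values may differ.
OS_LABELS = {"alive": 0, "dead": 1}


def encode_label(status, mapping=OS_LABELS):
    """Map a survival status string to a 0/1 label."""
    return mapping[status.strip().lower()]


def one_hot(records, column):
    """Replace one categorical column with 0/1 indicator columns."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            row[f"{column}={c}"] = 1 if r[column] == c else 0
        encoded.append(row)
    return encoded


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return sorted(train), sorted(test)


def stratified_folds(indices, labels, k=5, seed=0):
    """Assign each index to one of k folds, dealing out each class in turn."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    return folds


# Toy data (entirely made up): 200 records with two categorical attributes.
records = [
    {"stage": s, "b_symptoms": b}
    for s in ("I", "II", "III", "IV")
    for b in ("yes", "no")
] * 25
statuses = ["dead" if i % 4 == 0 else "alive" for i in range(len(records))]

labels = [encode_label(s) for s in statuses]
features = one_hot(one_hot(records, "stage"), "b_symptoms")
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
folds = stratified_folds(train_idx, labels, k=5)
```

In each cross-validation round, one fold would serve as the validation subset and the remaining four as training data; the held-out 20% test set stays untouched until the final evaluation.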

Results:

We identified 64,912 patients with DLBCL and extracted their data. The majority were Caucasian (78.9%), and the median age fell within the 60 to 69 year range. The model predicted OS and LSS with an area under the curve (AUC) of 0.89 and 0.75, respectively (Figure 1). Factors selected by the model for survival prediction included the presence or absence of B symptoms, treatment status, and disease stage. For both OS and LSS, the model found B symptoms to be the highest contributing factor, with importance scores of 0.205 and 0.167, respectively. Other important factors incorporated by the model included age and stage IV for OS, and stage IV and clinically asymptomatic status for LSS. The least important factors were the location of the primary lymphoma site and the year of diagnosis (Table 1).
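For readers unfamiliar with the AUC metric, the sketch below shows one way it can be computed from binary labels and predicted scores. It assumes the AUC refers to the receiver operating characteristic curve, which the abstract does not state explicitly; the function name and the toy values are hypothetical and unrelated to the study data.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = dead, 0 = alive; scores are made-up predicted risks.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.3, 0.6, 0.7, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 5 of 6 positive/negative pairs are ranked correctly
```

The pairwise comparison is quadratic in the number of cases, which is fine for illustration; a rank-based formulation would be preferable at the scale of a national database.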

Conclusion:

Machine learning tools can help predict survival in patients with DLBCL and are able to challenge current staging systems. Our results warrant validation in future prospective studies.

Disclosures: No relevant conflicts of interest to declare.

^*signifies non-member of ASH

4513 Machine Learning Can Outperform Ann Arbor Staging in Predicting Survival in Patient with Diffuse Large B-Cell Lymphoma: Analysis of a Large National Cancer Database