Self-Supervised Learning Based Characterization of Bone Marrow Biopsies for Enhanced Diagnosis of Hematologic Disorders

Yoon, Dan

Background

Morphologic evaluation of bone marrow biopsies (BMBx) to diagnose hematologic malignancies and disorders can be challenging due to limitations in cytologic detail. With increasing interest in applying AI to pathology, significant focus has been placed on using machine learning to enhance diagnostic precision. Many of these studies have focused on distinguishing diseases within the same or a limited set of diagnostic categories, and it is unclear whether these methods can identify disease processes across a diversity of hematologic disorders based on bone marrow histology alone. To identify and characterize unique patterns in hematologic diseases, we used a self-supervised learning (SSL) approach to analyze H&E-stained BMBx images from patients with various hematologic disorders, as well as their morphologic mimics and negative controls.

Methods

BMBx slide images were divided into 112-pixel tiles at 20x magnification to capture detailed morphological features. SSL was performed using the Barlow Twins architecture to learn pathological features present within each slide. The learned features were then clustered using the Leiden algorithm and visualized with Uniform Manifold Approximation and Projection (UMAP) to identify distinct disease-specific clusters. Based on these characteristic clusters, logistic regression models were developed to classify the diseases, including disease subtypes within the myelodysplastic syndrome (MDS) and myeloproliferative neoplasm (MPN) categories (e.g., MDS-EB-1 vs MDS-EB-2 vs MDS with low blasts, and polycythemia vera (PV) vs chronic myeloid leukemia (CML) vs primary myelofibrosis (PMF) vs essential thrombocythemia (ET)). The dataset consisted of 159 MDS slides (89 samples), 99 MPN slides (53 samples), 108 age-matched cytopenic control slides (55 samples), and 188 negative lymphoma staging biopsy slides (100 samples). The analysis was conducted on two levels: one including the main disease categories (negative lymphoma controls, cytopenic controls, MDS, and MPN) and the other including the subtypes within MDS and MPN. For each analysis, an ensemble method was then applied to the logistic regression models trained on the cluster compositions, assigning optimal weights to these models to leverage the features that distinguish diseases and subtypes. Five-fold cross-validation was performed to evaluate the robustness and accuracy of the disease classification.
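As an illustration of what a "cluster composition" could look like as a classifier input, the following minimal sketch converts the cluster labels of one slide's tiles into a vector of per-cluster tile fractions. The function name `cluster_composition`, the toy labels, and the choice of tile fractions as the representation are assumptions made for this example; the abstract does not specify how the compositions were computed.

```python
from collections import Counter


def cluster_composition(tile_clusters, n_clusters):
    """Return the fraction of a slide's tiles assigned to each cluster.

    tile_clusters: list of integer cluster labels, one per tile (assumed 0..n_clusters-1).
    n_clusters: total number of clusters found across the whole dataset.
    """
    if not tile_clusters:
        raise ValueError("slide has no tiles")
    counts = Counter(tile_clusters)
    total = len(tile_clusters)
    # Clusters absent from this slide get a fraction of 0.0.
    return [counts.get(k, 0) / total for k in range(n_clusters)]


# Hypothetical slide with 8 tiles spread over 4 clusters (toy values, not study data).
example_slide = [0, 0, 1, 3, 3, 3, 3, 1]
composition = cluster_composition(example_slide, n_clusters=4)
print(composition)  # [0.25, 0.25, 0.0, 0.5]
```

A vector of this kind, one per slide, could then serve as the feature row for a logistic regression model.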

Results

The SSL-based analysis identified 51 clusters for the primary disease data and 49 clusters for the data including subtypes. Clusters with histopathologically significant morphological features were identified, and these clusters were enriched or depleted in specific categories; some were shared among multiple diseases. These clusters included fat spaces in cytopenia, dense lymphocytes in lymphoma, megakaryocytes in MDS, and overproduction of specific blood cell types in MPN, with fibrosis particularly observed in MPN-PMF. Additionally, a comparison of clusters enriched in MDS-EB-1 and MDS-EB-2 versus those enriched in MDS with low blasts showed differences in cell density. Leveraging these distinct feature clusters, the logistic regression models demonstrated high diagnostic accuracy. Classification of each disease category against all other categories yielded area under the receiver operating characteristic curve (AUROC) scores of 0.91 for MPN, 0.83 for MDS, 0.88 for negative lymphoma controls, and 0.81 for cytopenic controls. Notably, subtype classification within the MDS and MPN categories also performed well, with all subtype classifications achieving an AUROC of 0.89 or higher; the models effectively distinguished MDS-EB-1 and MDS-EB-2 from MDS with low blasts, and MPN-PMF from other MPN subtypes.
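For readers unfamiliar with one-vs-rest AUROC, the sketch below computes it from the rank interpretation: the probability that a randomly chosen slide of the target category receives a higher score than a randomly chosen slide from any other category, with ties counted as one half. The function names and the toy labels and scores are hypothetical and are not the study's data or code.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: list of 0/1 values; scores: predicted scores for the positive class.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(categories, scores_by_class):
    """Score each category against all other categories combined."""
    return {
        c: auroc([1 if y == c else 0 for y in categories], s)
        for c, s in scores_by_class.items()
    }


# Toy example: four slides and made-up classifier scores for the MPN class.
categories = ["MPN", "MDS", "MPN", "control"]
result = one_vs_rest_auroc(categories, {"MPN": [0.9, 0.4, 0.6, 0.7]})
print(result)  # {'MPN': 0.75}
```

In the toy example, three of the four MPN-versus-other pairs are ranked correctly, which gives 0.75.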

Conclusion

This study demonstrates the effectiveness of AI-assisted analysis using self-supervised learning in diagnosing hematologic disorders based on BMBx morphologic features alone. By characterizing distinct patterns from slide images, the approach distinguishes among multiple disease categories and accurately differentiates disease subtypes among chronic myeloid neoplasms. The high AUROC scores achieved highlight the potential of machine learning to help pathologists diagnose hematologic diseases.
