Session: 618. Acute Myeloid Leukemias: Biomarkers and Molecular Markers in Diagnosis and Prognosis: Poster I
Hematology Disease Topics & Pathways:
Research, Translational Research
A key challenge in applying machine learning to karyotype analysis is the relative scarcity of annotated data, which can limit model performance. To address this, we developed a novel pretraining strategy that leverages the abundance of normal karyotype images. Specifically, we first trained our model on a chromosome classification task using a large dataset of normal human chromosomes, enabling it to learn fundamental chromosome features. This pretrained model was then fine-tuned on our curated dataset of karyotypes annotated for a variety of chromosomal abnormalities. This transfer learning approach proved highly effective, yielding high accuracy despite the limited aberration data.
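The two-stage pretrain/fine-tune flow described above can be illustrated with a deliberately tiny, dependency-free sketch. Everything here is an assumption for illustration: `TinyNet`, `train`, and `replace_head` are hypothetical names, the model is a one-layer tanh network rather than a TopViT, and the inputs are plain feature vectors rather than chromosome images. The abstract does not say whether any layers were frozen during fine-tuning, so the `train_backbone` flag allows either choice.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TinyNet:
    """Hypothetical toy model: a one-layer tanh 'backbone' plus a linear softmax 'head'.

    It stands in for the real vision model only to show the pretrain/fine-tune flow.
    """

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.V = [[0.0] * n_hidden for _ in range(n_classes)]

    def features(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def _logits(self, h):
        return [sum(v * hj for v, hj in zip(row, h)) for row in self.V]

    def predict(self, x):
        return softmax(self._logits(self.features(x)))

    def replace_head(self, n_classes):
        # Transfer step: keep the pretrained backbone weights, start a fresh head
        # sized for the new label set (e.g. normal vs. aberrant).
        self.V = [[0.0] * len(self.W) for _ in range(n_classes)]

    def sgd_step(self, x, y, lr, train_backbone=True):
        h = self.features(x)
        p = softmax(self._logits(h))
        # Gradient of cross-entropy loss with respect to the logits.
        g = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
        if train_backbone:
            # Backpropagate through the head (before it is updated) and the tanh.
            gh = [sum(g[k] * self.V[k][j] for k in range(len(g))) * (1.0 - h[j] ** 2)
                  for j in range(len(h))]
            for j, row in enumerate(self.W):
                for i in range(len(row)):
                    row[i] -= lr * gh[j] * x[i]
        for k, row in enumerate(self.V):
            for j in range(len(row)):
                row[j] -= lr * g[k] * h[j]


def train(model, data, epochs=10, lr=0.1, train_backbone=True):
    """Plain SGD over (feature_vector, class_index) pairs."""
    for _ in range(epochs):
        for x, y in data:
            model.sgd_step(x, y, lr, train_backbone)
```

In use, one would pretrain a 24-class `TinyNet` on the plentiful normal-chromosome set with `train(net, normal_set)`, then call `net.replace_head(2)` and `train(net, aberration_set, train_backbone=False)` (or `True` for full fine-tuning) on the small annotated aberration set.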
Using a training set of ~10,000 patient specimens and ~50,000 karyograms spanning five years (2016-2020) of clinical data, we created a labeled set of images of individual chromosomes. These images were used to train and evaluate deep learning models for classifying the 24 human chromosome types (chromosomes 1-22, X, and Y) and for identifying chromosomal aberrations. Among the machine learning models evaluated, the most accurate models for both the chromosome identification and the aberration detection tasks used the recently introduced Topological Vision Transformers (TopViTs) with two-level block-Toeplitz masking, which incorporates a structural inductive bias.
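The idea behind two-level block-Toeplitz masking can be shown with a small dense sketch. For an h × w grid of image patches flattened row by row, a mask whose entry (i, j) depends only on the 2-D offset between patches i and j is block-Toeplitz with Toeplitz blocks. The sketch assumes the multiplicative form used in the masked-Transformer literature, in which the mask scales the exponentiated attention scores before row normalization; the decay function `decay` and its `alpha` parameter are hypothetical stand-ins for a learned mask function. Practical implementations can exploit the Toeplitz structure for fast matrix-vector products instead of materializing the full mask as done here.

```python
import math


def block_toeplitz_mask(h, w, f):
    """Dense mask for an h x w patch grid flattened row-major.

    Entry (i, j) = f(row_i - row_j, col_i - col_j), so each w x w block depends
    only on the row offset (block-Toeplitz) and each entry within a block depends
    only on the column offset (Toeplitz): a 2-level block-Toeplitz matrix.
    """
    coords = [(r, c) for r in range(h) for c in range(w)]
    return [[f(ri - rj, ci - cj) for (rj, cj) in coords] for (ri, ci) in coords]


def masked_attention(q, k, v, mask):
    """Attention where mask[i][j] multiplies exp(q_i . k_j / sqrt(d)) before normalization."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        weights = [mask[i][j] * math.exp(sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d))
                   for j, kj in enumerate(k)]
        z = sum(weights)
        out.append([sum(wt * vj[c] for wt, vj in zip(weights, v)) / z
                    for c in range(len(v[0]))])
    return out


def decay(dr, dc, alpha=0.5):
    # Hypothetical fixed mask function; a trained model would learn f instead.
    return math.exp(-alpha * (abs(dr) + abs(dc)))
```

Because the mask depends only on relative position, it encodes the 2-D layout of patches in the same way everywhere in the image, which is the kind of structural prior the abstract refers to.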
On the baseline task of chromosome identification, our transformer-based models outperformed CNN (Inception) models, achieving >99.3% accuracy. When applied to disease aberration detection, these high-performing architectures achieved >99% accuracy for most aberrations. We tested the models on a diverse set of chromosome aberrations commonly seen in acute myeloid leukemia (AML), chronic myeloid leukemia (CML), and myelodysplastic syndromes (MDS): an intrachromosomal unbalanced abnormality, del(5q); intrachromosomal balanced rearrangements, inv(3) and inv(16); and interchromosomal translocations, t(9;22), t(9;11), and t(11;19). Notably, performance remained high even in “few-shot” learning scenarios with limited examples of true aberrations. Incorporating the definition of clonality substantially improved both precision and recall (sensitivity). Furthermore, our attempt to identify aberrant chromosomes de novo yielded precision-recall performance comparable to fine-tuning across all aberrations. In particular, del(5q) and t(9;22) were detected with perfect accuracy, while inv(3) and t(11;19) showed 100% precision with >90% recall.
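The abstract does not specify how clonality was incorporated. One plausible approach is to aggregate per-metaphase predictions using the ISCN convention: a clone is defined as at least two cells with the same structural abnormality or gain, or at least three cells with loss of the same chromosome. The sketch below assumes one aberration score per metaphase and a hypothetical decision threshold of 0.5. Requiring recurrence across cells filters out isolated false-positive metaphases, which is consistent with the reported gain in precision.

```python
def count_positive_metaphases(scores, threshold=0.5):
    """Number of metaphases whose aberration score reaches the (hypothetical) threshold."""
    return sum(1 for s in scores if s >= threshold)


def is_clonal(scores, kind="structural", threshold=0.5):
    """Specimen-level call under an ISCN-style clonality rule.

    kind: "structural" or "gain" requires at least 2 positive cells;
          "loss" (whole-chromosome loss) requires at least 3.
    All aberrations studied here (del(5q), inv, t) are structural.
    """
    min_cells = 3 if kind == "loss" else 2
    return count_positive_metaphases(scores, threshold) >= min_cells
```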
To evaluate the generalizability of our aberration detection models, we used an entirely independent validation set derived from patient samples clinically tested between 2021 and 2022. Across all models and aberrations, we observed high precision and recall (100% precision and recall in most instances when considering specimen-level detection). The de novo performance on the 2021-2022 dataset matched that on the 2016-2020 dataset across all aberrations. This generalization is supported by the clear separation between normal and aberrant chromosomes in UMAP projections for our most frequent aberrations, t(9;22) and del(5q).
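Specimen-level precision and recall, as reported above, count one binary call per specimen for a given aberration rather than one call per chromosome image. A minimal sketch, assuming paired lists of predicted and true specimen-level calls (the function name is illustrative):

```python
def precision_recall(predicted, actual):
    """Precision and recall from paired boolean specimen-level calls for one aberration."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    # Undefined ratios (no predicted or no true positives) are returned as NaN.
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall
```

Because a specimen contains many metaphases, a few missed aberrant chromosomes need not change the specimen-level call. This is one reason specimen-level figures can reach 100% even when per-chromosome accuracy is slightly lower.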
This is the first study demonstrating the ability of a karyotype AI model to detect chromosome aberrations with accuracy approaching expert-level performance. Our assembled dataset, spanning seven years of clinical data and encompassing 6,319 unique patients, represents one of the largest resources for karyotype machine learning. These results open up opportunities for precision oncology, not only expediting patient results but also providing a scalable technology for early detection of minimal disease or subclonal lesions. The ability to analyze hundreds of metaphases per specimen would increase the sensitivity of the assay and reveal further details about the clonal architecture of these diseases.
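The sensitivity argument can be made concrete with a binomial model. This sketch assumes that each analyzed metaphase is an independent draw from the dividing cells, that a fraction `clone_fraction` of them carry the aberration, that per-metaphase classification is perfect, and that the two-cell clonality rule for structural abnormalities applies. Real sampling and classification are not this clean, so the numbers only illustrate the trend.

```python
from math import comb


def detection_probability(n_metaphases, clone_fraction, min_cells=2):
    """P(at least min_cells of n sampled metaphases come from the aberrant clone).

    Binomial model: independent sampling, perfect per-metaphase classifier.
    """
    p = clone_fraction
    miss = sum(comb(n_metaphases, k) * p ** k * (1 - p) ** (n_metaphases - k)
               for k in range(min_cells))
    return 1.0 - miss
```

For a clone present in 5% of dividing cells, this model gives a detection probability of roughly 0.26 for a typical 20-metaphase analysis, compared with more than 0.999 when 200 metaphases are analyzed.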
Disclosures: Shamsi: Google: Current Employment, Current holder of stock options in a privately-held company. Bryant: Google Research: Current Employment, Current holder of stock options in a privately-held company. Dubey: Google Research: Current Employment, Current holder of stock options in a privately-held company. Kothari: Google Research: Current Employment, Current holder of stock options in a privately-held company. Dehghani: Google Research: Current Employment, Current holder of stock options in a privately-held company. Chavarha: Google Research: Ended employment in the past 24 months. Likhosherstov: Google Research: Current Employment. Williams: Google Research: Current Employment, Current holder of stock options in a privately-held company. Frumkin: Google Research: Current Employment, Current holder of stock options in a privately-held company. Choromanski: Google Research: Current Employment, Current holder of stock options in a privately-held company. Bashir: Google Research: Current holder of stock options in a privately-held company, Ended employment in the past 24 months.