-Asterisk * with author name denotes a Non-ASH member

4034 Optimization of RH Genotyping from Whole Exon Sequencing Data By Machine Learning

Program: Oral and Poster Abstracts
Session: 401. Blood Transfusion: Poster III
Hematology Disease Topics & Pathways:
Research, Sickle Cell Disease, Translational Research, bioinformatics, Hemoglobinopathies, Diseases, Technology and Procedures, machine learning
Monday, December 11, 2023, 6:00 PM-8:00 PM

Ti-Cheng Chang1*, Jing Yu, MSc2*, Zhaoming Wang, PhD3*, Jane S Hankins, MD, MS4, Mitchell Weiss, MD, PhD1, Gang Wu, PhD1*, Connie M. Westhoff, PhD5*, Sunitha Vege6*, Stella Chou, MD7* and Yan Zheng, MD, PhD8*

1St. Jude Children's Research Hospital, Memphis, TN
2Pathology, St. Jude Children’s Research Hospital, Memphis
3Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN
4Departments of Hematology and Global Pediatric Medicine, St. Jude Children’s Research Hospital, Memphis, TN
5New York Blood, New York, NY
6New York Blood Center, Long Island City, NY
7The Children’s Hospital of Philadelphia, Philadelphia
8Pathology, St. Jude Children’s Research Hospital, Memphis, TN


Repeated exposure to donor red blood cell (RBC) antigens can result in alloimmunization. This problem occurs at especially high rates in patients with sickle cell disease (SCD), in part because Black individuals frequently harbor numerous genetic variations in the RHD and RHCE genes. These RH variations can result in loss of common epitopes or expression of neo-epitopes, predisposing patients with Rh variants to Rh alloimmunization. Because these variants are not distinguishable by standard serological typing, RH genotyping to facilitate genotype-matched transfusion has become necessary. RH genotyping by DNA sequencing-based approaches is complicated by the highly homologous sequences shared by RHD and RHCE. We previously developed RHtyper, an automated system to ascertain complex RH genotypes of Black individuals from standard whole-genome sequencing (WGS) data (Chang, T.C., et al., Blood Advances, 2020). In a validation cohort of 57 SCD patients, RHtyper achieved 100% accuracy for RHD and 98.2% accuracy for RHCE genotypes compared with the genotypes obtained from single nucleotide variant (SNV)-based BeadChip and targeted molecular assays, which represent the current standard. Because whole-exome sequencing (WES) is more cost-effective and more widely available than WGS, we optimized RHtyper for WES data by incorporating a machine learning approach that minimizes genotyping errors caused by non-uniform sequencing coverage and misalignment of sequencing reads in WES data.


WES and WGS data from 396 SCD patients enrolled in the Sickle Cell Clinical Research and Intervention Program (SCCRIP) study and 3030 childhood cancer survivors enrolled in the St. Jude Lifetime Cohort Study (SJLIFE, 15.2% Black) were included. RHtyper was optimized for WES data by using machine learning to improve prediction accuracy for RHD zygosity and hybrid alleles, RHCE*C/RHCE*c alleles, and the zygosity of RHD c.1136C>T and RHCE c.48G>C. Specifically, hundreds to thousands of informative features specific to each allele/SNV were selected by the Boruta algorithm and incorporated into a prediction model using XGBoost. The model was trained on 75% of the SCCRIP data and validated on the remaining 25%. WGS-based genotypes served as the reference. The optimized RHtyper was further validated in the SJLIFE cohort.
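As an illustration only, the 75%/25% split and the shadow-feature idea behind Boruta can be sketched in plain Python. This is not the RHtyper implementation: the function names, the toy data, and the use of absolute correlation as the importance score are assumptions made for this sketch. Boruta itself repeats the comparison many times using random-forest importances and a statistical test, and the selected features would then be passed to an XGBoost model.

```python
import random
from statistics import mean, pstdev


def split_train_validation(sample_ids, train_fraction=0.75, seed=0):
    """Shuffle sample IDs reproducibly and split them into train/validation lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return abs(cov / (sx * sy))


def shadow_feature_screen(columns, labels, seed=0):
    """Keep features that score higher than every shuffled ("shadow") copy.

    Single-pass simplification of the Boruta idea, with absolute correlation
    standing in for a random-forest importance (an assumption of this sketch).
    """
    rng = random.Random(seed)
    shadow_scores = []
    for values in columns.values():
        shadow = list(values)
        rng.shuffle(shadow)  # shuffling breaks any real link to the labels
        shadow_scores.append(abs_correlation(shadow, labels))
    threshold = max(shadow_scores)
    return [
        name
        for name, values in columns.items()
        if abs_correlation(values, labels) > threshold
    ]


# Toy data: 396 hypothetical samples, as in the SCCRIP cohort size.
train_ids, validation_ids = split_train_validation(range(396))

rng = random.Random(1)
labels = [i % 2 for i in range(200)]  # hypothetical binary allele call
columns = {
    "informative": list(labels),  # perfectly tracks the label
    "noise": [rng.random() for _ in labels],  # unrelated to the label
}
selected = shadow_feature_screen(columns, labels)
```

With 396 samples the split yields 297 training and 99 validation samples, and the perfectly informative toy feature is retained by the screen.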


Genotyping RH from WES data with the original RHtyper was less accurate than genotyping from WGS data. For the 396 patients from the SCCRIP study, the concordance between WGS- and WES-based genotypes was 90.2% for RHD and 96.3% for RHCE. The original RHtyper had particular difficulty determining 1) RHD zygosity and hybrid alleles, 2) RHCE*C vs. RHCE*c alleles, 3) RHD c.1136C>T zygosity, and 4) RHCE c.48G>C zygosity. We optimized RHtyper by incorporating machine learning specific to those affected alleles/SNVs, which substantially improved the concordance between WGS- and WES-based genotypes to 97.2% for RHD and 98.2% for RHCE. We further validated the optimized RHtyper using 3030 participants from the SJLIFE cohort and achieved concordance of 96.3% for RHD and 94.6% for RHCE. The predicted C antigen frequency based on WES data was 59.0% for White and 24.2% for Black participants in the SJLIFE cohort, similar to previously reported racial distributions. In addition, for 1036 patients with blood type records, the predicted D serologic types using WES data were 99.8% consistent with clinical serology results.
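The concordance figures above are the percentage of samples whose WES-based genotype matches the WGS-based reference. A minimal sketch of that calculation follows, assuming genotypes are stored as one string per sample. The sample IDs and genotype strings are made up for illustration, and the abstract does not specify how RHtyper compares genotypes internally.

```python
def concordance(reference, predicted):
    """Percent of shared samples whose predicted genotype equals the reference."""
    shared = reference.keys() & predicted.keys()
    if not shared:
        raise ValueError("no samples in common")
    matches = sum(reference[s] == predicted[s] for s in shared)
    return 100.0 * matches / len(shared)


# Hypothetical genotype calls for four samples (illustrative values only).
wgs_calls = {
    "S1": "RHCE*ce/RHCE*ce",
    "S2": "RHCE*Ce/RHCE*ce",
    "S3": "RHCE*ce/RHCE*cE",
    "S4": "RHCE*Ce/RHCE*Ce",
}
wes_calls = {
    "S1": "RHCE*ce/RHCE*ce",
    "S2": "RHCE*ce/RHCE*ce",  # discordant C/c call
    "S3": "RHCE*ce/RHCE*cE",
    "S4": "RHCE*Ce/RHCE*Ce",
}

rhce_concordance = concordance(wgs_calls, wes_calls)
```

Here three of four samples agree, so `rhce_concordance` is 75.0. Only samples present in both call sets are compared.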


We improved RHtyper for WES data by integrating machine learning, which allowed the incorporation of information from a large number of diverse informative features and enabled more accurate prediction.

Disclosures: Weiss: GlaxoSmithKline: Consultancy; bluebird bio: Consultancy; Novartis Inc.: Consultancy; Dyne: Consultancy; Vertex Pharmaceuticals: Consultancy; Cellarity: Consultancy, Current equity holder in private company; Graphite Bio: Consultancy; Forma Therapeutics: Consultancy.
