Type: Oral
Session: 803. Emerging Tools, Techniques and Artificial Intelligence in Hematology: Reading the Blood: Generative and Discriminative AI in Hematology
Hematology Disease Topics & Pathways:
Research, Translational Research, bioinformatics, computational biology, emerging technologies, Technology and Procedures, profiling, omics technologies
Method: Here, we present the application of BinaryClust2 to three example MC datasets: an in-house dataset of peripheral blood mononuclear cells (PBMCs) from 9 myeloproliferative neoplasm (MPN) patients at baseline (~4 million cells), previously published data from 11 MPN patients receiving influenza vaccine (~0.2 million cells), and a published dataset of 59 COVID-19 patients and 23 healthy donors (~2 million cells). BinaryClust2 has a streamlined analytical workflow comprising the following steps (Figure 1A):
Step 1: Quality control, batch effect evaluation and correction
Data quality control is performed as a separate step before downstream analysis and includes diagnostic plots and examination of batch effects. The CytofRUV and CytoNorm algorithms are available in the pipeline to remove unwanted variation caused by batch effects.
Step 2: Semi-supervised identification of main cell types
BinaryClust2 adopts a knowledge-based semi-supervised approach to predict main cell types. Users provide a simple marker expression matrix of pre-defined cell types, along with FCS files and metadata, to construct a SingleCellExperiment (SCE) object; the embedded algorithm then classifies cell populations automatically, without manual annotation.
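To make the idea of a marker expression matrix concrete, the following is a minimal Python sketch of one way such a matrix could drive classification. It is not BinaryClust2's actual algorithm, which the abstract does not detail: the per-marker two-cluster split, the "+"/"-" encoding, the function names and the toy values are all assumptions made for illustration.

```python
def binarize_marker(values, n_iter=50):
    """Split one marker into low (0) / high (1) using 1-D two-means clustering."""
    lo, hi = min(values), max(values)
    flags = [0] * len(values)
    for _ in range(n_iter):
        # Assign each value to the nearer of the two current centres.
        flags = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        high = [v for v, f in zip(values, flags) if f]
        low = [v for v, f in zip(values, flags) if not f]
        if not high or not low:
            break  # marker is effectively constant; no split possible
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return flags


def classify_cells(cells, definitions):
    """Label each cell with the first cell type whose +/- pattern it matches.

    cells       : list of dicts, marker name -> transformed expression value
    definitions : dict, cell type -> dict of marker -> "+" or "-"
                  (markers omitted from a pattern are ignored for that type)
    """
    markers = sorted({m for pattern in definitions.values() for m in pattern})
    binary = {m: binarize_marker([c[m] for c in cells]) for m in markers}
    labels = []
    for i in range(len(cells)):
        label = "Unclassified"
        for cell_type, pattern in definitions.items():
            if all(binary[m][i] == (1 if sign == "+" else 0)
                   for m, sign in pattern.items()):
                label = cell_type
                break
        labels.append(label)
    return labels


# Hypothetical toy data: two markers, five cells.
cells = [
    {"CD3": 5.1, "CD19": 0.2},
    {"CD3": 4.8, "CD19": 0.1},
    {"CD3": 0.3, "CD19": 4.9},
    {"CD3": 0.2, "CD19": 5.2},
    {"CD3": 0.1, "CD19": 0.3},
]
definitions = {
    "T cells": {"CD3": "+", "CD19": "-"},
    "B cells": {"CD3": "-", "CD19": "+"},
}
labels = classify_cells(cells, definitions)
print(labels)
```

Cells that match no pre-defined pattern stay "Unclassified" in this sketch, which leaves them available for later unsupervised exploration.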
Step 3: In-depth interrogation and differential testing
Specific populations can be extracted from the full set of cells and subjected to in-depth exploration using unsupervised algorithms for subpopulation discovery. BinaryClust2 offers the dimensionality reduction tools UMAP and t-SNE, the unsupervised clustering methods PhenoGraph and FlowSOM, and a range of plotting functions for data visualization. For statistical analysis, comparison of cell abundance and functional marker expression across multiple study groups (n>2) is supported via the Kruskal-Wallis test with multiple testing correction and post hoc analysis.
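The multi-group comparison described above can be sketched in plain Python as follows. This is an illustrative sketch, not BinaryClust2's code: the abstract does not name the correction method, so Benjamini-Hochberg is an assumption, the post hoc step is omitted, and all function names and abundance values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2:  # odd df: start from the closed form for df = 1
        q, k = math.erfc(math.sqrt(half)), 1
    else:       # even df: start from the closed form for df = 2
        q, k = math.exp(-half), 2
    while k < df:  # step df up by 2 with the incomplete-gamma recurrence
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a Kruskal-Wallis test, with tie correction."""
    pooled = sorted((v, g) for g, obs in enumerate(groups) for v in obs)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += mid_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    if tie_term == n ** 3 - n:
        raise ValueError("all observations are identical")
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(obs) for r, obs in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical abundances (% of cells) for two cell types in three groups.
abundance = {
    "cell type A": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    "cell type B": [[1, 4, 7], [2, 5, 8], [3, 6, 9]],
}
results = {name: kruskal_wallis(groups) for name, groups in abundance.items()}
p_values = [results[name][1] for name in abundance]
adjusted = benjamini_hochberg(p_values)
for name, p_adj in zip(abundance, adjusted):
    print(name, round(results[name][0], 3), round(p_adj, 4))
```

One test is run per cell type (or per marker), and the correction is applied across that whole family of tests rather than within a single comparison.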
Results: The performance of the semi-supervised classification function was tested independently in the MPN and influenza PBMC datasets. Seven main cell lineages were identified with accuracy comparable to manual gating by human experts (Figure 1B): the average F-measure reached 0.93 and 0.98, respectively. Moreover, taking manual gating as the ground-truth reference, BinaryClust2 outperformed the unsupervised approach FlowSOM in both accuracy (F-measure: 0.93 vs. 0.70) and speed (140 s vs. 339 s) in the MPN dataset, and matched the well-performing semi-supervised approach LDA in accuracy (F-measure: 0.93 vs. 0.93) with a shorter runtime (140 s vs. 595 s), as shown in Table 1. Application to the COVID-19 dataset of Chevrier et al. reproduced the published results and yielded additional discoveries. Thirteen main cell types were characterised; the abundance of B cells, basophils, cDCs, DN T cells, monocytes, neutrophils, NK cells, pDCs and CD8 T cells differed significantly (all P<0.05) among study conditions (healthy, mild COVID-19, severe COVID-19). We also grouped markers reflecting the functional status of immune cells and found that Granzyme B expression was significantly increased in the majority of main immune cell types in COVID-19 patients. PhenoGraph was further applied to neutrophils and monocytes and returned 14 and 16 subsets, respectively.
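For readers less familiar with the F-measure, the short Python sketch below shows how per-population scores against manual gating could be computed and averaged. The abstract does not state how the average was formed, so the unweighted mean across populations is an assumption, and the labels are toy values rather than study data.

```python
def f_measure_per_class(truth, predicted):
    """F-measure (harmonic mean of precision and recall) for each true label."""
    scores = {}
    for label in sorted(set(truth)):
        pairs = list(zip(truth, predicted))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def average_f_measure(truth, predicted):
    """Unweighted mean of the per-population F-measures (an assumption)."""
    scores = f_measure_per_class(truth, predicted)
    return sum(scores.values()) / len(scores)


# Hypothetical labels: manual gating (truth) vs. automated calls (predicted).
truth = ["T", "T", "T", "B", "B", "NK"]
predicted = ["T", "T", "B", "B", "B", "NK"]
scores = f_measure_per_class(truth, predicted)
average = average_f_measure(truth, predicted)
print(scores, round(average, 3))
```

An unweighted mean gives rare populations the same influence as abundant ones; a cell-count-weighted mean would instead be dominated by the largest lineages.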
Conclusion: Overall, BinaryClust2 incorporates experts' prior biological knowledge in a semi-supervised fashion to accurately deconvolute well-defined main cell lineages, while preserving the potential of unsupervised approaches to discover novel cell subsets and providing a user-friendly toolset that removes the analytical barrier to high-dimensional immune profiling.
Disclosures: Kordasti: Novartis: Honoraria, Membership on an entity's Board of Directors or advisory committees; MorphoSys: Research Funding; Beckman Coulter: Honoraria.