School of Computer Science and Engineering, Sun Yat-sen University, China.
Sun Yat-sen Memorial Hospital, Sun Yat-sen University, China.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab281.
In single-cell analyses, cell types are conventionally identified from the expression of known marker genes, whose identification is time-consuming and irreproducible. To address this issue, many supervised approaches have been developed that identify cell types by drawing on the rapid accumulation of public datasets. However, these approaches are sensitive to batch effects or biological variation, because data distributions differ in cross-platform or cross-species predictions. In this study, we developed scAdapt, a virtual adversarial domain adaptation network, to transfer cell labels between datasets with batch effects. scAdapt uses both labeled source data and unlabeled target data to train an enhanced classifier, and aligns the labeled source centroids with the pseudo-labeled target centroids to generate a joint embedding. scAdapt outperformed existing methods for classification on simulated, cross-platform, cross-species, spatial transcriptomic and COVID-19 immune datasets. Further quantitative evaluation and visualization of the aligned embeddings confirmed its superior cell mixing and its ability to preserve the discriminative cluster structure present in the original datasets.
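The centroid alignment idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the scAdapt implementation: the function names `class_centroids` and `centroid_alignment_loss` are hypothetical, the squared Euclidean distance is an assumed choice of metric, and the virtual adversarial training component is omitted entirely. The sketch only shows how target cells might be pseudo-labeled from classifier probabilities and how per-class centroids of the two domains could then be compared.

```python
from collections import defaultdict


def class_centroids(embeddings, labels):
    """Return {label: mean embedding vector} for every label that occurs."""
    groups = defaultdict(list)
    for vec, lab in zip(embeddings, labels):
        groups[lab].append(vec)
    return {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }


def centroid_alignment_loss(src_emb, src_labels, tgt_emb, tgt_probs):
    """Mean squared distance between matching source and target centroids.

    src_labels are true labels; target labels are pseudo-labels taken as
    the most probable class in tgt_probs (an assumption of this sketch).
    """
    pseudo = [max(range(len(p)), key=p.__getitem__) for p in tgt_probs]
    src_c = class_centroids(src_emb, src_labels)
    tgt_c = class_centroids(tgt_emb, pseudo)

    # Only classes observed in both domains can be aligned.
    shared = sorted(set(src_c) & set(tgt_c))
    if not shared:
        return 0.0

    sq_dists = [
        sum((a - b) ** 2 for a, b in zip(src_c[c], tgt_c[c]))
        for c in shared
    ]
    return sum(sq_dists) / len(sq_dists)


# Toy example: three labeled source cells, two unlabeled target cells.
src_emb = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]
src_labels = [0, 0, 1]
tgt_emb = [[1.0, 1.0], [0.0, 4.0]]
tgt_probs = [[0.9, 0.1], [0.2, 0.8]]
loss = centroid_alignment_loss(src_emb, src_labels, tgt_emb, tgt_probs)
print(loss)
```

In a trainable model, a term of this kind would be computed on differentiable embeddings and minimized together with the classification loss, so that cells of the same type from both datasets are pulled toward a common location in the joint embedding.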