School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.
Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA.
Commun Biol. 2023 Mar 16;6(1):274. doi: 10.1038/s42003-023-04668-7.
Large-scale scRNA-seq studies typically generate data in batches, which often introduces nontrivial batch effects that must be corrected. Given the global efforts to build cell atlases and the growing number of annotated scRNA-seq datasets, we propose SIDA (Supervised Integration using Domain Adaptation), a supervised strategy for scRNA-seq data integration that uses cell type annotations to guide the integration of diverse batches. The strategy is based on domain adaptation, a technique initially proposed in the computer vision field. We demonstrate that SIDA can generate comprehensive reference datasets that lead to improved accuracy in automated cell type mapping analyses.