College of Computer and Information Engineering, Tianjin Normal University, Tianjin, 300387, China.
Department of Biological and Biomedical Sciences, Rowan University, NJ, 08028, USA.
Adv Sci (Weinh). 2024 Aug;11(29):e2308934. doi: 10.1002/advs.202308934. Epub 2024 May 22.
Many single-cell transcriptomic datasets from the same tissues or cell lines are generated by different laboratories or with different single-cell RNA sequencing (scRNA-seq) protocols. Denoising these datasets to eliminate batch effects is crucial for data integration, ensuring accurate interpretation and comprehensive analysis of biological questions. Although many scRNA-seq data integration methods exist, most are inefficient and/or not conducive to downstream analysis. Here, DeepBID is introduced: a novel deep learning-based method that concurrently performs batch effect correction, non-linear dimensionality reduction, embedding, and cell clustering. DeepBID uses a negative binomial-based autoencoder with dual Kullback-Leibler divergence loss functions, aligning cells from different batches within a consistent low-dimensional latent space and progressively mitigating batch effects through iterative clustering. Extensive validation on multiple-batch scRNA-seq datasets demonstrates that DeepBID surpasses existing tools in removing batch effects and achieves superior clustering accuracy. When integrating multiple scRNA-seq datasets from patients with Alzheimer's disease, DeepBID significantly improves cell clustering, effectively annotates unidentified cells, and detects cell-specific differentially expressed genes.
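The two loss ingredients named in the abstract can be illustrated with a minimal sketch. It is not DeepBID's implementation. The abstract does not give the exact form of the losses, so the code assumes a standard negative binomial likelihood parameterized by a mean `mu` and an inverse dispersion `theta`, and a DEC-style clustering loss (Student's t soft assignments and a sharpened target distribution) as one plausible Kullback-Leibler term. All function names are hypothetical, and the second KL term and the weighting between terms are not specified in the abstract, so they are left out.

```python
import math


def nb_negative_log_likelihood(x, mu, theta):
    """Negative log-likelihood of a count x under a negative binomial.

    Assumed parameterization: mean mu > 0 and inverse dispersion theta > 0.
    """
    log_coeff = math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1.0)
    log_prob = (
        log_coeff
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )
    return -log_prob


def soft_assignments(latent_points, centroids):
    """Soft cluster assignments Q from a Student's t kernel (DEC-style assumption)."""
    q = []
    for point in latent_points:
        weights = [
            1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(point, centroid)))
            for centroid in centroids
        ]
        total = sum(weights)
        q.append([w / total for w in weights])
    return q


def target_distribution(q):
    """Sharpened target P: square Q, divide by soft cluster size, renormalize per cell."""
    n_clusters = len(q[0])
    cluster_size = [sum(row[k] for row in q) for k in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[k] ** 2 / cluster_size[k] for k in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """KL(P || Q) summed over all cells and clusters."""
    return sum(
        pk * math.log(pk / qk)
        for p_row, q_row in zip(p, q)
        for pk, qk in zip(p_row, q_row)
        if pk > 0.0
    )


# Toy latent embedding: two cells near the first centroid, one near the second.
latent = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assignments(latent, centers)
p = target_distribution(q)
clustering_loss = kl_divergence(p, q)
reconstruction_loss = nb_negative_log_likelihood(x=3, mu=2.0, theta=3.0)
```

In a real model, `mu` and `theta` would be produced by the decoder for every gene in every cell, and both losses would be minimized by gradient descent in an automatic-differentiation framework. Recomputing the target distribution between training rounds is one way to read the "iterative clustering" in the abstract, though the abstract does not confirm that this is how DeepBID proceeds.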