School of Computer Science and Engineering, Central South University, Changsha, 410083, China.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac311.
Integrating single-cell transcriptome datasets from multiple sources plays an important role in investigating complex biological systems. The key to integrating transcriptome datasets is batch effect removal. Recent methods attempt to apply a contrastive learning strategy to correct batch effects. Despite their encouraging performance, the optimal contrastive learning framework for batch effect removal is still under exploration. We develop GLOBE, an improved contrastive learning-based batch correction framework. GLOBE defines adaptive translation transformations for each cell to guarantee the stability of approximating batch effects. To enhance the consistency of representation alignment, GLOBE uses a loss function that is both hardness-aware and consistency-aware to learn batch effect-invariant representations. Moreover, GLOBE computes a batch-corrected gene matrix in a transparent manner to support diverse downstream analyses. Benchmarking results on a wide spectrum of datasets show that GLOBE outperforms other state-of-the-art methods in terms of robust batch mixing and superior conservation of biological signals. We further apply GLOBE to integrate two developing mouse neocortex datasets and show that GLOBE removes batch effects while preserving the contiguous structure of cells in the raw data. Finally, a comprehensive study is conducted to validate the effectiveness of GLOBE.
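The abstract does not give the form of GLOBE's loss, so the following is only a minimal sketch of a generic InfoNCE-style contrastive loss, the kind of objective that contrastive batch correction methods commonly build on. It is not GLOBE's hardness-aware and consistency-aware loss. The function name `info_nce_loss`, the `temperature` value, and the toy data are assumptions made for illustration; here each row of `anchors` stands for a cell embedding and the matching row of `positives` for the embedding of its transformed counterpart.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative only, not GLOBE's loss).

    Row i of `positives` is treated as the positive for row i of `anchors`;
    all other rows of `positives` act as negatives for that anchor.
    """
    # L2-normalize so the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarities, sharpened by the temperature.
    logits = a @ p.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The diagonal holds the log-probability of each true positive pair.
    return float(-np.mean(np.diag(log_prob)))


# Toy embeddings (hypothetical data): 8 cells, 16 latent dimensions.
rng = np.random.default_rng(0)
cells = rng.normal(size=(8, 16))
views = cells + 0.01 * rng.normal(size=(8, 16))  # slightly perturbed copies

loss_aligned = info_nce_loss(cells, views)
loss_mismatched = info_nce_loss(cells, np.roll(views, 1, axis=0))
```

In this toy setting the loss is small when each cell is paired with its own perturbed copy and much larger when the pairing is shifted by one row, which is the behavior a representation-alignment objective relies on.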