School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
Brief Bioinform. 2022 Jan 17;23(1). doi: 10.1093/bib/bbab400.
Single-cell technologies provide new ways to profile the transcriptomic landscape, chromatin accessibility and spatial expression patterns of heterogeneous tissues at single-cell resolution. Given the enormous number of single-cell datasets now being generated, a key analytic challenge is to integrate them to gain biological insight into cellular composition. Here, we developed a domain-adversarial and variational approximation method, DAVAE, which can integrate multiple single-cell datasets across samples, technologies and modalities with a single strategy. DAVAE can also integrate paired ATAC and transcriptome profiles that are measured simultaneously from the same cell. Because it is trained with mini-batch stochastic gradient descent, it scales to large datasets and can be accelerated by GPUs. Results on seven real data integration applications demonstrated the effectiveness and scalability of DAVAE in batch-effect removal, transfer learning and cell-type prediction for multiple single-cell datasets across samples, technologies and modalities. Availability: DAVAE is implemented in the toolkit package "scbean", available from the PyPI repository, and the source code is freely accessible at https://github.com/jhu99/scbean. All data and source code for reproducing the results of this paper are available at https://github.com/jhu99/davae_paper.
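The abstract names the ingredients of the method (a variational approximation, a domain-adversarial term and mini-batch training) without stating the objective. As a rough illustration only, the NumPy sketch below shows how such terms are commonly combined; it is not the scbean implementation, and every name in it (`kl_to_standard_normal`, `domain_cross_entropy`, `encoder_objective`, `minibatches`, the weight `lam`) as well as the squared-error reconstruction term and the standard-normal prior are assumptions made for this example.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per cell."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

def domain_cross_entropy(logits, batch_ids):
    """Cross-entropy of a hypothetical batch (domain) classifier, one value per cell."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(batch_ids)), batch_ids]

def encoder_objective(x, x_recon, mu, log_var, domain_logits, batch_ids, lam=1.0):
    """Quantity the encoder would minimise in this sketch.

    The domain term enters with a negative sign: the encoder is rewarded when
    the classifier cannot tell which batch a latent code came from.
    """
    recon = np.sum((x - x_recon) ** 2, axis=1)  # assumed squared-error reconstruction
    kl = kl_to_standard_normal(mu, log_var)
    dom = domain_cross_entropy(domain_logits, batch_ids)
    return float(np.mean(recon + kl - lam * dom))

def minibatches(n_cells, batch_size, rng):
    """Yield shuffled index blocks, so each update touches only batch_size cells."""
    order = rng.permutation(n_cells)
    for start in range(0, n_cells, batch_size):
        yield order[start:start + batch_size]
```

In a full training loop, the domain classifier would be updated to minimise its own cross-entropy while the encoder minimises `encoder_objective`, one mini-batch at a time; processing only a block of cells per step is what keeps memory use independent of the total number of cells.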