Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur, India.
Department of Biological Sciences and Bioengineering, Indian Institute of Technology Kanpur, Kanpur, India.
Nat Commun. 2023 Nov 27;14(1):7781. doi: 10.1038/s41467-023-43590-8.
Integration of heterogeneous single-cell sequencing datasets generated across multiple tissue locations, time points, and conditions is essential for a comprehensive understanding of the cellular states and expression programs underlying complex biological systems. Here, we present scDREAMER (https://github.com/Zafar-Lab/scDREAMER), a data-integration framework that employs deep generative models and adversarial training for both unsupervised and supervised (scDREAMER-Sup) integration of multiple batches. Using six real benchmarking datasets, we demonstrate that scDREAMER can overcome critical challenges, including skewed cell-type distributions among batches, nested batch effects, a large number of batches, and conservation of developmental trajectories across batches. Our experiments also show that scDREAMER and scDREAMER-Sup outperform state-of-the-art unsupervised and supervised integration methods, respectively, in batch correction and conservation of biological variation. Using a dataset of 1 million cells, we demonstrate that scDREAMER is scalable and can perform atlas-level cross-species (e.g., human and mouse) integration while being faster than other deep-learning-based methods.