Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia.
Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia.
Nat Commun. 2023 Jul 17;14(1):4272. doi: 10.1038/s41467-023-39923-2.
The recent emergence of multi-sample, multi-condition, multi-cohort single-cell studies allows researchers to investigate different cell states. Effective integration of multiple large-cohort studies promises biological insights into cells under different conditions that individual studies cannot provide. Here, we present scMerge2, a scalable algorithm for integrating atlas-scale multi-sample multi-condition single-cell studies. We have generalized scMerge2 to merge millions of cells from single-cell studies generated by various single-cell technologies. Using a large COVID-19 data collection with over five million cells from more than 1,000 individuals, we demonstrate that scMerge2 enables multi-sample multi-condition scRNA-seq data integration across multiple cohorts and reveals signatures derived from cell-type expression that discriminate disease progression more accurately. Further, we show that scMerge2 can remove dataset variability in CyTOF, imaging mass cytometry and CITE-seq experiments, which indicates its applicability to a broad spectrum of single-cell profiling technologies.