University Health Network, Toronto, Canada.
Department of Statistical Sciences, University of Toronto, Toronto, Canada.
Genome Biol. 2022 Apr 20;23(1):102. doi: 10.1186/s13059-022-02659-1.
Integrative analysis of large-scale single-cell RNA sequencing (scRNA-seq) datasets can aggregate complementary biological information from different datasets. However, most existing methods fail to efficiently integrate multiple large-scale scRNA-seq datasets. We propose OCAT, One Cell At a Time, a machine learning method that sparsely encodes single-cell gene expression to integrate data from multiple sources without highly variable gene selection or explicit batch effect correction. We demonstrate that OCAT efficiently integrates multiple scRNA-seq datasets and achieves state-of-the-art performance in cell type clustering, especially in challenging scenarios with non-overlapping cell types. In addition, OCAT can effectively facilitate a variety of downstream analyses.
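The abstract does not spell out how the sparse encoding works. As a rough illustration only, the sketch below shows one generic way to sparsely encode cells from several datasets into a shared feature space. It builds a pooled set of anchor points (here, k-means centroids drawn from each dataset) and represents every cell by non-negative weights on its few nearest anchors. The function names (`kmeans_anchors`, `sparse_encode`, `integrate`), the log1p and unit-norm preprocessing, the Gaussian weighting, and the parameters `m_per_dataset` and `s` are all hypothetical choices. This is a generic anchor-based encoding, not OCAT's published implementation.

```python
import numpy as np


def kmeans_anchors(X, m, n_iter=20, seed=0):
    """Return m anchor points (centroids) of X using plain Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each cell to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(m):
            members = X[labels == j]
            if len(members):  # keep the previous center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def sparse_encode(X, anchors, s=3):
    """Encode each row of X as non-negative weights on its s nearest anchors."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :s]
    rows = np.arange(len(X))[:, None]
    near_d2 = d2[rows, nearest]
    # Hypothetical Gaussian kernel with a per-cell bandwidth.
    sigma2 = near_d2.mean(axis=1, keepdims=True) + 1e-12
    w = np.exp(-near_d2 / sigma2)
    w /= w.sum(axis=1, keepdims=True)  # each cell's weights sum to 1
    Z = np.zeros((len(X), len(anchors)))
    Z[rows, nearest] = w  # all other entries stay zero, so Z is sparse
    return Z


def integrate(datasets, m_per_dataset=5, s=3, seed=0):
    """Pool anchors from every dataset and encode all cells against that shared set.

    Assumes every dataset is a cells x genes count matrix over the same genes.
    """
    normed = []
    for X in datasets:
        X = np.log1p(X)  # log-transform counts
        X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)  # unit-length cells
        normed.append(X)
    anchors = np.vstack([kmeans_anchors(X, m_per_dataset, seed=seed) for X in normed])
    return [sparse_encode(X, anchors, s=s) for X in normed]
```

For example, `integrate([counts_a, counts_b], m_per_dataset=4, s=3)` returns one matrix per dataset. Each matrix has one column per pooled anchor, and each row has exactly three non-zero weights that sum to one. Because all cells are expressed against the same pooled anchors, the resulting rows could be concatenated and clustered with any standard method. The sketch does no per-dataset gene selection or explicit batch correction.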