School of Mathematics and Statistics, University of Sydney, Sydney, NSW 2006, Australia.
Charles Perkins Centre, University of Sydney, Sydney, NSW 2006, Australia.
Proc Natl Acad Sci U S A. 2019 May 14;116(20):9775-9784. doi: 10.1073/pnas.1820006116. Epub 2019 Apr 26.
Concerted examination of multiple collections of single-cell RNA sequencing (RNA-seq) data promises further biological insights that cannot be uncovered with individual datasets. Here we present scMerge, an algorithm that integrates multiple single-cell RNA-seq datasets using factor analysis of stably expressed genes and pseudoreplicates across datasets. Using a large collection of public datasets, we benchmark scMerge against published methods and demonstrate that it consistently provides improved cell type separation by removing unwanted factors; scMerge can also enhance biological discovery through robust data integration, which we show through the inference of development trajectory in a liver dataset collection.
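The abstract names two ingredients: stably expressed genes and pseudoreplicates across datasets. The sketch below illustrates one general way such ingredients can be combined, in the style of RUV-III-type factor analysis: pseudoreplicates expose the unwanted variation, and stably expressed genes act as negative controls for estimating it per cell. It is a simplified illustration under stated assumptions, not the scMerge implementation. The function name `remove_unwanted_variation`, its parameters, and the toy data are all hypothetical, and the pseudoreplicate groups and control genes are assumed to be given.

```python
import numpy as np


def remove_unwanted_variation(Y, replicate_ids, control_idx, k):
    """RUV-III-style sketch (hypothetical helper, not the scMerge API).

    Y             : cells x genes matrix of log-expression values
    replicate_ids : one label per cell; cells sharing a label are treated
                    as (pseudo)replicates of each other
    control_idx   : column indices of stably expressed (negative control) genes
    k             : number of unwanted factors to estimate
    """
    Y = np.asarray(Y, dtype=float)
    Yc = Y - Y.mean(axis=0)  # assumption: centre each gene before estimation

    # Replicate design matrix M (cells x groups), one indicator column per group.
    labels, inverse = np.unique(replicate_ids, return_inverse=True)
    M = np.zeros((Y.shape[0], labels.size))
    M[np.arange(Y.shape[0]), inverse] = 1.0

    # Subtract each replicate group's mean. Replicates share their biology,
    # so the residuals should carry mostly unwanted variation.
    group_means = (M.T @ Yc) / M.sum(axis=0)[:, None]
    Y0 = Yc - M @ group_means

    # Leading left singular vectors of the residuals span the unwanted
    # directions across cells; projecting the data gives gene loadings.
    U, _, _ = np.linalg.svd(Y0, full_matrices=False)
    alpha = U[:, :k].T @ Yc  # k x genes

    # Regress the control genes on their loadings to estimate the
    # per-cell unwanted factors W (cells x k).
    ac = alpha[:, control_idx]
    W = Yc[:, control_idx] @ ac.T @ np.linalg.inv(ac @ ac.T)

    return Y - W @ alpha


# Toy data (entirely made up): 3 cell types, each observed once in 2 batches.
cell_type = np.array([0, 1, 2, 0, 1, 2])
batch = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
type_means = np.array([
    [5.0, 5.0, 1.0, 0.0, 3.0],
    [5.0, 5.0, 4.0, 2.0, 0.0],
    [5.0, 5.0, 0.0, 6.0, 1.0],
])  # genes 0 and 1 are constant across cell types (stably expressed)
batch_loading = np.array([1.0, 0.5, 2.0, -1.0, 0.3])
Y = type_means[cell_type] + np.outer(batch, batch_loading)

corrected = remove_unwanted_variation(Y, cell_type, control_idx=[0, 1], k=1)
```

In this noise-free toy case the two batches differ only by a single additive factor, so after correction the cells of the same type from the two batches coincide. Real data would need noise handling, a principled choice of `k`, and a data-driven way to pick stably expressed genes and pseudoreplicates, none of which this sketch attempts.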