Jia Cheng, Hu Yu, Kelly Derek, Kim Junhyong, Li Mingyao, Zhang Nancy R
Department of Biostatistics and Epidemiology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
Graduate Group in Genomics and Computational Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA.
Nucleic Acids Res. 2017 Nov 2;45(19):10978-10988. doi: 10.1093/nar/gkx754.
Recent technological breakthroughs have made it possible to measure RNA expression at the single-cell level, thus paving the way for exploring expression heterogeneity among individual cells. Current single-cell RNA sequencing (scRNA-seq) protocols are complex and introduce technical biases that vary across cells, which can bias downstream analysis without proper adjustment. To account for cell-to-cell technical differences, we propose a statistical framework, TASC (Toolkit for Analysis of Single Cell RNA-seq), an empirical Bayes approach that reliably models cell-specific dropout rates and amplification bias using external RNA spike-ins. TASC incorporates these technical parameters, which reflect cell-to-cell batch effects, into a hierarchical mixture model to estimate the biological variance of a gene and to detect differentially expressed genes. More importantly, TASC can adjust for covariates to further eliminate confounding that may originate from differences in cell size and cell cycle. In simulated and real scRNA-seq data, TASC achieves accurate Type I error control and displays competitive sensitivity and improved robustness to batch effects in differential expression analysis, compared to existing methods. TASC is designed for computational efficiency and takes advantage of multi-threaded parallelization. We believe that TASC will provide a robust platform for researchers to leverage the power of scRNA-seq.
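The abstract states that TASC uses external spike-ins, whose input molecule numbers are known, to estimate cell-specific dropout rates and amplification bias. The sketch below illustrates that idea for a single cell under stated assumptions: dropout follows a logistic curve in log input, P(detected) = 1 / (1 + exp(-(kappa + tau * log mu))), and detected counts follow a power law, count ≈ alpha * mu^beta, with log-normal noise. These functional forms, the parameter names `kappa`, `tau`, `alpha`, `beta`, and every function name here are hypothetical choices for illustration; this is not TASC's implementation, and it omits the empirical Bayes and hierarchical mixture layers described above.

```python
import numpy as np


def fit_dropout_curve(log_mu, detected, n_iter=100, tol=1e-10):
    """Assumed logistic dropout curve, fit by Newton-Raphson:
    P(detected) = 1 / (1 + exp(-(kappa + tau * log_mu))). Returns (kappa, tau)."""
    X = np.column_stack([np.ones_like(log_mu), log_mu])
    y = detected.astype(float)
    w = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (y - p)
        # Fisher information of the logistic likelihood; a tiny ridge keeps it invertible.
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + 1e-9 * np.eye(2)
        step = np.linalg.solve(hess, grad)
        w = w + step
        if np.max(np.abs(step)) < tol:
            break
    return w[0], w[1]


def fit_amplification(log_mu, counts):
    """Assumed power-law amplification: least squares on
    log(count) = log(alpha) + beta * log(mu), using detected spike-ins only."""
    det = counts > 0
    beta, log_alpha = np.polyfit(log_mu[det], np.log(counts[det]), 1)
    return np.exp(log_alpha), beta


def fit_cell(mu, counts):
    """Hypothetical per-cell technical parameters from spike-ins with known
    input molecule numbers `mu` and observed counts `counts`."""
    log_mu = np.log(mu)
    kappa, tau = fit_dropout_curve(log_mu, counts > 0)
    alpha, beta = fit_amplification(log_mu, counts)
    return {"kappa": kappa, "tau": tau, "alpha": alpha, "beta": beta}


def simulate_cell(mu, kappa, tau, alpha, beta, sigma=0.2, seed=0):
    """Simulate spike-in counts: Bernoulli dropout, then log-normal amplification."""
    rng = np.random.default_rng(seed)
    p_detect = 1.0 / (1.0 + np.exp(-(kappa + tau * np.log(mu))))
    detected = rng.random(mu.size) < p_detect
    amplified = alpha * mu**beta * np.exp(rng.normal(0.0, sigma, mu.size))
    return np.where(detected, amplified, 0.0)


if __name__ == "__main__":
    # Illustrative input amounts: 92 spike-ins spanning 0.1 to 10^4 molecules.
    mu = np.logspace(-1, 4, 92)
    counts = simulate_cell(mu, kappa=-2.0, tau=1.5, alpha=5.0, beta=0.9)
    print(fit_cell(mu, counts))
```

Fitting each cell separately is what lets the technical parameters vary from cell to cell; in a full model such as the one the abstract describes, these per-cell estimates would then feed into the gene-level analysis rather than being used on their own.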