Campbell Kieran R, Yau Christopher
Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, OX1 3QX, UK.
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, OX3 7BN, UK.
Wellcome Open Res. 2017 Mar 15;2:19. doi: 10.12688/wellcomeopenres.11087.1.
Modeling bifurcations in single-cell transcriptomics data has become an increasingly popular field of research. Several methods have been proposed to infer bifurcation structure from such data, but all rely on heuristic, non-probabilistic inference. Here we propose the first generative, fully probabilistic model for such inference, based on a Bayesian hierarchical mixture of factor analyzers. Our model exhibits competitive performance on large datasets despite implementing full Markov chain Monte Carlo sampling, and its unique hierarchical prior structure enables automatic determination of the genes driving the bifurcation process. We additionally propose an empirical-Bayes-like extension that deals with the high levels of zero-inflation in single-cell RNA-seq data, and we quantify when such models are useful. We apply our model to both real and simulated single-cell gene expression data and compare the results to existing pseudotime methods. Finally, we discuss both the merits and weaknesses of such a unified, probabilistic approach in the context of practical bioinformatics analyses.