Li Kevin, Chen Rachel, Lindsey William, Best Aaron, DeJongh Matthew, Henry Christopher, Tintle Nathan
Department of Mathematics, Columbia University, New York, NY 10027, USA
Pac Symp Biocomput. 2019;24:172-183.
The rapid acceleration of microbial genome sequencing increases opportunities to understand bacterial gene function. Unfortunately, only a small proportion of genes have been studied. Recently, TnSeq has been proposed as a cost-effective, highly reliable approach to predicting gene function from changes in a cell's fitness before and after genomic changes. However, major questions remain about how best to determine whether an observed quantitative change in fitness represents a meaningful change. To address this limitation, we develop a Gaussian mixture model framework for classifying gene function from TnSeq experiments. To implement the mixture model, we present the Expectation-Maximization algorithm and a hierarchical Bayesian model sampled using Stan's Hamiltonian Monte Carlo sampler. We compare these implementations against the frequentist method used in the current TnSeq literature. Using simulations and real data from E. coli TnSeq experiments, we show that the Bayesian implementation of the Gaussian mixture framework provides the most consistent classification results.
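As a rough illustration of the Expectation-Maximization approach named in the abstract, the following is a minimal sketch of EM for a one-dimensional Gaussian mixture. It is not the authors' implementation. The function name `fit_gmm_em`, the choice of three components (read here as decreased, neutral, and increased fitness), the quantile-based initialization, and the simulated fitness values are all assumptions made for illustration only.

```python
import numpy as np


def fit_gmm_em(x, k=3, n_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture with k components by EM (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Assumed initialization: means at evenly spaced quantiles,
    # a shared variance, and equal mixing weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log(weight * normal density) for each point and component.
        log_p = (
            np.log(weights)
            - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances
        )
        log_norm = np.logaddexp.reduce(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])  # responsibilities, shape (n, k)
        # M-step: responsibility-weighted updates of weights, means, variances.
        nk = resp.sum(axis=0) + 1e-12  # small floor avoids division by zero
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        variances = np.maximum(variances, 1e-6)  # guard against collapse
        # Stop when the log-likelihood (from this E-step) stops improving.
        ll = log_norm.sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    # resp comes from the final E-step, i.e. before the last M-step update.
    return weights, means, variances, resp


# Hypothetical simulated fitness changes: three groups of genes.
rng = np.random.default_rng(0)
fitness = np.concatenate([
    rng.normal(-2.0, 0.3, 200),  # assumed "decreased fitness" group
    rng.normal(0.0, 0.3, 600),   # assumed "neutral" group
    rng.normal(2.0, 0.3, 200),   # assumed "increased fitness" group
])
weights, means, variances, resp = fit_gmm_em(fitness, k=3)
labels = resp.argmax(axis=1)  # classify each gene by its most probable component
```

Under this sketch, each gene is assigned to the component with the highest responsibility. A Bayesian version would instead place priors on the means, variances, and weights and sample the posterior, for example in Stan, which is not shown here.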