Division of Biostatistics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America.
Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, California, United States of America.
PLoS Comput Biol. 2021 Feb 18;17(2):e1007948. doi: 10.1371/journal.pcbi.1007948. eCollection 2021 Feb.
Gene function annotation is important for a variety of downstream analyses of genetic data. However, experimental characterization of function remains costly and slow, making computational prediction an important endeavor. Phylogenetic approaches to prediction have been developed, but implementing a practical Bayesian framework for parameter estimation remains an outstanding challenge. We have developed a computationally efficient model of the evolution of gene annotations along phylogenies, set in a Bayesian framework that uses Markov chain Monte Carlo (MCMC) for parameter estimation. Unlike previous approaches, our method can estimate parameters across many different phylogenetic trees and functions. The resulting parameter estimates agree with biological intuition, such as an increased probability of function change following gene duplication. The method performs well in leave-one-out cross-validation, and we further validated some of its predictions against the experimental scientific literature.
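The abstract does not spell out the model, so what follows is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method or software. It models a binary annotation (function present/absent) evolving down a gene tree, with gain and loss probabilities that may differ after duplication versus speciation nodes. The likelihood is computed with Felsenstein's pruning algorithm and the parameters are sampled with a random-walk Metropolis-Hastings sampler under assumed Uniform(0, 1) priors. A leave-one-out prediction then scores a held-out leaf. The toy tree `TREE`, the node labels, the parameter names (`gain_dup`, `loss_spe`, `root`, ...) and all helper functions are hypothetical.

```python
import math
import random

# Hypothetical toy gene tree: internal node -> children; leaves have no entry.
# The event type at each internal node ("dup" or "spe") is an assumption.
TREE = {"root": ["a", "b"], "a": ["g1", "g2"], "b": ["g3", "g4"]}
NODE_TYPE = {"root": "spe", "a": "dup", "b": "spe"}
PARAM_NAMES = ("gain_dup", "loss_dup", "gain_spe", "loss_spe", "root")


def transition(parent_state, child_state, node_type, params):
    """P(child annotation | parent annotation) on a branch below a node of node_type."""
    gain = params["gain_" + node_type]
    loss = params["loss_" + node_type]
    if parent_state == 0:
        return gain if child_state == 1 else 1.0 - gain
    return loss if child_state == 0 else 1.0 - loss


def conditional_likelihood(node, annotations, params):
    """Pruning step: [P(data below node | node state 0), P(... | state 1)]."""
    if node not in TREE:  # leaf
        obs = annotations.get(node)  # 0, 1, or None when unannotated
        if obs is None:
            return [1.0, 1.0]
        return [1.0 if obs == 0 else 0.0, 1.0 if obs == 1 else 0.0]
    result = [1.0, 1.0]
    for child in TREE[node]:
        child_lik = conditional_likelihood(child, annotations, params)
        for s in (0, 1):
            result[s] *= sum(
                transition(s, c, NODE_TYPE[node], params) * child_lik[c]
                for c in (0, 1)
            )
    return result


def likelihood(annotations, params):
    """Marginal likelihood of the leaf annotations; params["root"] = P(root has function)."""
    root_lik = conditional_likelihood("root", annotations, params)
    return (1.0 - params["root"]) * root_lik[0] + params["root"] * root_lik[1]


def metropolis_hastings(annotations, n_iter=2000, step=0.05, seed=0):
    """Random-walk MH over all parameters with Uniform(0, 1) priors (an assumption)."""
    rng = random.Random(seed)
    params = {k: 0.1 for k in PARAM_NAMES}
    params["root"] = 0.5
    log_lik = math.log(likelihood(annotations, params))
    samples = []
    for _ in range(n_iter):
        proposal = {k: v + rng.gauss(0.0, step) for k, v in params.items()}
        # Proposals outside (0, 1) have zero prior density and are rejected.
        if all(0.0 < v < 1.0 for v in proposal.values()):
            new_log_lik = math.log(likelihood(annotations, proposal))
            # Symmetric proposal and flat prior: accept with prob min(1, L'/L).
            if math.log(1.0 - rng.random()) < new_log_lik - log_lik:
                params, log_lik = proposal, new_log_lik
        samples.append(dict(params))
    return samples


def loo_probability(leaf, annotations, params):
    """P(leaf has the function | all other annotations), holding the leaf out."""
    held_out = dict(annotations)
    held_out[leaf] = 1
    with_fn = likelihood(held_out, params)
    held_out[leaf] = 0
    without_fn = likelihood(held_out, params)
    return with_fn / (with_fn + without_fn)


if __name__ == "__main__":
    observed = {"g1": 1, "g2": 0, "g3": 1, "g4": 1}
    draws = metropolis_hastings(observed)[500:]  # discard burn-in
    posterior_mean = {k: sum(d[k] for d in draws) / len(draws) for k in PARAM_NAMES}
    print(posterior_mean)
    print(loo_probability("g3", observed, posterior_mean))
```

Giving duplication and speciation nodes separate gain and loss parameters is what would let a fitted model of this kind express the pattern the abstract reports, a higher probability of function change after duplication. Four toy leaves are far too few to show that pattern.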