Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
Department of Genome Sciences, University of Washington, Seattle, Washington, USA.
J Comput Biol. 2022 Aug;29(8):802-824. doi: 10.1089/cmb.2021.0644. Epub 2022 Jul 1.
Although the rates at which positions in the genome mutate are known to depend not only on the nucleotide to be mutated, but also on neighboring nucleotides, it remains challenging to perform phylogenetic inference using models of context-dependent mutation. In these models, the effects of one mutation may in principle propagate to faraway locations, making it difficult to compute exact likelihoods. This article shows how to use bounds on the propagation of dependency to compute the likelihood of mutation of a given segment of genome by marginalizing over sufficiently long flanking sequences. This can be used for maximum likelihood or Bayesian inference. Protocols for examining residuals and for iterative model refinement are also discussed. Tools for working efficiently with these models are provided in an R package, which could be used in other applications. The method is used to examine the context dependence of mutations since the common ancestor of humans and chimpanzees.
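The idea of marginalizing over flanking sequence can be illustrated with a small toy sketch. This is not the article's algorithm or the API of its R package; every name and parameter here (`mutation_rate`, `segment_likelihood`, `mu`, `cpg_boost`) is hypothetical. It assumes a simple CpG-like context effect, a uniform distribution on the initial flanking bases, and it ignores any context beyond the edges of the window, which is the approximation that longer flanks are meant to control.

```python
from itertools import product

BASES = "ACGT"

def mutation_rate(seq, i, new_base, mu=1.0, cpg_boost=5.0):
    """Rate at which seq[i] mutates to new_base (toy model, hypothetical parameters)."""
    rate = mu
    # Context dependence: in a CG pair, C->T and G->A happen at an elevated rate.
    if seq[i] == "C" and new_base == "T" and i + 1 < len(seq) and seq[i + 1] == "G":
        rate += cpg_boost
    if seq[i] == "G" and new_base == "A" and i > 0 and seq[i - 1] == "C":
        rate += cpg_boost
    return rate

def transitions(seq, **params):
    """Yield (new_seq, rate) for every single-site change of seq."""
    for i, old in enumerate(seq):
        for b in BASES:
            if b != old:
                yield seq[:i] + b + seq[i + 1:], mutation_rate(seq, i, b, **params)

def evolve(dist, t, n_terms=40, **params):
    """Return dist * exp(t * G) via a truncated Taylor series.

    G is the rate matrix on whole windows; it is never built explicitly.
    The truncation is only adequate when t times the largest exit rate is small.
    """
    result = dict(dist)
    term = dict(dist)
    for k in range(1, n_terms):
        nxt = {}
        for seq, p in term.items():
            for new_seq, rate in transitions(seq, **params):
                flow = p * rate * t / k
                nxt[new_seq] = nxt.get(new_seq, 0.0) + flow
                nxt[seq] = nxt.get(seq, 0.0) - flow
        term = nxt
        for seq, p in term.items():
            result[seq] = result.get(seq, 0.0) + p
    return result

def segment_likelihood(start, end, t, flank, **params):
    """P(focal segment start -> end after time t), marginalized over flanks.

    Assumption for illustration: initial flanking bases are uniform and independent.
    """
    flanks = ["".join(p) for p in product(BASES, repeat=flank)]
    weight = 1.0 / len(flanks) ** 2
    dist = {left + start + right: weight for left in flanks for right in flanks}
    final = evolve(dist, t, **params)
    lo, hi = flank, flank + len(start)
    # Sum over all final flanking sequences, keeping only the focal segment fixed.
    return sum(p for seq, p in final.items() if seq[lo:hi] == end)

if __name__ == "__main__":
    for width in range(3):
        print(width, segment_likelihood("C", "T", t=0.05, flank=width))
```

Running the loop at the bottom shows how the computed probability of a C to T change shifts as the flank width grows. With no flank the context effect is invisible to the calculation, and wider flanks let more of the neighboring dependence enter the marginal.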