Smith Martin D, Wertheim Joel O, Weaver Steven, Murrell Ben, Scheffler Konrad, Kosakovsky Pond Sergei L
Graduate Program in Bioinformatics and Systems Biology, University of California San Diego.
Department of Medicine, University of California San Diego.
Mol Biol Evol. 2015 May;32(5):1342-53. doi: 10.1093/molbev/msv022. Epub 2015 Feb 19.
Over the past two decades, comparative sequence analysis using codon-substitution models has been honed into a powerful and popular approach for detecting signatures of natural selection from molecular data. A substantial body of work has focused on developing a class of "branch-site" models which permit selective pressures on sequences, quantified by the ω ratio, to vary among both codon sites and individual branches in the phylogeny. We develop and present a method in this class, adaptive branch-site random effects likelihood (aBSREL), whose key innovation is variable parametric complexity chosen with an information-theoretic criterion. By applying models of different complexity to different branches in the phylogeny, aBSREL delivers statistical performance matching or exceeding best-in-class existing approaches, while running an order of magnitude faster. Based on simulated data analysis, we offer guidelines for what extent and strength of diversifying positive selection can be detected reliably and suggest that there is a natural limit on the optimal parametric complexity for "branch-site" models. An aBSREL analysis of 8,893 Euteleostomes gene alignments demonstrates that over 80% of branches in typical gene phylogenies can be adequately modeled with a single ω ratio model, that is, current models are unnecessarily complicated. However, there are a relatively small number of key branches, whose identities are derived from the data using a model selection procedure, for which it is essential to accurately model evolutionary complexity.
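The idea of choosing per-branch model complexity with an information-theoretic criterion can be illustrated with a minimal sketch. The abstract does not name the criterion or the search procedure, so everything here is an assumption: the sketch uses the small-sample corrected Akaike criterion (AICc), adds ω classes to a branch one at a time until the score stops improving, and counts two extra parameters per added class (one ω value and one mixture weight). The function names and the log-likelihood values are hypothetical and do not come from the paper or from any aBSREL implementation.

```python
def aicc(log_likelihood, n_params, n_samples):
    """Small-sample corrected Akaike information criterion (lower is better)."""
    if n_samples - n_params - 1 <= 0:
        raise ValueError("sample size too small for AICc with this many parameters")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_samples - n_params - 1)


def select_rate_classes(log_likelihoods, base_params, n_samples, params_per_class=2):
    """Pick the number of omega classes for one branch by stepping up while AICc improves.

    log_likelihoods[i] is a (hypothetical) maximized log-likelihood obtained
    with i + 1 omega classes on the branch; base_params is the parameter count
    of the one-class model. Both are assumptions made for this illustration.
    """
    best_k = 1
    best_score = aicc(log_likelihoods[0], base_params, n_samples)
    for k in range(2, len(log_likelihoods) + 1):
        n_params = base_params + params_per_class * (k - 1)
        score = aicc(log_likelihoods[k - 1], n_params, n_samples)
        if score >= best_score:
            break  # the extra class is not justified by the fit; stop here
        best_k, best_score = k, score
    return best_k


# Made-up log-likelihoods for two branches, 300 codon sites, 10 base parameters.
simple_branch = select_rate_classes([-1000.0, -999.5, -999.4], 10, 300)
complex_branch = select_rate_classes([-1000.0, -990.0, -989.9], 10, 300)
print(simple_branch, complex_branch)
```

With these invented numbers, the first branch gains too little likelihood to justify a second ω class, while the second branch does. This mirrors the abstract's finding that most branches are adequately described by a single ω ratio and only a few need a richer model.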