The Laurence H. Baker Center in Bioinformatics on Biological Statistics, Iowa State University, Ames, IA, 50011, USA.
Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, 50011, USA.
J Mol Evol. 2022 Oct;90(5):342-351. doi: 10.1007/s00239-022-10064-2. Epub 2022 Aug 3.
One of the most popular measures in the analysis of protein sequence evolution is the ratio of nonsynonymous distance (dN) to synonymous distance (dS). Under the assumption that synonymous substitutions in the coding region are selectively neutral, the dN/dS ratio can be used to detect adaptive evolution (or purifying selection) when dN/dS is significantly greater than 1 (or significantly less than 1). However, because of strong structural constraints and/or variable functional constraints imposed on amino acid sites, most protein-coding genes in most species show dN/dS < 1. Consequently, the statistical power of testing dN/dS = 1 may be insufficient to distinguish between different selection modes. In this paper, we propose a more powerful test, called dN/dS-H, which introduces a new parameter H, a relative measure of rate variation among sites. Given strong purifying selection at some sites, the dN/dS-H model predicts dN/dS = 1-H for neutral evolution, dN/dS < 1-H for nearly neutral selection, and dN/dS > 1-H for adaptive evolution. The potential of this new method for resolving the neutral-adaptive debates is illustrated by protein sequence evolution in vertebrates, Drosophila and yeasts, as well as somatic cancer evolution (specialized as the CN/CS-H test).
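The decision rule stated in the abstract can be sketched in a few lines of Python. This is only an illustration of comparing a dN/dS point estimate with the neutral expectation 1-H. It is not the authors' implementation, and it does not perform the significance test that the method requires. The function name `classify_selection`, the `tol` parameter, the assumed range 0 <= H < 1, and the numbers in the usage note are all hypothetical.

```python
def classify_selection(dn_ds: float, h: float, tol: float = 0.0) -> str:
    """Compare a dN/dS point estimate with the neutral expectation 1 - H.

    Illustrative only: a real analysis would test whether dN/dS differs
    significantly from 1 - H rather than compare point estimates.
    """
    # Assumption made for this sketch: H is a relative measure in [0, 1).
    if not 0.0 <= h < 1.0:
        raise ValueError("this sketch assumes 0 <= H < 1")
    if dn_ds < 0.0:
        raise ValueError("dN/dS cannot be negative")

    neutral_expectation = 1.0 - h

    # tol is a hypothetical stand-in for statistical uncertainty.
    if abs(dn_ds - neutral_expectation) <= tol:
        return "neutral"
    if dn_ds > neutral_expectation:
        return "adaptive"
    return "nearly neutral"
```

With made-up values, `classify_selection(0.7, 0.5)` returns `"adaptive"` even though 0.7 < 1, which is the kind of case the conventional dN/dS = 1 threshold would read as purifying selection; `classify_selection(0.3, 0.5)` returns `"nearly neutral"`.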