Chen Qipian, Yang Hao, Feng Xiao, Chen Qingjian, Shi Suhua, Wu Chung-I, He Ziwen
State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Sun Yat-sen University, Guangzhou 510275, China.
Natl Sci Rev. 2021 Dec 3;9(5):nwab217. doi: 10.1093/nsr/nwab217. eCollection 2022 May.
A large literature over the last two decades has affirmed adaptive DNA sequence evolution between species. The main lines of evidence come from (i) the McDonald-Kreitman (MK) test, which compares divergence and polymorphism data, and (ii) the phylogenetic analysis by maximum likelihood (PAML) test, which analyzes multispecies divergence data. Here, we apply these two tests concurrently to genomic data of Drosophila and Arabidopsis. To our surprise, the >100 genes identified by the two tests do not overlap beyond random expectation. Because the non-concordance could be due to low power leading to high false-negative rates, we merge every 20-30 genes into a 'supergene'. At the supergene level, the power of detection is high but the calls still do not overlap. We rule out methodological reasons for the non-concordance. In particular, extensive simulations fail to find scenarios in which positive selection can be detected by only one of MK and PAML but not by both. Since molecular evolution is governed by positive and negative selection concurrently, a fundamental assumption for estimating one of these (say, positive selection) is that the other is constant. However, in a broad survey of primates, birds, Drosophila and Arabidopsis, we found that negative selection rarely stays constant for long in evolution. As a consequence, the variation in negative selection is often misconstrued as a signal of positive selection. In conclusion, MK, PAML and any other method that examines genomic sequence evolution have to explicitly address the variation in negative selection before estimating positive selection. In a companion study, we propose a possible path forward in two stages: first, mapping out the changes in negative selection, and then using this map to estimate positive selection. For now, the large literature on positive selection between species awaits reassessment.
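As an illustration of the two quantities the abstract relies on, the following is a minimal sketch, not the authors' pipeline. It assumes the MK comparison is summarized as a 2x2 table of nonsynonymous/synonymous counts for divergence and polymorphism, and that "overlap beyond random expectation" is judged with a hypergeometric tail probability. The function names and the choice of a one-sided exact test are assumptions made for this example only.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement
    from N items, K of which are marked."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def overlap_pvalue(n_genes, n_mk, n_paml, n_shared):
    """Probability of seeing at least n_shared genes called by both tests
    if the two call sets were independent draws from n_genes genes."""
    return hypergeom_upper_tail(n_shared, n_genes, n_mk, n_paml)


def mk_test(dn, ds, pn, ps):
    """MK-style summary from four counts (hypothetical helper):
    dn, ds = nonsynonymous / synonymous fixed differences,
    pn, ps = nonsynonymous / synonymous polymorphisms.
    Returns (alpha, p): alpha = 1 - (ds * pn) / (dn * ps) and a one-sided
    Fisher exact p-value for an excess of nonsynonymous divergence."""
    alpha = 1 - (ds * pn) / (dn * ps)
    # 2x2 table viewed as a hypergeometric draw: of all changes,
    # (dn + pn) are nonsynonymous and (dn + ds) are fixed differences.
    p = hypergeom_upper_tail(dn, dn + ds + pn + ps, dn + pn, dn + ds)
    return alpha, p
```

For example, `overlap_pvalue(1000, 100, 100, 10)` asks whether 10 shared genes out of two sets of 100 calls among 1000 genes exceed what chance alone would give; these numbers are illustrative and are not taken from the study.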