Department of Genetics, Evolution and Environment, University College London, London, United Kingdom.
Mol Biol Evol. 2010 Oct;27(10):2257-67. doi: 10.1093/molbev/msq115. Epub 2010 May 5.
The detection of positive Darwinian selection affecting protein-coding genes remains a topic of great interest and importance. The "branch-site" test is designed to detect localized episodic bouts of positive selection that affect only a few amino acid residues on particular lineages and has been shown to have reasonable power and low false-positive rates for a wide range of selection schemes. Previous simulations examining the performance of the test, however, were conducted under idealized conditions without insertions, deletions, or alignment errors. As the test is sometimes used to analyze divergent sequences, the impact of indels and alignment errors is a major concern. Here, we used a recently developed indel-simulation program to examine the false-positive rate and power of the branch-site test. We find that insertions and deletions do not cause excessive false positives if the alignment is correct, but alignment errors can lead to unacceptably high false positives. Of the alignment methods evaluated, PRANK consistently outperformed MUSCLE, MAFFT, and ClustalW, mostly because the latter programs tend to place nonhomologous codons (or amino acids) into the same column, producing shorter and less accurate alignments and giving the false impression that many amino acid substitutions have occurred at those sites. Our examination of two previous studies suggests that alignment errors may impact the analysis of mammalian and vertebrate genes by the branch-site test, and it is important to use reliable alignment methods.
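Estimating the false-positive rate and power of the branch-site test amounts to counting how often the test rejects its null model across simulated replicates. The sketch below assumes the commonly used setup in which twice the log-likelihood difference between the alternative model and the null model (foreground omega fixed at 1) is compared with the 5% critical value of a chi-square distribution with 1 degree of freedom (3.841). The function name and inputs are hypothetical; the per-replicate log-likelihoods would come from whatever program fits the codon models.

```python
# Hypothetical helper: fraction of replicates in which the branch-site
# likelihood-ratio test rejects the null at the 5% level, assuming the
# statistic 2 * (lnL_alt - lnL_null) is compared with the chi-square(1 df)
# critical value.
CHI2_1DF_CRIT_5PCT = 3.841  # 95th percentile of chi-square with 1 df


def rejection_rate(lnl_null, lnl_alt, crit=CHI2_1DF_CRIT_5PCT):
    """Return the proportion of replicates whose LRT statistic exceeds crit.

    On data simulated WITHOUT positive selection this estimates the
    false-positive rate; on data simulated WITH it, the power.
    """
    if len(lnl_null) != len(lnl_alt) or not lnl_null:
        raise ValueError("need equal-length, non-empty lists of log-likelihoods")
    stats = [2.0 * (alt - null) for null, alt in zip(lnl_null, lnl_alt)]
    rejections = sum(1 for s in stats if s > crit)
    return rejections / len(stats)
```

For four made-up replicates with null log-likelihoods of -100 and alternative log-likelihoods of -99, -97, -100 and -98, the statistics are 2, 6, 0 and 4, so two of four exceed 3.841 and the rejection rate is 0.5.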
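The point that poor alignments are shorter and suggest spurious substitutions can be illustrated with a toy example of our own construction (not data from the study). Two amino acid sequences differ by an insertion of "GG" in one and a trailing "WW" in the other. A correct alignment places both in gap columns, while an aligner that avoids gaps forces nonhomologous residues into the same columns. The hypothetical function below counts alignment length, variable columns, and a simple lower bound on changes per column (distinct non-gap residues minus one). It is only an illustration; the branch-site test itself uses a codon-substitution likelihood model, which this count does not implement.

```python
# Toy illustration (hypothetical helper): how misplacing nonhomologous
# residues in the same column inflates the apparent number of changes.
def column_changes(alignment):
    """Return (alignment length, variable columns, minimum changes).

    Gaps ('-') are ignored; a column's minimum number of changes is the
    number of distinct residues in it minus one.
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    variable = 0
    changes = 0
    for i in range(n_cols):
        residues = {seq[i] for seq in alignment if seq[i] != "-"}
        if len(residues) > 1:
            variable += 1
            changes += len(residues) - 1
    return n_cols, variable, changes


correct = ["MKV--LSTAWW", "MKVGGLSTA--"]   # indels placed in gap columns
misaligned = ["MKVLSTAWW", "MKVGGLSTA"]    # no gaps: residues shifted out of register

print(column_changes(correct))     # (11, 0, 0)
print(column_changes(misaligned))  # (9, 6, 6)
```

The misaligned version is two columns shorter yet implies six changes where the correct alignment implies none, which is the pattern that can mislead a test for positive selection.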