Department of Genetics, Evolution and Environment, University College London, United Kingdom.
Mol Biol Evol. 2011 Mar;28(3):1217-28. doi: 10.1093/molbev/msq303. Epub 2010 Nov 18.
The branch-site test is a likelihood ratio test to detect positive selection along prespecified lineages on a phylogeny that affects only a subset of codons in a protein-coding gene, with positive selection indicated by accelerated nonsynonymous substitutions (with ω = dN/dS > 1). This test may have more power than earlier methods, which average nucleotide substitution rates over sites in the protein and/or over branches on the tree. However, a few recent studies questioned the statistical basis of the test and claimed that the test generated too many false positives. In this paper, we examine the null distribution of the test and conduct a computer simulation to examine the false-positive rate and the power of the test. The results suggest that the asymptotic theory is reliable for typical data sets, and indeed in our simulations, the large-sample null distribution was reliable with as few as 20-50 codons in the alignment. We examined the impact of sequence length, the strength of positive selection, and the proportion of sites under positive selection on the power of the branch-site test. We found that the test was far more powerful in detecting episodic positive selection than branch-based tests, which average substitution rates over all codons in the gene and thus miss the signal when most codons are under strong selective constraint. Recent claims of statistical problems with the branch-site test are due to misinterpretations of simulation results. Our results, as well as previous simulation studies that have demonstrated the robustness of the test, suggest that the branch-site test may be a useful tool for detecting episodic positive selection and for generating biological hypotheses for mutation studies and functional analyses. The test is sensitive to sequence and alignment errors and caution should be exercised concerning its use when data quality is in doubt.
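To make the likelihood ratio test concrete, the following is a minimal sketch of how the test statistic and its P value might be computed from two maximized log-likelihoods. It rests on assumptions not stated in the abstract: that the large-sample null distribution is a 50:50 mixture of a point mass at 0 and a chi-square distribution with 1 degree of freedom (the form usually given for this test, since ω is fixed at its boundary value of 1 under the null), and that a plain chi-square with 1 degree of freedom serves as a more conservative alternative. The function names and the log-likelihood values are hypothetical and chosen only for illustration.

```python
import math


def chi2_1_sf(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    For 1 df, X = Z**2 with Z standard normal, so P(X > x) = erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def branch_site_lrt(lnl_null, lnl_alt):
    """Likelihood ratio test from two maximized log-likelihoods.

    lnl_null: log-likelihood with omega on the foreground branch fixed at 1
    lnl_alt:  log-likelihood with that omega estimated (omega >= 1)
    Returns (statistic, p_mixture, p_chi2).
    """
    # The alternative nests the null, so a negative difference can only be
    # numerical noise; truncate it at zero.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Conservative P value: plain chi-square with 1 df.
    p_chi2 = chi2_1_sf(stat)
    # Assumed 50:50 mixture of a point mass at 0 and chi-square with 1 df:
    # for any positive statistic the upper tail is half the chi-square tail.
    p_mixture = 1.0 if stat == 0.0 else 0.5 * p_chi2
    return stat, p_mixture, p_chi2


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p_mix, p_chi2 = branch_site_lrt(lnl_null=-1000.0, lnl_alt=-998.0)
    print(f"2*delta(lnL) = {stat:.2f}")
    print(f"P (50:50 mixture) = {p_mix:.4f}")
    print(f"P (chi-square, 1 df) = {p_chi2:.4f}")
```

Under the assumed mixture, the 5% critical value is about 2.71, compared with about 3.84 for the plain chi-square with 1 degree of freedom, so the choice of null distribution can change the conclusion for statistics that fall between those two values.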