Mol Biol Evol. 2012 Jan;29(1):1-5. doi: 10.1093/molbev/msr177. Epub 2011 Jul 19.
Errors in the inferred multiple sequence alignment may lead to false predictions of positive selection. Recently developed methods for detecting unreliable alignment regions have been shown to identify incorrectly aligned regions accurately. While removing unreliable alignment regions is expected to increase the accuracy of positive selection inference, such filtering may also significantly decrease the power of the test, because positively selected regions are fast evolving and these same regions are often the ones that are difficult to align. Here, we used realistic simulations that mimic sequence evolution of HIV-1 genes to test the hypothesis that the performance of positive selection inference using codon models can be improved by removing unreliable alignment regions. Our study shows that the benefit of removing unreliable regions exceeds the loss of power due to the removal of some of the true positively selected sites.
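As an illustration only, the filtering step discussed above can be sketched as dropping codon columns whose reliability score falls below a cutoff before the alignment is passed to a codon-model analysis. The function name `filter_codon_columns`, the per-codon score list, and the 0.9 threshold are all hypothetical choices for this sketch; the abstract does not specify which reliability method, score scale, or cutoff was used.

```python
from typing import Dict, List


def filter_codon_columns(
    alignment: Dict[str, str],
    codon_scores: List[float],
    threshold: float = 0.9,  # hypothetical cutoff, not taken from the study
) -> Dict[str, str]:
    """Return a copy of a codon alignment keeping only reliable codon columns.

    `alignment` maps sequence names to aligned nucleotide strings of equal
    length (a multiple of 3). `codon_scores` holds one assumed reliability
    score per codon column; columns scoring below `threshold` are removed.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    n_codons = length // 3
    if len(codon_scores) != n_codons:
        raise ValueError("need exactly one score per codon column")

    # Indices of codon columns considered reliable enough to keep.
    keep = [i for i, score in enumerate(codon_scores) if score >= threshold]

    # Rebuild each sequence from the retained codons only.
    return {
        name: "".join(seq[3 * i : 3 * i + 3] for i in keep)
        for name, seq in alignment.items()
    }


# Toy example: the middle codon column is poorly aligned and gets removed.
toy_alignment = {"seq1": "ATGGCTAAA", "seq2": "ATG---AAA"}
toy_scores = [0.99, 0.40, 0.95]
filtered = filter_codon_columns(toy_alignment, toy_scores)
```

Removing whole codon columns keeps the reading frame intact, which codon models require. The trade-off described in the abstract appears here directly: a truly positively selected site that happens to sit in a low-scoring column is discarded along with the misaligned ones.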