European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, United Kingdom.
Mol Biol Evol. 2012 Apr;29(4):1125-39. doi: 10.1093/molbev/msr272. Epub 2011 Nov 1.
When detecting positive selection in proteins, the prevalence of errors resulting from misalignment and the ability of alignment filters to mitigate such errors are not well understood, but filters are commonly applied to try to avoid false positive results. Focusing on the sitewise detection of positive selection across a wide range of divergence levels and indel rates, we performed simulation experiments to quantify the false positives and false negatives introduced by alignment error and the ability of alignment filters to improve performance. We found that some aligners led to many false positives, whereas others resulted in very few. False negatives were a problem for all aligners, increasing with sequence divergence. Of the aligners tested, PRANK's codon-based alignments consistently performed the best and ClustalW performed the worst. Of the filters tested, GUIDANCE performed the best and Gblocks performed the worst. Although some filters showed good ability to reduce the error rates from ClustalW and MAFFT alignments, none were found to substantially improve the performance of PRANK alignments under most conditions. Our results revealed distinct trends in error rates and power levels for aligners and filters within a biologically plausible parameter space. With the best aligner, a low false positive rate was maintained even with extremely divergent indel-prone sequences. Controls using the true alignment and an optimal filtering method suggested that performance improvements could be gained by improving aligners or filters to reduce the prevalence of false negatives, especially at higher divergence levels and indel rates.
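The error rates the abstract refers to can be made concrete with a minimal sketch. It assumes each simulated site has a shared index, hypothetically its residue position in a reference sequence, so that sites called as positively selected on any alignment can be mapped back to the simulated truth. It uses one common definition of false positive rate, FP / (FP + TN), and of power, TP / (TP + FN). The names `score_sites` and `SitewiseCounts` are illustrative only. This is not the authors' pipeline, and the paper's exact definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class SitewiseCounts:
    tp: int  # truly selected sites that were detected
    fp: int  # neutral sites wrongly called as positively selected
    fn: int  # truly selected sites that were missed
    tn: int  # neutral sites correctly left uncalled

    @property
    def false_positive_rate(self) -> float:
        # Assumed definition: fraction of truly non-selected sites called positive.
        negatives = self.fp + self.tn
        return self.fp / negatives if negatives else 0.0

    @property
    def power(self) -> float:
        # Fraction of truly selected sites detected (1 - false negative rate).
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0


def score_sites(true_selected: set[int], predicted_selected: set[int],
                n_sites: int) -> SitewiseCounts:
    """Compare predicted positively selected sites with the simulated truth.

    Hypothetical setup: sites are indexed 0..n_sites-1 by a shared reference
    coordinate, so predictions from any alignment are directly comparable.
    """
    all_sites = set(range(n_sites))
    if not (true_selected <= all_sites and predicted_selected <= all_sites):
        raise ValueError("site indices must lie in range(n_sites)")
    tp = len(true_selected & predicted_selected)
    fp = len(predicted_selected - true_selected)
    fn = len(true_selected - predicted_selected)
    tn = len(all_sites - true_selected - predicted_selected)
    return SitewiseCounts(tp, fp, fn, tn)


if __name__ == "__main__":
    counts = score_sites({3, 7, 12}, {3, 12, 20}, n_sites=30)
    print(counts, counts.false_positive_rate, counts.power)
```

In the usage example, sites 3 and 12 are true positives, site 20 is a false positive, and site 7 is a false negative. That gives a false positive rate of 1/27 and a power of 2/3. Comparing these numbers for the true alignment, each aligner, and each filtered alignment is one way to express the trade-off the abstract describes.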