Ahola Virpi, Aittokallio Tero, Vihinen Mauno, Uusipaikka Esa
Biotechnology and Food Research, MTT Agrifood Research Finland, FI-31600 Jokioinen, Finland.
Bioinformatics. 2008 Oct 1;24(19):2165-71. doi: 10.1093/bioinformatics/btn414. Epub 2008 Aug 4.
Multiple sequence alignment (MSA) is an essential prerequisite for many sequence analysis methods and a valuable tool in itself for describing relationships between protein sequences. Since the success of sequence analysis depends heavily on the reliability of the alignments, measures for assessing alignment quality are much needed.
We present a statistical model-based alignment quality score. Unlike other quality scores, it requires neither several parallel alignments of the same set of sequences nor additional structural information. Our quality score is based on measuring the conservation level of reference alignments in Homstrad. Reference sequences were realigned with the Mafft, Muscle and Probcons alignment programs, and a sum-of-pairs (SP) score was used to measure the quality of the realignments. Statistical modelling of the SP score as a function of conservation level and other alignment characteristics makes it possible to predict the SP score for any global MSA. The predicted SP scores are highly correlated with the correct SP scores when tested on the Homstrad and SABmark databases. The results are comparable to those of the multiple overlap score (MOS) and better than those of the normalized mean distance (NorMD) and normalized iRMSD (NiRMSD) alignment quality criteria. Furthermore, the predicted SP score is able to detect alignments with badly aligned or unrelated sequences.
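To make the SP score concrete, the following minimal sketch computes a reference-based sum-of-pairs score: the fraction of residue pairs aligned in a reference alignment that are also aligned in a realignment. The abstract does not spell out the exact formula, so this commonly used definition is an assumption, and the function names `aligned_pairs` and `sp_score` are hypothetical and not part of the authors' software.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, ungapped position),
    so pairs can be compared between two alignments of the same sequences.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                present.append((s, counters[s]))
                counters[s] += 1
        # every pair of residues in this column counts as aligned
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference-aligned residue pairs recovered by `test`.

    Assumed definition (not taken from the abstract): shared pairs
    divided by the number of pairs in the reference alignment.
    """
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Toy example: the same two sequences aligned in two different ways.
reference = ["AC-GT", "ACAGT"]
realigned = ["ACG-T", "ACAGT"]
print(sp_score(realigned, reference))  # 0.75: 3 of 4 reference pairs recovered
```

Computing this score requires a trusted reference alignment, which is the dependency the predicted SP score described above is meant to remove.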
The method is freely available at http://www.mtt.fi/AlignmentQuality/.