Sunyaev Shamil R, Bogopolsky Gennady A, Oleynikova Natalia V, Vlasov Peter K, Finkelstein Alexei V, Roytberg Mikhail A
Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia.
Proteins. 2004 Feb 15;54(3):569-82. doi: 10.1002/prot.10503.
Alignment of protein sequences is a key step in most computational methods for predicting protein function and for homology-based modeling of three-dimensional (3D) structure. We investigated the correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pairwise sequence alignment. The results of this analysis enabled the development of a novel method for aligning a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their internal structure, and especially on the continuous ungapped alignment segments, or "islands", between gaps. Approximately one third of the islands in the gold standard alignments have a negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the perspective of alignment accuracy, the time the algorithm spends in these unalignable regions is wasted. We considered the features of the standard similarity scoring function that are responsible for this phenomenon and propose an alternative hierarchical algorithm that explicitly targets high-scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, while the resulting alignments are on average of the same quality with respect to the gold standard. This finding shows that computational efficiency need not come at the price of reduced alignment accuracy.
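To make the notion of an "island" concrete, the following is a minimal sketch of how ungapped segments could be extracted from a gapped pairwise alignment and scored. It is an illustration only, not the authors' implementation: the function names `find_islands` and `toy_score`, the `-` gap character, and the +2/-1 scoring are assumptions chosen for brevity, and a real analysis would use a substitution matrix such as BLOSUM62.

```python
from typing import Callable, List, Tuple

GAP = "-"  # assumed gap character


def toy_score(a: str, b: str) -> int:
    """Hypothetical scoring: +2 for identity, -1 for mismatch (not a real matrix)."""
    return 2 if a == b else -1


def find_islands(
    row_a: str,
    row_b: str,
    score: Callable[[str, str], int] = toy_score,
) -> List[Tuple[int, int, int]]:
    """Return (start, end, score) for each ungapped run of alignment columns.

    `start` and `end` are column indices into the alignment; `end` is exclusive.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")

    islands: List[Tuple[int, int, int]] = []
    start = None  # column where the current island began, if any
    total = 0     # running score of the current island

    for i, (a, b) in enumerate(zip(row_a, row_b)):
        if a == GAP or b == GAP:
            # A gap column closes the current island.
            if start is not None:
                islands.append((start, i, total))
                start = None
        else:
            if start is None:
                start, total = i, 0
            total += score(a, b)

    # Close an island that runs to the end of the alignment.
    if start is not None:
        islands.append((start, len(row_a), total))
    return islands


if __name__ == "__main__":
    for island in find_islands("ACDEF-GHIK", "ACDQFWG--K"):
        print(island)
```

Under this toy scoring, islands with a negative or small positive total correspond to the low-scoring segments that the abstract describes as falling below the sensitivity limit of the Smith-Waterman algorithm.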