Tress Michael L, Jones David, Valencia Alfonso
Protein Design Group, Centro Nacional de Biotecnología, CNB-CSIC, Cantoblanco, 28049 Madrid, Spain.
J Mol Biol. 2003 Jul 18;330(4):705-18. doi: 10.1016/s0022-2836(03)00622-3.
For applications such as comparative modelling, one major issue is the reliability of sequence alignments. Reliable regions in alignments can be predicted using sub-optimal alignments of the same pair of sequences. Here we show that reliable regions in alignments can also be predicted from multiple sequence profile information alone. Alignments were created for a set of remotely related pairs of proteins using five different test methods. Structural alignments were used to assess the quality of the alignments, and the aligned positions were scored using the observed frequencies of amino acid residues in sequence profiles pre-generated for each template structure. High-scoring regions of these profile-derived alignment scores were a good predictor of reliably aligned regions. The scores are easy to obtain and are applicable to any alignment method. They can be used to detect reliably aligned regions of alignments and to help predict the quality of an alignment. For residues within secondary structure elements, the regions predicted as reliably aligned agreed with the structural alignments for between 92% and 97.4% of the residues. In loop regions, just under 92% of the residues predicted to be reliable agreed with the structural alignments. The percentage of residues predicted as reliable ranged from 32.1% for helix residues to 52.8% for strand residues. This information could also be used to help predict conserved binding sites from sequence alignments. Template residues that were identified as binding sites, that aligned to an identical amino acid residue, and for which the sequence alignment agreed with the structural alignment were in highly conserved, high-scoring regions over 80% of the time. This suggests that many binding sites present in both target and template sequences lie in sequence-conserved regions, and that alignment reliability could be translated into binding site prediction.
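The general idea of scoring aligned positions from profile frequencies and flagging high-scoring stretches can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the score is taken to be the observed profile frequency of the aligned target residue, and the smoothing window, the threshold, the minimum region length and all function names are hypothetical rather than the authors' published procedure.

```python
from typing import Dict, List, Tuple

# One dict per template position: residue letter -> observed frequency in the profile.
Profile = List[Dict[str, float]]


def position_scores(profile: Profile, aligned_pairs: List[Tuple[int, str]]) -> List[float]:
    """Score each aligned position by the profile frequency of the target residue.

    aligned_pairs holds (template_index, target_residue) for every aligned column.
    Using the raw frequency as the score is an assumption, not the published formula.
    """
    return [profile[idx].get(residue, 0.0) for idx, residue in aligned_pairs]


def smooth(scores: List[float], window: int = 5) -> List[float]:
    """Average each score over a centred window, truncated at the alignment ends."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def reliable_regions(
    smoothed: List[float], threshold: float = 0.3, min_len: int = 3
) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index ranges where the score stays >= threshold.

    Both threshold and min_len are hypothetical parameters chosen for illustration.
    """
    regions = []
    start = None
    for i, score in enumerate(smoothed):
        if score >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a region that runs to the end of the alignment.
    if start is not None and len(smoothed) - start >= min_len:
        regions.append((start, len(smoothed) - 1))
    return regions
```

In practice the threshold would need to be calibrated against structural alignments, as the study does when it checks predicted reliable regions against structure-based alignments.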