Ohlson Tomas, Wallner Björn, Elofsson Arne
Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden.
Proteins. 2004 Oct 1;57(1):188-97. doi: 10.1002/prot.20184.
To improve the detection of related proteins, it is often useful to include evolutionary information for both the query and target proteins. One way to include this information is through profile-profile alignments, where a profile from the query protein is compared with the profiles from the target proteins. Profile-profile alignments can be implemented in several fundamentally different ways. The similarity between two positions can be calculated using a dot-product, a probabilistic model, or an information-theoretic measure. Here, we present a large-scale comparison of different profile-profile alignment methods. We show that the profile-profile methods perform at least 30% better than standard sequence-profile methods, both in their ability to recognize superfamily-related proteins and in the quality of the obtained alignments. Although the performance of all methods is quite similar, profile-profile methods that use a probabilistic scoring function have an advantage: they can create good alignments and show good fold recognition capacity using the same gap penalties, while the other methods need different parameters to obtain comparable performance.
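The three families of position-similarity scores named above can be illustrated with a minimal Python sketch. The abstract does not give the formulas used in the study, so the choices below are assumptions made for illustration: a plain dot product, a log-odds style score (the logarithm of the background-weighted dot product) standing in for a probabilistic model, and one minus the Jensen-Shannon divergence standing in for an information-theoretic measure. The function names, the 4-letter toy alphabet, and the uniform background are all hypothetical.

```python
import math


def dot_score(p, q):
    """Dot product of two profile columns (probability vectors)."""
    return sum(a * b for a, b in zip(p, q))


def log_average_score(p, q, background):
    """Log-odds style score: log of the dot product weighted by 1/background.

    Positive when the columns agree more than expected by chance,
    negative when they agree less. Assumed form, not taken from the paper.
    """
    return math.log(sum(a * b / f for a, b, f in zip(p, q, background)))


def js_similarity(p, q):
    """1 minus the Jensen-Shannon divergence (base 2), so the range is [0, 1]."""

    def entropy(dist):
        # Shannon entropy in bits; zero-probability terms contribute nothing.
        return -sum(x * math.log2(x) for x in dist if x > 0)

    midpoint = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = entropy(midpoint) - (entropy(p) + entropy(q)) / 2
    return 1.0 - divergence


# Toy columns over a hypothetical 4-letter alphabet (real profiles use 20).
background = [0.25, 0.25, 0.25, 0.25]
col_a = [0.7, 0.1, 0.1, 0.1]
col_b = [0.1, 0.1, 0.1, 0.7]

for name, other in (("a vs a", col_a), ("a vs b", col_b)):
    print(
        name,
        round(dot_score(col_a, other), 3),
        round(log_average_score(col_a, other, background), 3),
        round(js_similarity(col_a, other), 3),
    )
```

In a full profile-profile aligner, a score of this kind would fill the dynamic-programming matrix in place of a substitution-matrix lookup. The three scores live on different scales (only the log-odds form is centered on zero), which is one plausible reason why gap penalties may need to be tuned separately for each scoring function.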