Mittelman David, Sadreyev Ruslan, Grishin Nick
Howard Hughes Medical Institute, Department of Biochemistry, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA.
Bioinformatics. 2003 Aug 12;19(12):1531-9. doi: 10.1093/bioinformatics/btg185.
The development of powerful automatic methods for comparing protein sequences has become increasingly important. Profile-to-profile comparisons draw on broader information about protein families, which makes comparisons of distantly related sequences more sensitive and accurate. A key part of comparing two profiles is the method used to score matches between their positions. A number of methods based on various theoretical considerations have been proposed. We implemented several previously reported scoring functions as well as our own, and compared them by their ability to produce accurate short ungapped alignments of a given length.
Our results suggest that the family of probabilistic methods (log-odds based methods and prof_sim) may be the more appropriate choice for generating initial 'seeds' as the first step in producing local profile-profile alignments. The most effective scoring systems were closely related modifications of functions previously implemented in the COMPASS and Picasso methods.
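To make the idea of scoring position matches and picking a short ungapped seed concrete, the following is a minimal sketch under stated assumptions. It is not the scoring function of COMPASS, Picasso, prof_sim, or any other method evaluated in the paper. The names `column_score` and `best_ungapped_seed` are hypothetical, the score log(sum_a p_a q_a / b_a) is just one simple log-odds-style choice, and the 4-letter alphabet with a uniform background is a toy stand-in for the 20 amino acids.

```python
import math

def column_score(p, q, background):
    """Illustrative log-odds-style score for matching two profile columns.

    p, q: residue frequency vectors for one position in each profile.
    background: background residue frequencies (same alphabet order).
    Assumes the columns share at least one residue with nonzero frequency,
    so the argument of the logarithm is positive.
    """
    odds = sum(pa * qa / ba for pa, qa, ba in zip(p, q, background))
    return math.log(odds)

def best_ungapped_seed(profile_a, profile_b, background, length):
    """Exhaustively find the highest-scoring ungapped segment pair.

    Returns (score, start_a, start_b) for the best window of the given
    length, or None if either profile is shorter than `length`.
    """
    best = None
    for i in range(len(profile_a) - length + 1):
        for j in range(len(profile_b) - length + 1):
            score = sum(
                column_score(profile_a[i + k], profile_b[j + k], background)
                for k in range(length)
            )
            if best is None or score > best[0]:
                best = (score, i, j)
    return best

# Toy data: 4-letter alphabet, uniform background (assumption for brevity).
BACKGROUND = [0.25, 0.25, 0.25, 0.25]
X = [0.7, 0.1, 0.1, 0.1]
Y = [0.1, 0.7, 0.1, 0.1]
Z = [0.1, 0.1, 0.7, 0.1]
W = [0.1, 0.1, 0.1, 0.7]
U = [0.25, 0.25, 0.25, 0.25]  # uninformative column

profile_a = [U, X, Y, Z, U]
profile_b = [W, W, X, Y, Z]

seed = best_ungapped_seed(profile_a, profile_b, BACKGROUND, length=3)
print(seed)  # (score, start in profile_a, start in profile_b)
```

In this toy case the conserved columns X, Y, Z sit at offset 1 in the first profile and offset 2 in the second, so the seed search should recover that pair of start positions. A real comparison would use 20-dimensional columns with pseudocounts and sequence weighting, and a faster search than the quadratic scan shown here.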