Poleksic Aleksandar, Fienup Mark
Department of Computer Science, University of Northern Iowa, Cedar Falls, IA 50614, USA.
Bioinformatics. 2008 May 1;24(9):1145-53. doi: 10.1093/bioinformatics/btn097. Epub 2008 Mar 12.
Profile-based protein homology detection algorithms are valuable tools in genome annotation and protein classification. By utilizing information present in the sequences of homologous proteins, profile-based methods can often detect extremely weak relationships between protein sequences, as evidenced by large-scale benchmarking experiments such as CASP and LiveBench.
We study the relationship between the sensitivity of a profile-profile method and the size of the sequence profile, defined as the average number of different residue types observed at the profile's positions. We also demonstrate that the sensitivity of a profile-profile method can be improved by incorporating a profile-dependent scoring scheme, such as position-specific background frequencies. The techniques presented in this article are implemented in the alignment algorithm UNI-FOLD. When tested against other well-established fold recognition methods, UNI-FOLD shows increased sensitivity and specificity in detecting remote relationships between protein sequences.
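The definition of profile size given above can be illustrated with a minimal sketch. It assumes the profile is derived from a plain multiple sequence alignment given as equal-length strings, that gap characters are not counted as residue types, and that any residue appearing at least once in a column counts as "observed". The function name `profile_size` and the toy alignment are hypothetical, and UNI-FOLD's actual computation may differ (for example, by working on weighted frequency profiles).

```python
from typing import List

# Assumption: '-' and '.' mark gaps and are not counted as residue types.
GAP_CHARS = {"-", "."}


def profile_size(alignment: List[str]) -> float:
    """Average number of distinct residue types per alignment column."""
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one non-empty sequence")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    total_distinct = 0
    for col in range(length):
        # Collect the residue types seen in this column, ignoring gaps.
        residues = {row[col].upper() for row in alignment} - GAP_CHARS
        total_distinct += len(residues)
    return total_distinct / length


# Hypothetical three-sequence alignment, six columns wide.
toy_alignment = ["MKVL-A", "MRVLGA", "MKIL-S"]
print(profile_size(toy_alignment))  # distinct types per column: 1, 2, 2, 1, 1, 2
```

Under this reading, a profile built from a single sequence has size 1.0, and the value grows as more divergent homologs are added to the alignment.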
The UNI-FOLD web server can be accessed at http://blackhawk.cs.uni.edu