Kuang Rui, Weston Jason, Noble William Stafford, Leslie Christina
Department of Computer Science, Columbia University, New York, NY 10027, USA.
Bioinformatics. 2005 Oct 1;21(19):3711-8. doi: 10.1093/bioinformatics/bti608. Epub 2005 Aug 2.
Sequence similarity often suggests evolutionary relationships between protein sequences that can be important for inferring similarity of structure or function. The most widely used pairwise sequence comparison algorithms for homology detection, such as BLAST and PSI-BLAST, often fail to detect less conserved, remotely related targets.
In this paper, we propose a new general graph-based propagation algorithm called MotifProp to detect more subtle similarity relationships than pairwise comparison methods can. MotifProp is based on a protein-motif network, in which edges connect proteins to the k-mer based motif features that they contain. We show that our new motif-based propagation algorithm can improve on the ranking produced by a base algorithm, such as PSI-BLAST, that is used to initialize the ranking. Despite the complex structure of the protein-motif network, MotifProp can be easily interpreted using the top-ranked motifs and motif-rich regions induced by the propagation, both of which are helpful for discovering conserved structural components in remote homologies.
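The propagation idea can be illustrated with a minimal sketch. It is not the published MotifProp implementation: the abstract does not give the update rule, so the damped averaging scheme below, the exact-match k-mers, the parameter names (alpha, iterations), the toy sequences, and the initial scores standing in for a base ranking such as PSI-BLAST are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of length-k substrings of seq (exact-match k-mers)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_network(proteins, k):
    """Build the bipartite protein-motif network as two adjacency maps."""
    protein_to_motifs = {name: kmers(seq, k) for name, seq in proteins.items()}
    motif_to_proteins = defaultdict(set)
    for name, motifs in protein_to_motifs.items():
        for motif in motifs:
            motif_to_proteins[motif].add(name)
    return protein_to_motifs, dict(motif_to_proteins)


def propagate(protein_to_motifs, motif_to_proteins, initial, alpha=0.5, iterations=20):
    """Alternate score updates between proteins and motifs.

    Assumed rule (illustrative only): a motif's score is the mean score of
    the proteins containing it; a protein's score is alpha times its initial
    score plus (1 - alpha) times the mean score of its motifs.
    """
    scores = dict(initial)
    motif_scores = {}
    for _ in range(iterations):
        motif_scores = {
            motif: sum(scores[p] for p in members) / len(members)
            for motif, members in motif_to_proteins.items()
        }
        scores = {
            name: alpha * initial[name]
            + (1 - alpha)
            * (sum(motif_scores[m] for m in motifs) / len(motifs) if motifs else 0.0)
            for name, motifs in protein_to_motifs.items()
        }
    return scores, motif_scores


# Toy data (hypothetical): "remote" shares k-mers with "close" but none with
# "query", so a pairwise comparison against the query alone would miss it.
proteins = {
    "query": "ACDEFGHIK",
    "close": "ACDEFMNPQ",
    "remote": "LLMNPQRST",
    "unrelated": "VVVWWWYYY",
}
# Stand-in for the base ranking that initializes the propagation.
initial = {"query": 1.0, "close": 0.8, "remote": 0.0, "unrelated": 0.0}

protein_to_motifs, motif_to_proteins = build_network(proteins, k=3)
scores, motif_scores = propagate(protein_to_motifs, motif_to_proteins, initial)

# Rank proteins and motifs by their propagated scores.
ranking = sorted(scores, key=scores.get, reverse=True)
top_motifs = sorted(motif_scores, key=motif_scores.get, reverse=True)[:3]
print(ranking)
print(top_motifs)
```

In this toy network the remote protein starts with a score of zero but gains score through the motifs it shares with the close homolog, while the unrelated protein, which shares no motif with any other sequence, stays at zero. Sorting the motif scores gives the kind of top-ranked motif list that the abstract describes as an aid to interpretation.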