Rapid similarity search of proteins using alignments of domain arrangements.

Affiliations

Westfalian Wilhelms University, Institute of Evolution and Biodiversity, Huefferstr. 1, 48149 Muenster, Germany and Max Planck Institute for Infection Biology, Charitéplatz 1, 10117 Berlin, Germany.

Publication information

Bioinformatics. 2014 Jan 15;30(2):274-81. doi: 10.1093/bioinformatics/btt379. Epub 2013 Jul 4.

Abstract

MOTIVATION

Homology search methods are dominated by the central paradigm that sequence similarity is a proxy for common ancestry and, by extension, functional similarity. For determining sequence similarity in proteins, the most widely used methods rely on models of sequence evolution and compare amino-acid strings in search of conserved linear stretches. Probabilistic models or sequence profiles capture the position-specific variation in an alignment of homologous sequences and can identify conserved motifs or domains. While profile-based search methods are generally more accurate than simple sequence comparison methods, they tend to be computationally more demanding. In recent years, several methods have emerged that perform protein similarity searches based on domain composition. However, few methods have considered the linear arrangement of domains when conducting similarity searches, despite strong evidence that domain order can harbour considerable functional and evolutionary signal.
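To make the distinction between composition and arrangement concrete: a composition-only measure treats a protein as an unordered set of domains, so two proteins carrying the same domains in opposite order look identical. The sketch below uses the Jaccard index as a stand-in composition score. This is an illustrative assumption, not the measure used by any particular published method, and the domain labels are hypothetical.

```python
from typing import Sequence


def composition_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard index of the two domain *sets*: order and copy number are ignored."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:  # neither protein has an annotated domain
        return 1.0
    return len(set_a & set_b) / len(union)


def same_arrangement(a: Sequence[str], b: Sequence[str]) -> bool:
    """True only if both proteins carry the same domains in the same N-to-C order."""
    return list(a) == list(b)


# Hypothetical domain labels, listed from N- to C-terminus.
protein_x = ["SH3", "SH2", "Pkinase"]
protein_y = ["Pkinase", "SH2", "SH3"]

print(composition_similarity(protein_x, protein_y))  # 1.0
print(same_arrangement(protein_x, protein_y))        # False
```

A set-based score cannot separate these two proteins, whereas any method that compares the ordered domain strings can.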

RESULTS

Here, we introduce an alignment scheme that uses a classical dynamic programming approach for the global alignment of domains. We illustrate that representing proteins as strings of domains (domain arrangements) and comparing these strings globally allows for both fast and sensitive homology search. Further, we demonstrate that the presented methods complement existing ones by finding similar proteins missed by popular amino-acid-based comparison methods.
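The global alignment of domain strings can be sketched as a Needleman-Wunsch dynamic program in which the alphabet is domain identifiers rather than amino acids. This is a minimal sketch, assuming uniform match/mismatch/gap scores (the values are hypothetical) and toy domain labels; the scoring scheme used by the authors' implementation is not given in this abstract and very likely differs.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical uniform scores, chosen only for illustration.
MATCH, MISMATCH, GAP = 2, -1, -1

Pair = Tuple[Optional[str], Optional[str]]


def align_domains(a: Sequence[str], b: Sequence[str],
                  match: int = MATCH, mismatch: int = MISMATCH,
                  gap: int = GAP) -> Tuple[int, List[Pair]]:
    """Global (Needleman-Wunsch) alignment of two domain strings."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # leading domains of a aligned to gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # leading domains of b aligned to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,  # domain of a vs gap
                              score[i][j - 1] + gap)  # gap vs domain of b

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return score[n][m], pairs


def search(query: Sequence[str],
           database: Dict[str, Sequence[str]]) -> List[Tuple[str, int]]:
    """Align the query against every arrangement and rank hits by score."""
    hits = [(name, align_domains(query, arr)[0]) for name, arr in database.items()]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


query = ["SH3", "SH2", "Pkinase"]
database = {
    "P1": ["SH3", "SH2", "Pkinase"],  # identical arrangement
    "P2": ["SH2", "Pkinase"],         # N-terminal SH3 missing
    "P3": ["Pkinase", "SH2", "SH3"],  # same domains, reversed order
}
print(search(query, database))  # [('P1', 6), ('P2', 3), ('P3', 0)]
```

Each comparison costs O(nm) in the number of domains per protein. Because a protein typically carries only a handful of annotated domains, rather than hundreds of residues, this is much cheaper than residue-level alignment, which is one plausible reading of why domain-string search is fast. Note that P3 scores lowest even though it shares every domain with the query, because the global alignment is sensitive to domain order.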

AVAILABILITY

An implementation of the presented algorithms, a web-based interface as well as a command-line program for batch searching against the UniProt database can be found at http://rads.uni-muenster.de. Furthermore, we provide a JAVA API for programmatic access to domain-string–based search methods.