Rapid similarity search of proteins using alignments of domain arrangements.

Affiliations

Westfalian Wilhelms University, Institute of Evolution and Biodiversity, Huefferstr. 1, 48149 Muenster, Germany and Max Planck Institute for Infection Biology, Charitéplatz 1, 10117 Berlin, Germany.

Publication information

Bioinformatics. 2014 Jan 15;30(2):274-81. doi: 10.1093/bioinformatics/btt379. Epub 2013 Jul 4.

PMID:23828785

Abstract

MOTIVATION

Homology search methods are dominated by the central paradigm that sequence similarity is a proxy for common ancestry and, by extension, for functional similarity. To determine sequence similarity in proteins, the most widely used methods rely on models of sequence evolution and compare amino-acid strings in search of conserved linear stretches. Probabilistic models or sequence profiles capture the position-specific variation in an alignment of homologous sequences and can identify conserved motifs or domains. While profile-based search methods are generally more accurate than simple sequence comparison methods, they tend to be computationally more demanding. In recent years, several methods have emerged that perform protein similarity searches based on domain composition. However, few methods have considered the linear arrangement of domains when conducting similarity searches, despite strong evidence that domain order can harbour considerable functional and evolutionary signal.
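To make the composition-versus-arrangement distinction concrete, here is a minimal Python sketch that is not taken from the paper. Jaccard similarity over domain multisets stands in for a composition-based score (an assumption chosen for simplicity), and the domain labels `SH3`, `SH2` and `Kinase` are hypothetical placeholders rather than annotations of real proteins.

```python
from collections import Counter


def composition_similarity(a: list[str], b: list[str]) -> float:
    """Jaccard similarity of two domain multisets; domain order is ignored."""
    ca, cb = Counter(a), Counter(b)
    shared = sum((ca & cb).values())  # per-domain minimum counts
    total = sum((ca | cb).values())   # per-domain maximum counts
    return shared / total if total else 1.0


# Hypothetical domain strings, listed N- to C-terminal.
p1 = ["SH3", "SH2", "Kinase"]
p2 = ["Kinase", "SH2", "SH3"]  # same domains, reversed order

print(composition_similarity(p1, p2))  # 1.0: composition cannot tell them apart
print(p1 == p2)                        # False: the arrangements differ
```

Any score built only on which domains occur, and how often, will rate these two proteins as identical, whereas an order-aware comparison can separate them.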

RESULTS

Here, we introduce an alignment scheme that uses a classical dynamic programming approach to the global alignment of domains. We show that representing proteins as strings of domains (domain arrangements) and comparing these strings globally enables a homology search that is both fast and sensitive. Further, we demonstrate that the presented methods complement existing methods by finding similar proteins missed by popular amino-acid-based comparison methods.
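The global alignment of domain strings can be sketched with the textbook Needleman-Wunsch recurrence, treating each domain as a single symbol. This is a minimal illustration of the technique, not the authors' implementation: the abstract does not state their scoring scheme, so the `match`, `mismatch` and `gap` values below are hypothetical placeholders, as are the domain labels.

```python
def align_domains(a, b, match=2, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment of two domain strings.

    Scores are hypothetical placeholders. Returns (score, aligned_a,
    aligned_b), with "-" marking a gap.
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + s,  # align two domains
                           dp[i - 1][j] + gap,    # gap in b
                           dp[i][j - 1] + gap)    # gap in a
    # Traceback from the bottom-right cell; ties prefer the diagonal move.
    i, j, out_a, out_b = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if dp[i][j] == dp[i - 1][j - 1] + s:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], out_a[::-1], out_b[::-1]


query = ["SH3", "SH2", "Kinase"]
target = ["SH3", "Kinase"]
print(align_domains(query, target))
# (3, ['SH3', 'SH2', 'Kinase'], ['SH3', '-', 'Kinase'])
```

Because each domain is one symbol, the table has only (number of domains + 1) squared cells, far fewer than an amino-acid alignment of the same proteins. And since the alignment is global and order-sensitive, a protein carrying the same domains in a different order scores lower than the identical arrangement.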

AVAILABILITY

An implementation of the presented algorithms, a web-based interface and a command-line program for batch searching against the UniProt database are available at http://rads.uni-muenster.de. Furthermore, we provide a Java API for programmatic access to domain-string-based search methods.
