Gleaning non-trivial structural, functional and evolutionary information about proteins by iterative database searches.

Author information

Aravind L, Koonin EV

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

J Mol Biol. 1999 Apr 16;287(5):1023-40. doi: 10.1006/jmbi.1999.2653.

DOI:10.1006/jmbi.1999.2653

PMID:10222208

Abstract

Using a number of diverse protein families as test cases, we investigate the ability of the recently developed iterative sequence database search method, PSI-BLAST, to identify subtle relationships between proteins that had originally been deemed detectable only at the level of structure-structure comparison. We show that PSI-BLAST can detect many, though not all, of such relationships, but the success critically depends on the optimal choice of the query sequence used to initiate the search. Generally, there is a correlation between the diversity of the sequences detected in the first pass of database screening and the ability of a given query to detect subtle relationships in subsequent iterations. Accordingly, a thorough analysis of protein superfamilies at the sequence level is necessary in order to maximize the chances of gleaning non-trivial structural and functional inferences, as opposed to a single search, initiated, for example, with the sequence of a protein whose structure is available.
This strategy is illustrated by several findings, each of which involves an unexpected structural prediction: (i) a number of previously undetected proteins with the HSP70-actin fold are identified, including a highly conserved and nearly ubiquitous family of metal-dependent proteases (typified by bacterial O-sialoglycoprotease) that represent an adaptation of this fold to a new type of enzymatic activity; (ii) we show that, contrary to the previous conclusions, ATP-dependent and NAD-dependent DNA ligases are confidently predicted to possess the same fold; (iii) the C-terminal domain of 3-phosphoglycerate dehydrogenase, which binds serine and is involved in allosteric regulation of the enzyme activity, is shown to typify a new superfamily of ligand-binding, regulatory domains found primarily in enzymes and regulators of amino acid and purine metabolism; (iv) the immunoglobulin-like DNA-binding domain previously identified in the structures of transcription factors NF-κB and NFAT is shown to be a member of a distinct superfamily of intracellular and extracellular domains with the immunoglobulin fold; and (v) the Rag-2 subunit of the V-D-J recombinase is shown to contain a kelch-type β-propeller domain, which rules out its evolutionary relationship with bacterial transposases.
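The iterative strategy described in the abstract can be illustrated with a toy Python sketch. It rests on loud assumptions that are not from the paper: sequences are pre-aligned and of equal length (no gaps), background residue frequencies are uniform, hits are accepted by a raw score threshold rather than an E-value, there is no sequence weighting, and the first pass scores against a query-only profile instead of a substitution matrix such as BLOSUM62. The function names (`build_pssm`, `first_pass`, `iterative_search`, `diversity`, `best_query`) are hypothetical and do not belong to the BLAST software, and the pairwise-identity measure of diversity is an assumed stand-in for the paper's qualitative notion of first-pass diversity.

```python
"""Toy, gapless sketch of a PSI-BLAST-style iterative profile search.

Assumptions (hypothetical, not the real PSI-BLAST algorithm): equal-length
pre-aligned sequences, uniform background, raw score threshold instead of
E-values, no sequence weighting, no gapped local alignment.
"""
import math
from collections import Counter
from itertools import combinations

AA = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AA)  # assumed uniform background frequency


def build_pssm(seqs, pseudocount=1.0):
    """Log-odds position-specific scoring matrix from aligned, equal-length sequences."""
    total = len(seqs) + pseudocount * len(AA)
    pssm = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        pssm.append({a: math.log((counts[a] + pseudocount) / total / BACKGROUND) for a in AA})
    return pssm


def score(pssm, seq):
    """Gapless score of seq against the profile, summed over positions."""
    return sum(column[a] for column, a in zip(pssm, seq))


def first_pass(query, database, threshold):
    """First pass: profile built from the query alone (stand-in for matrix scoring)."""
    pssm = build_pssm([query])
    return {s for s in database if score(pssm, s) >= threshold}


def iterative_search(query, database, threshold, max_iter=5):
    """Rebuild the profile from all current hits until the hit set stops changing."""
    hits = {query}
    for _ in range(max_iter):
        pssm = build_pssm(sorted(hits))
        new_hits = {s for s in database if score(pssm, s) >= threshold} | {query}
        if new_hits == hits:
            break  # converged: no new sequences were pulled in
        hits = new_hits
    return hits


def diversity(seqs):
    """1 - mean pairwise identity; an assumed proxy for first-pass hit diversity."""
    pairs = list(combinations(sorted(seqs), 2))
    if not pairs:
        return 0.0
    mean_id = sum(sum(x == y for x, y in zip(a, b)) / len(a) for a, b in pairs) / len(pairs)
    return 1.0 - mean_id


def best_query(candidates, database, threshold):
    """Pick the candidate query whose first-pass hits are most diverse."""
    return max(candidates, key=lambda q: diversity(first_pass(q, database, threshold) | {q}))


if __name__ == "__main__":
    db = ["AAAAAA", "AAAACC", "AACCCC", "CCCCCC", "WWWWWW"]
    print(sorted(first_pass("AAAAAA", db, 2.0)))        # ['AAAAAA', 'AAAACC']
    print(sorted(iterative_search("AAAAAA", db, 2.0)))  # ['AAAAAA', 'AAAACC', 'AACCCC', 'CCCCCC']
    print(best_query(["WWWWWW", "AAAAAA"], db, 2.0))    # AAAAAA
```

In this toy run, `CCCCCC` shares no identical position with the query, yet it is recovered through the intermediate `AACCCC` once the profile is rebuilt from earlier hits. This is the kind of transitive detection that iteration provides; without the E-value and weighting safeguards of the real program, the same mechanism also shows how a profile can drift away from the original query.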
