Schultz J, Milpetz F, Bork P, Ponting C P
European Molecular Biology Laboratory, Meyerhofstr.1, 69012 Heidelberg, Germany.
Proc Natl Acad Sci U S A. 1998 May 26;95(11):5857-64. doi: 10.1073/pnas.95.11.5857.
Accurate multiple alignments of 86 domains that occur in signaling proteins have been constructed and used to provide a Web-based tool (SMART: simple modular architecture research tool) that allows rapid identification and annotation of signaling domain sequences. The majority of signaling proteins are multidomain in character with a considerable variety of domain combinations known. Comparison with established databases showed that 25% of our domain set could not be deduced from SwissProt and 41% could not be annotated by Pfam. SMART is able to determine the modular architectures of single sequences or genomes; application to the entire yeast genome revealed that at least 6.7% of its genes contain one or more signaling domains, approximately 350 more than previously annotated. The process of constructing SMART predicted (i) novel domain homologues in unexpected locations such as band 4.1-homologous domains in focal adhesion kinases; (ii) previously unknown domain families, including a citron-homology domain; (iii) putative functions of domain families after identification of additional family members, for example, a ubiquitin-binding role for ubiquitin-associated domains (UBA); (iv) cellular roles for proteins, such as predicted DEATH domains in netrin receptors further implicating these molecules in axonal guidance; (v) signaling domains in known disease genes such as SPRY domains in both marenostrin/pyrin and Midline 1; (vi) domains in unexpected phylogenetic contexts such as diacylglycerol kinase homologues in yeast and bacteria; and (vii) likely protein misclassifications exemplified by a predicted pleckstrin homology domain in a Candida albicans protein, previously described as an integrin.