Krishnadev O, Rekha N, Pandit S B, Abhiman S, Mohanty S, Swapna L S, Gore S, Srinivasan N
Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India.
Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W126-9. doi: 10.1093/nar/gki474.
PROtein Domain Organization and Comparison (PRODOC) comprises several programs that enable convenient comparison of proteins represented as sequences of domains. The in-built dataset currently consists of approximately 698 000 proteins from 192 organisms with complete genomic data, together with all the SWISSPROT proteins obtained from the Pfam database. Every entry in PRODOC is represented as a sequence of functional domains, assigned using hidden Markov models, rather than as a sequence of amino acids. On average, functional domain assignments cover 69% of the proteins in the proteomes and 49% of the residues. Software tools allow the user to query the dataset with a sequence of domains and identify proteins with the same, a jumbled, or a circularly permuted arrangement of domains. Because proteins with the same or jumbled domain sequences have been proposed to have similar functions, this search tool is useful in assigning the overall function of a multi-domain protein. Unique features of PRODOC include the generation of alignments between multi-domain proteins on the basis of their domain sequences, and in-built information on distantly related domain families that form superfamilies. PRODOC can also be used to identify domain sharing and gene fusion events across organisms. An exhaustive genome-genome comparison tool additionally detects successive domain sharing and domain fusion events between two organisms, which permits the identification of gene clusters involved in similar biological processes in two closely related organisms. The URL for PRODOC is http://hodgkin.mbu.iisc.ernet.in/~prodoc.
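The domain-arrangement search described in the abstract can be illustrated with a minimal sketch. This is not PRODOC's implementation, which the abstract does not detail; it only assumes that each protein is stored as an ordered list of domain names. The function names (`classify_arrangement`, `search`), the protein identifiers, and the example dataset are all hypothetical, and the domain names are used purely as labels.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def is_circular_permutation(query: Sequence[str], target: Sequence[str]) -> bool:
    """Return True if target is a rotation of query (same cyclic domain order)."""
    if len(query) != len(target):
        return False
    n = len(query)
    if n == 0:
        return True
    # Every rotation of query appears as a length-n window of query + query.
    doubled = list(query) + list(query)
    wanted = list(target)
    return any(doubled[i:i + n] == wanted for i in range(n))


def classify_arrangement(query: Sequence[str], target: Sequence[str]) -> str:
    """Label target relative to query: same, circular_permutation, jumbled, or different."""
    q, t = list(query), list(target)
    if q == t:
        return "same"
    if is_circular_permutation(q, t):
        return "circular_permutation"
    # Same domains with the same copy numbers, but in another order.
    if Counter(q) == Counter(t):
        return "jumbled"
    return "different"


def search(query: Sequence[str], dataset: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (protein_id, label) for every protein whose arrangement matches the query."""
    hits = []
    for protein_id, domains in dataset.items():
        label = classify_arrangement(query, domains)
        if label != "different":
            hits.append((protein_id, label))
    return hits


# Hypothetical dataset: identifiers and architectures are illustrative only.
example_dataset = {
    "protA": ["SH3", "SH2", "Pkinase"],
    "protB": ["SH2", "Pkinase", "SH3"],
    "protC": ["SH2", "SH3", "Pkinase"],
    "protD": ["SH3", "Pkinase"],
}

for protein_id, label in search(["SH3", "SH2", "Pkinase"], example_dataset):
    print(protein_id, label)
```

In this sketch a circular permutation is checked before the jumbled case, because every rotation is also a reordering of the same domains; the more specific label is reported first. A real search would presumably also need to handle partial matches and superfamily-level equivalence between domain families, which this example leaves out.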