Zhou Hongyi, Xue Bin, Zhou Yaoqi
Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA.
Protein Sci. 2007 May;16(5):947-55. doi: 10.1110/ps.062597307.
Dividing protein structures into domains has proven useful for more accurate structural and functional characterization of proteins. Here, we develop a method, called DDOMAIN, that divides a structure into DOMAINs using a normalized contact-based domain-domain interaction profile. Results of DDOMAIN are compared to AUTHORS annotations (domain definitions given by the authors who solved the protein structures), as well as to the popular SCOP and CATH annotations, which are produced by human experts and automatic programs. DDOMAIN's automatic annotations are most consistent with the AUTHORS annotations (90% agreement in number of domains, and 88% agreement when both the same number of domains and at least 85% overlap in the domain assignment of residues are required) if its three adjustable parameters are trained on the AUTHORS annotations. By comparison, the agreement is 83% (81% with the at-least-85% overlap criterion) between SCOP-trained DDOMAIN and SCOP annotations and 77% (73%) between CATH-trained DDOMAIN and CATH annotations. The agreement between DDOMAIN and AUTHORS annotations goes beyond single-domain proteins (97%, 82%, and 56% for single-, two-, and three-domain proteins, respectively). For an "easy" data set of proteins whose CATH and SCOP annotations agree with each other in number of domains, the agreement is 90% (89%) between "easy-set"-trained DDOMAIN and CATH/SCOP annotations. The consistency between SCOP-trained DDOMAIN and SCOP annotations is superior to that of two other recently developed, SCOP-trained automatic methods, PDP (protein domain parser) and DomainParser 2. We also tested a simple consensus method combining PDP, DomainParser 2, and DDOMAIN, as well as a different version of DDOMAIN based on a more sophisticated statistical energy function. The DDOMAIN server and its executable are available in the services section at http://sparks.informatics.iupui.edu.
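The two agreement figures reported for each comparison (same number of domains, and same number of domains with at least 85% residue overlap) can be illustrated with a minimal Python sketch. The abstract does not give the exact overlap definition, so the one used here is an assumption: the fraction of residues assigned consistently under the best one-to-one matching of domain labels. The function names `overlap_score` and `agrees` are hypothetical and are not part of the DDOMAIN software.

```python
from itertools import permutations


def overlap_score(pred, ref):
    """Fraction of residues assigned consistently under the best one-to-one
    mapping of predicted domain labels onto reference domain labels.

    Assumed definition for illustration; returns 0.0 when the two
    annotations do not contain the same number of domains.
    """
    if len(pred) != len(ref):
        raise ValueError("both assignments must cover the same residues")
    if not pred:
        raise ValueError("assignments must not be empty")
    pred_labels = sorted(set(pred))
    ref_labels = sorted(set(ref))
    if len(pred_labels) != len(ref_labels):
        return 0.0
    best = 0
    # Domain labels are arbitrary, so try every label pairing and keep the best.
    for perm in permutations(ref_labels):
        mapping = dict(zip(pred_labels, perm))
        matches = sum(1 for p, r in zip(pred, ref) if mapping[p] == r)
        best = max(best, matches)
    return best / len(pred)


def agrees(pred, ref, min_overlap=0.85):
    """Return (same_domain_count, same_count_and_overlap_at_least_min_overlap)."""
    same_count = len(set(pred)) == len(set(ref))
    return same_count, same_count and overlap_score(pred, ref) >= min_overlap


# Hypothetical 100-residue protein: predicted boundary at residue 50,
# reference boundary at residue 45.
predicted = [1] * 50 + [2] * 50
reference = ["A"] * 45 + ["B"] * 55
print(overlap_score(predicted, reference))
print(agrees(predicted, reference))
```

Trying all label permutations is only practical for the small domain counts typical of single chains; an assignment solver would be the natural replacement for proteins with many domains.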