Sikder Abdur R, Zomaya Albert Y
International Computer Science Institute, University of California, Berkeley, CA 94704 USA.
IEEE Trans Nanobioscience. 2008 Sep;7(3):200-5. doi: 10.1109/TNB.2008.2002283.
Wetlaufer introduced the classification of domains into continuous and discontinuous. Continuous domains form from a single-chain segment and discontinuous domains are composed of two or more chain segments. Richardson identified approximately 100 domains in her review. Her assignment was based on the concepts that the domain would be independently stable and/or could undergo rigid-body-like movements with respect to the entire protein. There are now several instances where structurally similar domains occur in different proteins in the absence of noticeable sequence similarity. Possibly, the most notable of such domains is the trios-phosphate isomerase (TIM) barrel. With the increase in the number of known sequences, computer algorithms are required to identify the discontinuous domain of an unknown protein chain in order to determine its structure and function. We have developed a novel algorithm for discontinuous-domain boundary prediction based on a machine learning algorithm and interresidue contact interactions values. We have used 415 proteins, including 100 discontinuous-domain chains for training. There is no method available that is designed solely on a sequence based for the prediction of discontinuous domain. DomainDiscovery performed significantly well compared to the structure-based methods like structural classification of proteins (SCOP), class, architecture, topology and homologous superfamily (CATH), and DOMain MAKer (DOMAK).
韦特劳弗提出将结构域分为连续型和间断型。连续结构域由单链片段构成,间断结构域则由两个或更多链段组成。理查森在她的综述中识别出了大约100个结构域。她的分类依据是这样的概念:结构域将是独立稳定的,并且/或者相对于整个蛋白质能够进行类似刚体的运动。现在有几个实例,即在不同蛋白质中出现了结构相似的结构域,却没有明显的序列相似性。可能,这类结构域中最著名的就是磷酸丙糖异构酶(TIM)桶状结构。随着已知序列数量的增加,需要计算机算法来识别未知蛋白质链的间断结构域,以便确定其结构和功能。我们基于机器学习算法和残基间接触相互作用值,开发了一种用于间断结构域边界预测的新算法。我们使用了415种蛋白质,包括100条间断结构域链用于训练。目前没有仅基于序列设计的用于预测间断结构域的方法。与基于结构的方法如蛋白质结构分类(SCOP)、类别、结构、拓扑和同源超家族(CATH)以及结构域标记(DOMAK)相比,DomainDiscovery的表现显著良好。