Kondra Sarika, Sarkar Titli, Raghavan Vijay, Xu Wu
The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA, United States.
Department of Chemistry, University of Louisiana at Lafayette, Lafayette, LA, United States.
Front Chem. 2021 Jan 13;8:602291. doi: 10.3389/fchem.2020.602291. eCollection 2020.
Development of protein 3-D structural comparison methods is important for understanding protein functions, yet developing such a method is very challenging. Over the last 40 years, since the development of the first automated structural comparison method, ~200 papers have been published using different representations of structures. The existing methods can be divided into five categories: sequence-, distance-, secondary structure-, geometry-, and network-based structural comparisons. Each has its strengths, but also limitations. We have developed a novel method in which the 3-D structure of a protein is modeled using the concept of Triangular Spatial Relationship (TSR), where triangles are constructed with the Cα atoms of a protein as vertices. Every triangle is represented by an integer, which we denote as a "key." A key is computed from the triangle's side lengths, angle, and vertex labels using a rule-based formula, which ensures that identical TSRs across proteins are assigned the same key. A structure is thereby represented by a vector of integers. Our method accurately quantifies the similarity of structures or substructures by matching the numbers of identical keys between two proteins. The uniqueness of our method includes: (i) a unique way to represent structures that avoids performing structural superimposition; (ii) the use of triangles to represent substructures, as the triangle is the simplest primitive that captures shape; and (iii) complex structure comparison achieved by matching integers corresponding to multiple TSRs. Every substructure of one protein is compared to every substructure of a different protein. The method is applied in studies of proteases and kinases because they play essential roles in cell signaling and most of them are drug targets. The new motifs or substructures we identified specifically for proteases and kinases provide deeper insight into their structural relations.
Furthermore, the method provides a unique way to study protein conformational changes. In addition, the results from the CATH and SCOP data sets clearly demonstrate that our method can distinguish alpha helices from beta pleated sheets. Our method has the potential to be developed into a powerful tool for efficient structure-BLAST search and comparison, just as BLAST is for sequence search and alignment.
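The abstract does not state the rule-based key formula, so the following is only a hypothetical sketch of the general idea (triangle to integer key, structure to multiset of keys, similarity by counting identical keys). It is not the authors' TSR implementation. Assumptions: the input is a list of (residue code, Cα coordinate) pairs; the vertex labels are residue types; the "length" and "angle" features are taken to be the longest side and the largest interior angle; the bin widths and the mixed-radix packing are arbitrary choices; and the score is a generalized Jaccard index. The names `triangle_key`, `structure_keys`, and `similarity` are illustrative only.

```python
import itertools
import math
from collections import Counter

# Hypothetical parameters for this sketch; NOT the published TSR settings.
ANGLE_BIN_DEG = 5.0        # width of one angle bin, in degrees
LENGTH_BIN = 2.0           # width of one length bin, in angstroms
N_ANGLE_BINS = 37          # bins 0..36 cover 0..180 degrees
N_LENGTH_BINS = 50         # very long sides are capped at the last bin
AA = "ACDEFGHIKLMNPQRSTVWY"
LABEL = {aa: i + 1 for i, aa in enumerate(AA)}  # residue code -> 1..20
N_LABELS = len(AA) + 1


def _sq_dist(p, q):
    """Squared Euclidean distance between two 3-D points."""
    return sum((a - b) * (a - b) for a, b in zip(p, q))


def triangle_key(v1, v2, v3):
    """Map three (residue_code, (x, y, z)) vertices to one integer key.

    The features used (sorted labels, longest side, largest angle) do not
    depend on vertex order, so the same triangle always yields the same key.
    Returns None for a degenerate triangle with a zero-length side.
    """
    labels = sorted(LABEL[v[0]] for v in (v1, v2, v3))
    pairs = ((v1, v2), (v2, v3), (v1, v3))
    # Squared side lengths in ascending order: c2 is the longest side.
    a2, b2, c2 = sorted(_sq_dist(p[1], q[1]) for p, q in pairs)
    if a2 == 0.0:
        return None
    # Law of cosines: the largest interior angle lies opposite the longest side.
    cos_c = (a2 + b2 - c2) / (2.0 * math.sqrt(a2 * b2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_c))))
    angle_bin = int(angle // ANGLE_BIN_DEG)
    length_bin = min(int(math.sqrt(c2) // LENGTH_BIN), N_LENGTH_BINS - 1)
    # Mixed-radix packing: distinct (labels, angle_bin, length_bin) -> distinct key.
    label_code = (labels[0] * N_LABELS + labels[1]) * N_LABELS + labels[2]
    return (label_code * N_ANGLE_BINS + angle_bin) * N_LENGTH_BINS + length_bin


def structure_keys(residues):
    """Multiset (Counter) of keys over all C(n, 3) triangles of a structure."""
    keys = Counter()
    for tri in itertools.combinations(residues, 3):
        key = triangle_key(*tri)
        if key is not None:
            keys[key] += 1
    return keys


def similarity(keys_a, keys_b):
    """Generalized Jaccard index over key multisets (an assumed scoring rule)."""
    shared = sum((keys_a & keys_b).values())
    union = sum((keys_a | keys_b).values())
    return shared / union if union else 0.0


# Toy five-residue "structure" (made-up coordinates, not real protein data).
protein = [("A", (0.0, 0.0, 0.0)), ("G", (3.5, 0.5, 0.0)),
           ("L", (5.0, 3.5, 1.0)), ("K", (2.5, 6.0, 2.0)),
           ("S", (-1.0, 4.0, 3.5))]
# The same structure rotated 90 degrees about z and translated.
moved = [(aa, (-y + 10.0, x - 4.0, z + 2.5)) for aa, (x, y, z) in protein]
print(similarity(structure_keys(protein), structure_keys(moved)))  # 1.0
```

Because every feature in this sketch is invariant to rotation, translation, and vertex order, the moved copy receives exactly the same keys and scores 1.0 without any superimposition, which mirrors the property claimed in point (i). Enumerating all C(n, 3) triangles grows cubically with the number of residues, so a practical tool would need some way to limit that cost; this sketch makes no claim about how the published method handles it.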