Chakrabarty Broto, Parekh Nita
Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India.
J Bioinform Comput Biol. 2014 Dec;12(6):1442009. doi: 10.1142/S0219720014420098.
Repetition of a structural motif within a protein is associated with a wide range of structural and functional roles. In most cases the repeating units are well conserved at the structural level but are largely undetectable at the sequence level, which suggests the need for structure-based methods. Since most known methods require a training dataset, a de novo approach is desirable. Here, we propose an efficient graph-based approach for detecting structural repeats in proteins. When a protein structure is represented as a graph, the inter- and intra-repeat interactions are well captured by the eigen spectra of the adjacency matrix of the graph. These conserved interactions give rise to similar connections and a unique profile of the principal eigen spectra for each repeating unit. The efficacy of the approach is shown on eight repeat families annotated in UniProt, comprising both solenoid and nonsolenoid repeats with varied secondary structure architectures and repeat lengths. The approach is also tested on other known benchmark datasets, and its performance is compared with that of two repeat identification methods. For a known repeat type, the algorithm also identifies the type of repeat present in the protein. A web tool implementing the algorithm is available at http://bioinf.iiit.ac.in/PRIGSA/.
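The central idea, reading structural information from the eigen spectra of a protein graph's adjacency matrix, can be illustrated with a minimal sketch. This is not the authors' implementation. The C-alpha residue representation, the 7 Å contact cutoff, the toy helical coordinates, and the function names `contact_adjacency` and `principal_eigen_profile` are all assumptions made for illustration; the abstract does not specify how the graph is built or how repeat boundaries are derived from the profile.

```python
import numpy as np

def contact_adjacency(coords, cutoff=7.0):
    """Binary residue contact graph (assumed construction).

    Residues i and j are connected when their C-alpha atoms lie within
    `cutoff` angstroms. The cutoff value is an illustrative assumption.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist <= cutoff).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def principal_eigen_profile(adj):
    """Largest eigenvalue of the adjacency matrix and the per-residue
    magnitudes of the corresponding eigenvector."""
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(adj)
    return eigvals[-1], np.abs(eigvecs[:, -1])

# Toy input: C-alpha positions along an idealised helix (not real protein data)
n_residues = 40
idx = np.arange(n_residues)
angles = np.radians(100.0 * idx)
coords = np.column_stack([2.3 * np.cos(angles), 2.3 * np.sin(angles), 1.5 * idx])

adj = contact_adjacency(coords)
largest_eigenvalue, profile = principal_eigen_profile(adj)
```

Plotting `profile` against residue index gives a per-residue curve. Under the assumptions above, conserved contact patterns in successive repeat units would be expected to produce similar recurring shapes in such a curve, which is the kind of signal the abstract describes.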