Department of Biology, University of Padua, Viale G. Colombo 3, 35131 Padova, Italy.
Bioinformatics. 2012 Dec 15;28(24):3257-64. doi: 10.1093/bioinformatics/bts550. Epub 2012 Sep 8.
Repeat proteins form a distinct class of structures in which folding is greatly simplified. Several classes have been defined, with solenoid repeats of periodicity between ca. 5 and 40 residues being the most challenging to detect. Such proteins evolve quickly, and their periodicity can rapidly become obscured at the sequence level. From a structural point of view, finding solenoids may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure.
Here we introduce RAPHAEL, a novel method for the detection of solenoids in protein structures. It addresses three problems of increasing difficulty: (1) recognition of solenoid domains, (2) determination of their periodicity and (3) assignment of insertions. RAPHAEL uses a geometric approach that mimics manual classification and yields several numeric parameters, which are optimized for maximum performance. The resulting method correctly classifies 89.5% of solenoid proteins and 97.2% of non-solenoid proteins. RAPHAEL periodicities have a Spearman correlation coefficient of 0.877 against the manually established ones. A baseline algorithm for insertion detection in identified solenoids has a Q(2) value of 79.8%, suggesting room for further improvement. RAPHAEL finds 1931 highly confident repeat structures not previously annotated as solenoids in the Protein Data Bank records.
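The two evaluation measures quoted above can be illustrated with a minimal sketch. It assumes that Q(2) denotes the usual two-state per-residue accuracy (the fraction of residues whose predicted label matches the reference) and that the Spearman coefficient is computed as the Pearson correlation of average ranks. The function names `q2`, `average_ranks` and `spearman`, the label alphabet (`S` for solenoid unit, `I` for insertion) and the toy data are hypothetical and are not taken from RAPHAEL itself.

```python
from math import sqrt


def q2(predicted, observed):
    """Two-state accuracy: share of positions where both labelings agree.

    Assumption: each position carries one of two labels, e.g. 'S' or 'I'.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("labelings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Toy data, invented for illustration only.
predicted_labels = "SSIISS"
reference_labels = "SSISSS"
predicted_periods = [22, 24, 29, 33, 12]
manual_periods = [22, 25, 28, 34, 12]

print(f"Q2 = {q2(predicted_labels, reference_labels):.3f}")
print(f"Spearman = {spearman(predicted_periods, manual_periods):.3f}")
```

A rank-based coefficient is a natural fit for comparing periodicities, since it rewards predictions that order the structures correctly even when individual values are off by a few residues.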