Li Chen, Wang Xiao-Feng, Chen Zhen, Zhang Ziding, Song Jiangning
Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, VIC 3800, Australia.
Mol Biosyst. 2015 Feb;11(2):354-60. doi: 10.1039/c4mb00569d. Epub 2014 Dec 1.
The coiled-coil, which consists of two or more α-helices winding around each other, is ubiquitous and is the most frequently observed protein-protein interaction motif in nature. It is known for its straightforward heptad repeat pattern, can be readily recognized from protein primary sequences, and exhibits a variety of oligomeric states and topologies. Because of the stable interactions formed between their α-helices, coiled-coils have been studied closely as a basis for designing novel protein structures with potential applications in material science, synthetic biology and medicine. However, their broader application requires an in-depth and systematic analysis of the sequence-to-structure relationship that governs coiled-coil folding and oligomer formation. In this article, we propose a new oligomerization state predictor, termed RFCoil, which combines the most useful and non-redundant amino acid indices with the random forest (RF) machine learning algorithm to predict the oligomeric states of coiled-coil regions. Benchmarking experiments show that RFCoil achieves an AUC (area under the ROC curve) of 0.849 in 10-fold cross-validation on the training dataset and 0.855 in an independent test on the validation dataset. Performance comparisons indicate that RFCoil outperforms four existing predictors: LOGICOIL, PrOCoil, SCORER 2.0 and Multicoil2. Furthermore, we extract from the trained RF model a number of predominant rules that underlie oligomer formation, and we present two case studies to illustrate how the extracted rules apply to the prediction of coiled-coil oligomerization state. The RFCoil web server, source code and datasets are freely available for academic users at http://protein.cau.edu.cn/RFCoil/.
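For readers unfamiliar with the AUC metric quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from predicted scores. It is not taken from the RFCoil source code: the function name `roc_auc`, the two-class labelling and the toy scores are all hypothetical and serve only to illustrate the metric. The sketch uses the rank interpretation of AUC, namely the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 and 0 stand for two oligomeric classes,
# and each score is a classifier's confidence in class 1.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))
```

In this toy example the positive example outranks the negative one in 8 of the 9 possible pairs, so the AUC is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a sense of scale for the reported values of 0.849 and 0.855.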