Northey Thomas C, Barešić Anja, Martin Andrew C R
Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK.
Bioinformatics. 2018 Jan 15;34(2):223-229. doi: 10.1093/bioinformatics/btx585.
Protein-protein interactions are vital for protein function, with the average protein having between three and ten interacting partners. Knowledge of precise protein-protein interfaces comes from crystal structures deposited in the Protein Data Bank (PDB), but only 50% of structures in the PDB are complexes. There is therefore a need to predict protein-protein interfaces in silico, and various methods have been developed for this purpose. Here we explore the use of a predictor that is based on structural features and exploits random forest machine learning, comparing its performance with a number of popular established methods.
On an independent test set of obligate and transient complexes, our IntPred predictor performs well (MCC = 0.370, ACC = 0.811, SPEC = 0.916, SENS = 0.411) and compares favourably with other methods. Overall, IntPred ranks second of the six methods tested; SPPIDER has slightly better overall performance (MCC = 0.410, ACC = 0.759, SPEC = 0.783, SENS = 0.676) but considerably worse specificity than IntPred. As with SPPIDER, performance is enhanced on an independent test set of obligate complexes (MCC = 0.381) and somewhat reduced on a dataset of transient complexes (MCC = 0.303). The trade-off between sensitivity and specificity compared with SPPIDER suggests that the choice of the appropriate tool is application-dependent.
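The four measures quoted above are the standard binary-classification statistics: Matthews correlation coefficient (MCC), accuracy (ACC), specificity (SPEC) and sensitivity (SENS). The sketch below is illustrative only and is not taken from the IntPred code. It shows how the measures follow from confusion-matrix counts, and how the reported MCC can be cross-checked from the other three values. The function names are hypothetical, and the cross-check assumes that all four measures were computed over the same set of residues and that rounding error is small; the interface fraction it derives is an inference from the rounded figures, not a value reported in the abstract.

```python
import math


def metrics_from_counts(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    Counts may be integers or fractions of the whole data set.
    """
    sens = tp / (tp + fn)                   # sensitivity (recall)
    spec = tn / (tn + fp)                   # specificity
    acc = (tp + tn) / (tp + fp + tn + fn)   # accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is taken as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SENS": sens, "SPEC": spec, "ACC": acc, "MCC": mcc}


def implied_mcc(sens, spec, acc):
    """Recover the MCC implied by SENS, SPEC and ACC (hypothetical helper).

    Uses ACC = SENS * p + SPEC * (1 - p), where p is the fraction of
    positive (interface) examples, then rebuilds a confusion matrix
    expressed as fractions of the data set.
    """
    p = (spec - acc) / (spec - sens)
    tp = sens * p
    fn = p - tp
    tn = spec * (1 - p)
    fp = (1 - p) - tn
    return p, metrics_from_counts(tp, fp, tn, fn)["MCC"]


# Values quoted in the abstract (SENS, SPEC, ACC) for the combined test set.
for name, sens, spec, acc in [
    ("IntPred", 0.411, 0.916, 0.811),
    ("SPPIDER", 0.676, 0.783, 0.759),
]:
    p, mcc = implied_mcc(sens, spec, acc)
    print(f"{name}: implied positive fraction = {p:.3f}, implied MCC = {mcc:.3f}")
```

Under these assumptions the implied MCC values come out close to the reported 0.370 and 0.410. The check also makes the trade-off concrete: IntPred's higher accuracy reflects its high specificity on a data set where most examples are negative, while SPPIDER's higher MCC reflects its better sensitivity.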
IntPred is implemented in Perl and may be downloaded for local use or run via a web server at www.bioinf.org.uk/intpred/.
Supplementary data are available at Bioinformatics online.