School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China.
Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA.
Bioinformatics. 2020 Sep 15;36(18):4729-4738. doi: 10.1093/bioinformatics/btaa573.
There are over 30 sequence-based predictors of protein-binding residues (PBRs). They use either structure-annotated or disorder-annotated training datasets, potentially creating a dichotomy in which the structure- and disorder-specific models may not be able to cross over and accurately predict the other type of PBRs. Moreover, the structure-trained predictors were shown to substantially cross-predict PBRs, i.e. to predict residues that interact with non-protein partners (nucleic acids and small ligands) as PBRs. We address these issues by performing a first-of-its-kind comparative study of a representative collection of disorder- and structure-trained predictors, using a comprehensive benchmark set that includes structure- and disorder-derived annotations of PBRs (to analyze the cross-over) as well as protein-, nucleic acid- and small ligand-binding proteins (to study the cross-predictions).
Three predictors provide accurate results: SCRIBER, ANCHOR and disoRDPbind. Some of the structure-trained methods make accurate predictions on the structure-annotated proteins. Similarly, the disorder-trained predictors predict well on the disorder-annotated proteins. However, with the exception of SCRIBER, the considered predictors generally fail to cross over. Our study also reveals that virtually all methods substantially cross-predict PBRs, except for SCRIBER on the structure-annotated proteins and disoRDPbind on the disorder-annotated proteins. We formulate a novel hybrid predictor, hybridPBRpred, that combines the results produced by disoRDPbind and SCRIBER to accurately predict both disorder- and structure-annotated PBRs. HybridPBRpred generates accurate results that cross over between structure- and disorder-annotated proteins and produces a relatively low number of cross-predictions, offering an accurate alternative for predicting PBRs.
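Cross-prediction can be quantified in several ways. The sketch below is one simple illustration and is not necessarily the measure used in the study. It assumes binary per-residue predictions (1 = predicted PBR) and per-residue labels with hypothetical class names ("protein", "nucleic_acid", "ligand", "none"). The code computes the fraction of residues predicted as PBRs within each class. It then compares the rate among nucleic acid- or ligand-binding residues with the rate among non-binding residues. A ratio well above 1 would suggest that a predictor flags non-protein-binding residues as PBRs more often than it flags non-binding residues.

```python
from collections import defaultdict


def positive_rate_by_class(predictions, labels):
    """Fraction of residues predicted as PBRs within each annotation class.

    predictions: per-residue 0/1 values (1 = predicted protein-binding).
    labels: per-residue class names; the names used here are hypothetical.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have equal length")
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, label in zip(predictions, labels):
        totals[label] += 1
        positives[label] += int(pred)
    return {label: positives[label] / totals[label] for label in totals}


def cross_prediction_ratio(rates, partner, baseline="none"):
    """Ratio of the predicted-PBR rate for `partner`-binding residues to the
    rate for `baseline` (non-binding) residues; a hypothetical summary score."""
    base = rates[baseline]
    if base == 0:
        # Undefined ratio: infinite if partner residues are ever predicted.
        return float("inf") if rates[partner] > 0 else float("nan")
    return rates[partner] / base


# Toy data, made up for illustration (not taken from the benchmark set).
preds = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]
labels = ["protein", "protein", "protein",
          "nucleic_acid", "nucleic_acid",
          "ligand", "ligand",
          "none", "none", "none"]
rates = positive_rate_by_class(preds, labels)
print(rates)
print(cross_prediction_ratio(rates, "nucleic_acid"))
```

In this toy example, every nucleic acid-binding residue is predicted as a PBR, while only one in three non-binding residues is. That pattern is the kind of cross-prediction the abstract describes.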
HybridPBRpred webserver, benchmark dataset and supplementary information are available at http://biomine.cs.vcu.edu/servers/hybridPBRpred/.
Supplementary data are available at Bioinformatics online.