Center for Applied Mathematics, Tianjin University, Tianjin 300072, China.
School of Statistics and Data Science, Nankai University, Tianjin 300074, China.
Bioinformatics. 2020 Dec 30;36(Suppl_2):i754-i761. doi: 10.1093/bioinformatics/btaa808.
Disordered flexible linkers (DFLs) are abundant and functionally important intrinsically disordered regions that connect protein domains and structural elements within domains, and that facilitate disorder-based allosteric regulation. Although computational estimates suggest that thousands of proteins have DFLs, DFLs have been annotated experimentally in fewer than 200 proteins. Accurate computational predictors can help reduce this substantial annotation gap. The sole existing predictor of DFLs, DFLpred, trades accuracy for a shorter runtime by excluding relevant but computationally costly predictive inputs. Moreover, it relies on local, window-based information and does not consider useful protein-level characteristics.
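The difference between local (window-based) inputs and protein-level inputs can be illustrated with a small sketch. It assumes a hypothetical per-residue propensity scale (`TOY_SCALE`) and hypothetical helper names; it is not the feature set used by DFLpred or APOD. A local feature changes from residue to residue because it averages scores over a sliding window. A protein-level feature is computed once over the whole sequence and copied into every residue's vector.

```python
from statistics import mean

# Hypothetical per-residue propensity scale, for illustration only.
# It is NOT a real disorder predictor and NOT APOD's actual input.
TOY_SCALE = {aa: 0.0 for aa in "ACDEFGHIKLMNPQRSTVWY"}
TOY_SCALE.update({"P": 1.0, "G": 0.8, "S": 0.6, "E": 0.5, "K": 0.5})


def window_average(values, i, half_width):
    """Mean of values in a window centred on position i, truncated at the termini."""
    lo = max(0, i - half_width)
    hi = min(len(values), i + half_width + 1)
    return mean(values[lo:hi])


def residue_features(sequence, half_width=2):
    """Return one feature vector per residue.

    Each vector holds one local (window-based) value followed by three
    protein-level values that are identical for every residue of the protein.
    """
    scores = [TOY_SCALE.get(aa, 0.0) for aa in sequence]
    protein_mean = mean(scores)  # protein-level: average propensity of the whole chain
    pg_fraction = sum(aa in "PG" for aa in sequence) / len(sequence)  # protein-level composition
    length = len(sequence)  # protein-level: chain length
    return [
        [window_average(scores, i, half_width), protein_mean, pg_fraction, length]
        for i in range(len(sequence))
    ]


if __name__ == "__main__":
    for vector in residue_features("PGAAA", half_width=1):
        print(vector)
```

A window-only predictor would see just the first column. Appending the protein-level columns lets a downstream classifier distinguish otherwise similar local windows that come from proteins with different overall characteristics.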
We conceptualize, design and test APOD (Accurate Predictor Of DFLs), the first highly accurate predictor of DFLs that uses both local and protein-level inputs quantifying propensity for disorder, sequence composition, sequence conservation and selected putative structural properties. APOD provides significantly more accurate predictions than its faster predecessor, DFLpred, and than several other alternative ways to predict DFLs. These improvements stem from a more comprehensive set of inputs that covers protein-level information and from a more sophisticated predictive model, a well-parametrized support vector machine. On an independent, low-similarity test dataset, APOD achieves an area under the curve (AUC) of 0.82 (a 28% improvement over DFLpred) and a Matthews correlation coefficient (MCC) of 0.42 (a 180% increase over DFLpred). APOD is therefore a suitable choice for accurate, small-scale prediction of DFLs.
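The two evaluation metrics quoted above can be computed as in the following minimal sketch, which uses only the standard library and hypothetical toy inputs, not data from the study. MCC is computed from confusion-matrix counts. AUC is computed in its rank form, as the probability that a randomly chosen positive residue receives a higher score than a randomly chosen negative one, with ties counted as one half. The `relative_increase` helper shows the arithmetic behind a statement such as "180% increase": the new value equals 2.8 times the old one.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_increase(new, old):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    # Hypothetical per-residue scores: 1 = DFL residue, 0 = non-DFL residue.
    print(auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))
    print(mcc(tp=10, tn=10, fp=0, fn=0))
    print(relative_increase(2.8, 1.0))
```

The pairwise AUC is quadratic in the number of residues, which is fine for illustration. For large datasets, a sort-based rank computation gives the same value more efficiently.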