School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China.
School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China.
Biomolecules. 2024 Feb 28;14(3):287. doi: 10.3390/biom14030287.
Disordered linkers (DLs) are intrinsically disordered regions that facilitate movement between adjacent functional regions/domains, contributing to many key cellular functions. The recently completed second Critical Assessment of protein Intrinsic Disorder prediction (CAID2) experiment evaluated DL predictions in a rather narrow scenario, limited to 40 proteins that are already known to have DLs. We expand this evaluation by using a much larger set of nearly 350 test proteins from CAID2 and by investigating three distinct scenarios: (1) prediction of residues in DLs vs. residues in non-DL regions (the typical use of DL predictors); (2) prediction of residues in DLs vs. other disordered residues (to evaluate whether predictors can differentiate residues in DLs from other types of intrinsically disordered residues); and (3) prediction of proteins harboring DLs. We find that several methods provide relatively accurate predictions of DLs in the first scenario. However, only one method, APOD, accurately identifies DLs among other types of disordered residues (scenario 2) and predicts proteins harboring DLs (scenario 3). We also find that APOD's predictive performance is modest, motivating further research into the development of new and more accurate DL predictors. We note that these efforts will benefit from a growing amount of training data and the availability of sophisticated deep network models, and we emphasize that future methods should provide accurate results across all three scenarios.
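The three scenarios differ mainly in which residues (or proteins) are scored and what counts as a negative. The sketch below illustrates that difference on toy data; it is not the evaluation code used in the study. The record layout (`linker`, `disorder`, `score`), the function names, the use of ROC AUC as the metric, and the choice of the maximum per-residue score as the protein-level score in scenario 3 are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Protein = Dict[str, List[float]]  # hypothetical record layout, see toy data below


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(proteins: Sequence[Protein]) -> Dict[str, float]:
    """Score one predictor under the three scenarios described in the abstract."""
    s1_y, s1_p = [], []  # scenario 1: DL residues vs. all other residues
    s2_y, s2_p = [], []  # scenario 2: DL residues vs. other disordered residues
    s3_y, s3_p = [], []  # scenario 3: proteins with a DL vs. proteins without
    for prot in proteins:
        for dl, dis, sc in zip(prot["linker"], prot["disorder"], prot["score"]):
            s1_y.append(int(dl))
            s1_p.append(sc)
            if dis == 1:  # ordered residues are excluded from scenario 2
                s2_y.append(int(dl))
                s2_p.append(sc)
        # Assumption: a protein's score is its highest per-residue DL score.
        s3_y.append(int(any(prot["linker"])))
        s3_p.append(max(prot["score"]))
    return {
        "scenario1": auc(s1_y, s1_p),
        "scenario2": auc(s2_y, s2_p),
        "scenario3": auc(s3_y, s3_p),
    }


# Toy data (invented): per-residue DL labels, disorder labels, predicted DL scores.
toy = [
    {"linker": [0, 0, 1, 1, 0, 0], "disorder": [0, 0, 1, 1, 1, 0],
     "score": [0.1, 0.2, 0.9, 0.8, 0.85, 0.1]},
    {"linker": [0, 0, 0, 0], "disorder": [1, 1, 0, 0],
     "score": [0.6, 0.3, 0.1, 0.2]},
]
results = evaluate(toy)
for name, value in results.items():
    print(f"{name}: AUC = {value:.4f}")
```

On this toy input the same set of scores rates lower in scenario 2 than in scenario 1, because removing the easy ordered negatives leaves only disordered non-DL residues, which a DL predictor is more likely to confuse with linkers.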