Zhang Fuhao, Kurgan Lukasz
College of Information Engineering, Northwest A & F University, China.
Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA.
Comput Struct Biotechnol J. 2024 Dec 17;27:78-88. doi: 10.1016/j.csbj.2024.12.009. eCollection 2025.
A large portion of the intrinsically disordered regions (IDRs) in protein sequences interact with proteins, nucleic acids, and other types of ligands. Correspondingly, dozens of sequence-based predictors of binding IDRs have been developed. The recently completed second community-based Critical Assessment of protein Intrinsic Disorder prediction (CAID2) evaluated 32 predictors of binding IDRs. However, CAID2 considered a rather narrow scenario: it tested on 78 proteins with binding IDRs and did not differentiate between ligands, even though virtually all predictors target IDRs that interact with specific types of ligands. In that scenario, several intrinsic disorder predictors predict binding IDRs with accuracy equivalent to that of the best predictors of binding IDRs, since the large majority of IDRs in the 78 test proteins are binding. We substantially extended the CAID2 evaluation by using the entire CAID2 dataset of 348 proteins and considering several arguably more practical scenarios. We assessed whether predictors accurately differentiate binding IDRs from other types of IDRs and how they perform when predicting IDRs that interact with different ligand types. We found that intrinsic disorder predictors cannot accurately identify binding IDRs among other disordered regions; that the majority of the predictors of binding IDRs are ligand-type agnostic (i.e., they cross-predict binding in IDRs that interact with ligands that they do not cover); and that only a handful of predictors of binding IDRs perform relatively well and generate reasonably low amounts of cross-predictions. We also suggest a number of future research directions that would move this active field of research forward.
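As a rough illustration of the cross-prediction idea mentioned above, the following minimal sketch computes, for a single protein, how often a predictor flags residues that bind its target ligand type versus residues that bind only a ligand type it does not cover. The function name, the per-residue 0/1 encoding, and the exact rate definitions are assumptions made for illustration; they are not taken from the paper or from the CAID2 protocol.

```python
import math


def binding_rates(predicted, target, other):
    """Return (sensitivity, cross_prediction_rate) for one protein.

    All three arguments are hypothetical per-residue 0/1 lists of equal length:
      predicted - 1 where the predictor calls the residue binding
      target    - 1 where the residue binds the ligand type the predictor covers
      other     - 1 where the residue binds a ligand type the predictor does not cover
    """
    if not (len(predicted) == len(target) == len(other)):
        raise ValueError("all annotation lists must have the same length")

    # Fraction of target-ligand residues that are predicted as binding.
    n_target = sum(target)
    hits = sum(1 for p, t in zip(predicted, target) if p and t)

    # Residues that bind only a non-covered ligand; predicting them as
    # binding counts as a cross-prediction under this assumed definition.
    n_other_only = sum(1 for t, o in zip(target, other) if o and not t)
    cross = sum(
        1 for p, t, o in zip(predicted, target, other) if p and o and not t
    )

    sensitivity = hits / n_target if n_target else math.nan
    cross_rate = cross / n_other_only if n_other_only else math.nan
    return sensitivity, cross_rate


# Toy annotations, invented purely to exercise the function.
example_predicted = [1, 1, 0, 0, 1, 0, 0, 1]
example_target = [1, 1, 1, 0, 0, 0, 0, 0]
example_other = [0, 0, 0, 0, 1, 1, 1, 1]
example_sensitivity, example_cross_rate = binding_rates(
    example_predicted, example_target, example_other
)
```

Under this assumed definition, a ligand-type agnostic predictor would show a cross-prediction rate close to its sensitivity, whereas a ligand-specific predictor would keep the cross-prediction rate well below it.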