Basu Sushmita, Kurgan Lukasz
Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA.
Protein Sci. 2025 Oct;34(10):e70298. doi: 10.1002/pro.70298.
Dozens of impactful methods have been developed to predict intrinsically disordered regions (IDRs) in protein sequences that interact with proteins and/or nucleic acids. Their training and assessment rely on IDR-level binding annotations, whereas the equivalent structure-trained methods predict more granular annotations of binding amino acids (AA). We compiled a new benchmark dataset that annotates binding AA in IDRs and applied it to complete a first-of-its-kind assessment of predictions of disordered binding residues. We evaluated a representative collection of 14 methods on several hundred low-similarity test proteins, focusing on the challenging task of differentiating these binding residues from other disordered AA and on ligand type-specific predictions (protein-protein vs. protein-nucleic acid interactions). We found that current methods struggle to accurately predict binding IDRs among disordered residues; however, better-than-random tools predict disordered binding residues significantly better than binding IDRs. We identified at least one relatively accurate tool for predicting disordered protein-binding AA and at least one for disordered nucleic acid-binding AA. Analysis of cross-predictions between interactions with proteins and nucleic acids revealed that most methods are ligand-type-agnostic. Only two predictors of nucleic acid-binding IDRs and two predictors of protein-binding IDRs can be considered ligand-type-specific. We also discuss several potential future directions that would move this field forward by producing more accurate methods that target the prediction of binding residues, reduce cross-predictions, and cover a broader range of ligand types.
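The evaluation setup described above (scoring binding residues only against other disordered residues, then checking cross-predictions against the other ligand type) can be illustrated with a minimal sketch. The abstract does not name the metrics used, so the choice of AUC here is an assumption, and the function names, scores, and annotations are all hypothetical toy values rather than data from the benchmark.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. AUC is an assumed metric, not one
    confirmed by the abstract.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_among_disordered(
    scores: Sequence[float],
    binding: Sequence[int],
    disordered: Sequence[int],
) -> float:
    """AUC computed over disordered residues only.

    Structured residues are dropped, so the negatives are
    non-binding disordered residues.
    """
    keep = [i for i, d in enumerate(disordered) if d == 1]
    return auc([scores[i] for i in keep], [binding[i] for i in keep])


# Hypothetical per-residue output of a protein-binding predictor.
scores = [0.9, 0.8, 0.2, 0.7, 0.1, 0.6]
disordered = [1, 1, 1, 1, 0, 0]        # toy disorder annotation
protein_binding = [1, 1, 0, 0, 0, 0]   # toy protein-binding annotation
nucleic_binding = [0, 0, 0, 1, 0, 0]   # toy nucleic acid-binding annotation

# Target ligand type: how well are protein-binding residues separated
# from the other disordered residues?
target_auc = auc_among_disordered(scores, protein_binding, disordered)

# Cross-prediction: the same scores evaluated against the other ligand type.
cross_auc = auc_among_disordered(scores, nucleic_binding, disordered)

print(f"target AUC: {target_auc:.3f}, cross AUC: {cross_auc:.3f}")
```

Under this sketch, a ligand-type-specific predictor would show a high value for its own ligand type and a value near or below 0.5 for the other type, while a ligand-type-agnostic predictor would score similarly on both.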