Xie Junxi, Jin Xiaopeng, Wei Hang, Sun SaiSai, Liu Yumeng
College of Big Data and Internet, Shenzhen Technology University, 3002 Lantian Road, Pingshan District, Shenzhen, Guangdong 518118, China.
School of Computer Science and Technology, Xidian University, South Campus: 266 Xinglong Section of Xifeng Road, Xi'an, Shaanxi 710126, North Campus: No. 2 South Taibai Road, Xi'an, Shaanxi 710071, China.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf182.
Identification of intrinsically disordered regions (IDRs) in proteins is essential for understanding fundamental cellular processes. IDRs can be divided by length into long disordered regions (LDRs) and short disordered regions (SDRs). Most previous computational methods ignored the differences between LDRs and SDRs and therefore failed to capture their distinct patterns. In this study, we propose IDP-EDL, an ensemble of three predictors. The component predictors were first built on a pretrained protein language model and then fine-tuned separately for short, long, and generic disordered regions. A meta predictor was then trained to integrate the three task-specific predictors into the final predictor. Experimental results show that task-specific supervised fine-tuning captures the different features of LDRs and SDRs, and that IDP-EDL achieves stable performance on datasets with different ratios of LDRs to SDRs. More importantly, IDP-EDL matches or even surpasses state-of-the-art predictors on independent test sets. IDP-EDL is available at https://github.com/joestarXjx/IDP-EDL.
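The two ideas in the abstract, splitting disordered regions by length and combining three task-specific predictors through a trained meta predictor, can be illustrated with a minimal sketch. It is not the IDP-EDL implementation. The 30-residue cutoff is an assumption, since the abstract does not state one. The logistic-regression meta predictor, the function names (split_regions, fit_meta, meta_predict), and the toy scores are all hypothetical stand-ins for the fine-tuned language-model predictors and the real meta predictor.

```python
import math

LONG_THRESHOLD = 30  # assumed cutoff in residues; not stated in the abstract


def split_regions(labels, long_threshold=LONG_THRESHOLD):
    """Return (start, end, kind) for each run of '1' in a per-residue label string.

    `end` is exclusive; kind is "LDR" when the run length is at least
    long_threshold, otherwise "SDR".
    """
    regions, start = [], None
    for i, ch in enumerate(labels + "0"):  # trailing sentinel closes a final run
        if ch == "1" and start is None:
            start = i
        elif ch != "1" and start is not None:
            kind = "LDR" if i - start >= long_threshold else "SDR"
            regions.append((start, i, kind))
            start = None
    return regions


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta(features, targets, lr=0.5, steps=3000):
    """Fit a logistic-regression meta predictor by batch gradient descent.

    features: one (p_short, p_long, p_generic) tuple per residue, i.e. the
              scores of the three hypothetical component predictors
    targets:  0/1 disorder label per residue
    """
    w, b = [0.0, 0.0, 0.0], 0.0
    n = len(features)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0, 0.0], 0.0
        for x, y in zip(features, targets):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def meta_predict(w, b, x):
    """Combine the three component scores for one residue into one probability."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy per-residue scores (invented for illustration, not real predictor output).
toy_x = [(0.9, 0.8, 0.85), (0.1, 0.2, 0.15), (0.8, 0.3, 0.7),
         (0.2, 0.1, 0.1), (0.3, 0.9, 0.8), (0.15, 0.25, 0.2)]
toy_y = [1, 0, 1, 0, 1, 0]

w, b = fit_meta(toy_x, toy_y)
regions = split_regions("0" * 5 + "1" * 4 + "0" * 3 + "1" * 30)
```

In this sketch the meta predictor sees only the three component scores for each residue. A real stacking setup would normally fit it on held-out predictions so that it does not simply inherit the component models' overfitting.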