Han Kun-Sop, Kim Ha-Kyong, Kim Myong-Hyok, Pak Myong-Hyon, Pak Song-Jin, Choe Mun-Myong, Kim Chol-Song
University of Sciences, Pyongyang, Democratic People's Republic of Korea.
Branch of Biotechnology, State Academy of Sciences, Pyongyang, Democratic People's Republic of Korea.
Int J Biol Macromol. 2025 May;306(Pt 4):141801. doi: 10.1016/j.ijbiomac.2025.141801. Epub 2025 Mar 5.
Intrinsically disordered proteins (IDPs) and regions (IDRs) are widespread in proteomes, are involved in several important biological processes, and are implicated in many diseases. Many computational methods for IDR prediction are being developed to narrow the gap between the slow pace of experimental protein annotation and the rapid growth in the number of non-annotated proteins. Their performance is evaluated in blind tests by a community-driven experiment, the Critical Assessment of protein Intrinsic Disorder (CAID). In this paper, we developed the PredIDR2 series, an updated version of PredIDR (which was tested in CAID2), to accurately predict intrinsically disordered regions from protein sequence. The series comprises four methods that differ in their input features and in how the negative samples of the training set are generated. On the Disorder-PDB dataset of CAID2, the PredIDR2 series (AUC_ROC = 0.952) performs markedly better than our previous PredIDR (AUC_ROC = 0.933), which seems to be mainly attributable to the introduction of a new deep convolutional neural network and the augmentation of the training data, especially with data from the DisProt database. The PredIDR2 series outperforms the state-of-the-art IDR prediction methods that participated in CAID2 in terms of AUC_ROC, AUC_PR and DC_mae, and ranks among the seven top-performing methods in terms of MCC. The PredIDR2 series can be used freely through the CAID Prediction Portal at https://caid.idpcentral.org/portal or downloaded as a Singularity container from https://biocomputingup.it/shared/caid-predictors/.
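To make the evaluation metrics named above more concrete, the following is a minimal sketch of how AUC_ROC, MCC and a disorder-content error could be computed from per-residue disorder scores. It is not the CAID evaluation code. The function names, the 0.5 decision threshold and the toy data are all hypothetical, and reading DC_mae as the mean absolute error of the per-protein fraction of disordered residues is an assumption not stated in the abstract.

```python
from math import sqrt


def auc_roc(labels, scores):
    """Rank-based AUC: chance that a random disordered residue (label 1)
    scores higher than a random ordered residue (label 0); ties count 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mcc(labels, scores, threshold=0.5):
    """Matthews correlation coefficient after thresholding the scores.
    The threshold value is an assumption made for this sketch."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    tn = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 0)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def disorder_content_mae(proteins, threshold=0.5):
    """Mean absolute error of per-protein disorder content, i.e. the
    fraction of residues labelled or predicted as disordered (assumed
    reading of DC_mae). `proteins` is a list of (labels, scores) pairs."""
    errors = []
    for labels, scores in proteins:
        true_dc = sum(labels) / len(labels)
        pred_dc = sum(1 for s in scores if s >= threshold) / len(scores)
        errors.append(abs(true_dc - pred_dc))
    return sum(errors) / len(errors)


# Toy data (hypothetical): two short proteins with per-residue labels
# (1 = disordered, 0 = ordered) and predicted disorder scores.
toy_proteins = [
    ([1, 1, 0, 0, 0], [0.9, 0.6, 0.4, 0.2, 0.1]),
    ([0, 0, 1, 1], [0.3, 0.7, 0.8, 0.6]),
]

# Pool residues across proteins for the residue-level metrics.
all_labels = [l for labels, _ in toy_proteins for l in labels]
all_scores = [s for _, scores in toy_proteins for s in scores]

print("AUC_ROC:", auc_roc(all_labels, all_scores))
print("MCC:", mcc(all_labels, all_scores))
print("DC_mae:", disorder_content_mae(toy_proteins))
```

AUC_ROC is independent of any threshold, whereas MCC and the disorder-content error depend on the cutoff used to binarize the scores, which is one reason a method can rank differently across these metrics.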