PredIDR2: Improving accuracy of protein intrinsic disorder prediction by updating deep convolutional neural network and supplementing DisProt data.

Author information

Han Kun-Sop, Kim Ha-Kyong, Kim Myong-Hyok, Pak Myong-Hyon, Pak Song-Jin, Choe Mun-Myong, Kim Chol-Song

Affiliations

University of Sciences, Pyongyang, Democratic People's Republic of Korea.

Branch of Biotechnology, State Academy of Sciences, Pyongyang, Democratic People's Republic of Korea.

Publication information

Int J Biol Macromol. 2025 May;306(Pt 4):141801. doi: 10.1016/j.ijbiomac.2025.141801. Epub 2025 Mar 5.

Abstract

Intrinsically disordered proteins (IDPs) and regions (IDRs) are widespread in proteomes, participate in several important biological processes, and are implicated in many diseases. Many computational methods for IDR prediction are being developed to narrow the gap between the slow pace of experimental annotation and the rapid growth in the number of unannotated proteins. Their performance is blind-tested in the community-driven experiment, the Critical Assessment of protein Intrinsic Disorder (CAID). In this paper, we present the PredIDR2 series, an updated version of the PredIDR method tested in CAID2, for accurately predicting intrinsically disordered regions from protein sequence. The series comprises four methods that differ in their input features and in how the negative samples of the training set are produced. On the Disorder-PDB dataset of CAID2, the PredIDR2 series (AUC_ROC = 0.952) performs markedly better than our previous PredIDR (AUC_ROC = 0.933), an improvement that seems mainly attributable to the introduction of a new deep convolutional neural network and the augmentation of the training data, especially with data from the DisProt database. The PredIDR2 series outperforms the state-of-the-art IDR prediction methods that participated in CAID2 in terms of AUC_ROC, AUC_PR and DC_mae, and ranks among the seven top-performing methods in terms of MCC. The PredIDR2 series can be used freely through the CAID Prediction Portal at https://caid.idpcentral.org/portal or downloaded as a Singularity container from https://biocomputingup.it/shared/caid-predictors/.
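Two of the metrics named above, AUC_ROC and MCC, can be illustrated with a minimal sketch on per-residue labels (1 = disordered, 0 = ordered) and predicted disorder scores. This is a generic textbook computation, not the CAID evaluation code or the authors' pipeline. The function names, the 0.5 threshold, and the toy values are assumptions made for illustration only. AUC_PR and DC_mae are not covered here.

```python
import math
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC_ROC via the rank-sum (Mann-Whitney U) identity, with tie handling."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of residues that share the same score.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mcc(labels: Sequence[int], scores: Sequence[float],
        threshold: float = 0.5) -> float:
    """Matthews correlation coefficient after binarizing scores.

    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when a row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy per-residue values for illustration only (not data from the paper).
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(auc_roc(toy_labels, toy_scores), mcc(toy_labels, toy_scores))
```

AUC_ROC is threshold-free, whereas MCC depends on the cutoff used to binarize the scores, which is one reason a method can rank differently under the two metrics.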
