DeepDRP: Prediction of intrinsically disordered regions based on integrated view deep learning architecture from transformer-enhanced and protein information.

Affiliations

School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou 213164, China.

Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; School of Artificial Intelligence, Jilin University, Changchun 130012, China.

Publication information

Int J Biol Macromol. 2023 Dec 31;253(Pt 6):127390. doi: 10.1016/j.ijbiomac.2023.127390. Epub 2023 Oct 11.

DOI:10.1016/j.ijbiomac.2023.127390

PMID:37827403

Abstract

Intrinsic disorder in proteins is widespread in nature and is linked to many crucial biological processes and various diseases. Traditional determination methods tend to be costly and labor-intensive, so an accurate method for identifying intrinsically disordered proteins (IDPs) is desirable. In this paper, we propose DeepDRP, a novel deep learning model for predicting intrinsically disordered regions in proteins. DeepDRP combines an innovative TimeDistributed strategy with a Bi-LSTM architecture to predict IDPs, and it is driven by integrated-view features: PSSM, energy-based encoding, AAindex, and transformer-enhanced embeddings from DR-BERT, OntoProtein, Prot-T5, and ESM-2. A comparison of feature combinations indicates that the transformer-enhanced features contribute far more to IDP prediction than the traditional features, and that ESM-2 accounts for a larger contribution in the pre-trained fusion vectors. The ablation test verified that the TimeDistributed strategy improves model performance and is an efficient approach to IDP prediction. Compared with eight state-of-the-art methods on the DISORDER723, S1, and DisProt832 datasets, the Matthews correlation coefficient of DeepDRP exceeded those of the competing methods by 4.90% to 36.20%, 11.80% to 26.33%, and 4.82% to 13.55%, respectively. In brief, DeepDRP is a reliable model for IDP prediction and is freely available at https://github.com/ZX-COLA/DeepDRP.
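The abstract reports its comparisons in terms of the Matthews correlation coefficient (MCC). As a reminder of what that metric measures, the following is a minimal sketch of MCC for binary per-residue labels, where 1 is assumed to mean "disordered" and 0 "ordered". The function names and the toy labels are illustrative assumptions; they are not taken from the DeepDRP repository, and the numbers are not results from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = disordered, 0 = ordered)."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy per-residue labels, invented for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
score = mcc(y_true, y_pred)
print(f"MCC = {score:.2f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it a common choice when ordered residues greatly outnumber disordered ones.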

Similar articles

MoRF_ESM: Prediction of MoRFs in disordered proteins based on a deep transformer protein language model.

J Bioinform Comput Biol. 2024 Apr;22(2):2450006. doi: 10.1142/S0219720024500069. Epub 2024 May 28.

Identifying short disorder-to-order binding regions in disordered proteins with a deep convolutional neural network method.

J Bioinform Comput Biol. 2019 Feb;17(1):1950004. doi: 10.1142/S0219720019500045.

Accurate and Fast Prediction of Intrinsically Disordered Protein by Multiple Protein Language Models and Ensemble Learning.

J Chem Inf Model. 2024 Apr 8;64(7):2901-2911. doi: 10.1021/acs.jcim.3c01202. Epub 2023 Oct 26.

RFPR-IDP: reduce the false positive rates for intrinsically disordered protein and region prediction by incorporating both fully ordered proteins and disordered proteins.

Brief Bioinform. 2021 Mar 22;22(2):2000-2011. doi: 10.1093/bib/bbaa018.

Phanto-IDP: compact model for precise intrinsically disordered protein backbone generation and enhanced sampling.

Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad429.

IDP-LM: Prediction of protein intrinsic disorder and disorder functions based on language models.

PLoS Comput Biol. 2023 Nov 22;19(11):e1011657. doi: 10.1371/journal.pcbi.1011657. eCollection 2023 Nov.

Intrinsically disordered proteins (IDPs) in trypanosomatids.

BMC Genomics. 2014 Dec 13;15(1):1100. doi: 10.1186/1471-2164-15-1100.

Prediction of the Disordered Regions of Intrinsically Disordered Proteins Based on the Molecular Functions.

Protein Pept Lett. 2020;27(4):279-286. doi: 10.2174/0929866526666190226160629.

IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning.

Bioinformatics. 2021 Jan 29;36(21):5177-5186. doi: 10.1093/bioinformatics/btaa667.

Cited by

FusionEncoder: identification of intrinsically disordered regions based on multi-feature fusion.

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf362.

Use of AI-methods over MD simulations in the sampling of conformational ensembles in IDPs.

Front Mol Biosci. 2025 Apr 8;12:1542267. doi: 10.3389/fmolb.2025.1542267. eCollection 2025.

IDP-EDL: enhancing intrinsically disordered protein prediction by combining protein language model and ensemble deep learning.

Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf182.

Application of artificial intelligence and machine learning techniques to the analysis of dynamic protein sequences.

Proteins. 2024 Oct;92(10):1234-1241. doi: 10.1002/prot.26704. Epub 2024 May 29.

Challenges and limitations in computational prediction of protein misfolding in neurodegenerative diseases.

Front Comput Neurosci. 2024 Jan 5;17:1323182. doi: 10.3389/fncom.2023.1323182. eCollection 2023.
