School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou 213164, China.
Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; School of Artificial Intelligence, Jilin University, Changchun 130012, China.
Int J Biol Macromol. 2023 Dec 31;253(Pt 6):127390. doi: 10.1016/j.ijbiomac.2023.127390. Epub 2023 Oct 11.
Intrinsic disorder in proteins, a phenomenon widely distributed in nature, is related to many crucial biological processes and various diseases. Traditional experimental determination methods tend to be costly and labor-intensive, so an accurate computational method for identifying intrinsically disordered proteins (IDPs) is desirable. In this paper, we propose DeepDRP, a novel Deep learning model for Intrinsically Disordered Regions in Proteins. DeepDRP employs a TimeDistributed strategy and a Bi-LSTM architecture to predict IDPs and is driven by integrated multi-view features: PSSM, energy-based encoding, AAindex, and transformer-enhanced embeddings from DR-BERT, OntoProtein, Prot-T5, and ESM-2. A comparison of different feature combinations indicates that the transformer-enhanced features contribute far more to IDP prediction than the traditional features, and that ESM-2 makes the largest contribution among the pre-trained embeddings in the fused vector. An ablation test verified that the TimeDistributed strategy improves model performance and is an efficient approach to IDP prediction. Compared with eight state-of-the-art methods on the DISORDER723, S1, and DisProt832 datasets, DeepDRP achieved a Matthews correlation coefficient higher than those of the competing methods by 4.90% to 36.20%, 11.80% to 26.33%, and 4.82% to 13.55%, respectively. In brief, DeepDRP is a reliable model for IDP prediction and is freely available at https://github.com/ZX-COLA/DeepDRP.
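The comparison above is reported in terms of the Matthews correlation coefficient (MCC). As a minimal sketch of how that metric is computed for per-residue disorder labels, the following uses the standard MCC definition; the function name `matthews_corrcoef`, the label convention (1 = disordered, 0 = ordered), and the toy labels are assumptions for illustration and are not taken from the DeepDRP code or datasets.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Standard MCC for binary labels (assumed here: 1 = disordered, 0 = ordered)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is taken as 0 when a row or column of the
        # confusion matrix is empty (e.g. every residue predicted ordered).
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy example (made-up labels for an 8-residue sequence, not real data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)
print(f"MCC = {mcc:.4f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which is one reason it is often preferred over accuracy when disordered residues are a minority class.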