Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM.

Authors

Wang Lei, Zhong Haolin, Xue Zhidong, Wang Yan

Affiliations

Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China.

School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China.

Publication information

Bioinform Adv. 2022 Sep 1;2(1):vbac060. doi: 10.1093/bioadv/vbac060. eCollection 2022.

Abstract

MOTIVATION

Protein domains are the basic units of proteins that can fold, function and evolve independently. Partitioning a protein at its domain boundaries plays an important role in protein structure prediction, in understanding biological function, in annotating evolutionary mechanisms and in protein design. Although many methods have been developed over the past two decades to predict domain boundaries from protein sequence, there is still much room for improvement.

RESULTS

In this article, we present Res-Dom, a novel domain boundary prediction tool based on a deep residual network, bidirectional long short-term memory (Bi-LSTM) and transfer learning. Deep residual neural networks are used to extract higher-order residue-related information. In addition, a pre-trained protein language model called ESM is used to extract sequence embedding features, which capture sequence context more richly. To improve the global representation of the deep residual networks, a Bi-LSTM network is added to model long-range interactions between residues. On an independent test set of 342 proteins, Res-Dom classified proteins as single-domain or multi-domain with a Matthews correlation coefficient of 0.668, which was 17.6% higher than the second-best compared method. For domain boundaries, the normalized domain overlapping score of Res-Dom was 0.849, which was 5% higher than the second-best compared method. Furthermore, Res-Dom required significantly less time than most recently developed state-of-the-art domain prediction methods.
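For readers unfamiliar with the Matthews correlation coefficient reported above, the following is a minimal sketch of how it is computed for a binary single-domain versus multi-domain classification. The function names and the toy labels are assumptions made for illustration only; they are not taken from the Res-Dom code or its test set, and returning 0.0 when the coefficient is undefined is a common convention rather than something the paper specifies.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).

    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical labels: 1 = multi-domain protein, 0 = single-domain protein.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(y_true, y_pred)
mcc = matthews_corrcoef(*counts)
print(counts, mcc)
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect agreement), and it uses all four cells of the confusion matrix, so it stays informative when single-domain and multi-domain proteins are unevenly represented.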

AVAILABILITY AND IMPLEMENTATION

All source code, datasets and the trained model are available at http://isyslab.info/Res-Dom/.


