Res-Dom: predicting protein domain boundary from sequence using deep residual network and Bi-LSTM.

Authors

Wang Lei, Zhong Haolin, Xue Zhidong, Wang Yan

Affiliations

Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China.

School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China.

Publication information

Bioinform Adv. 2022 Sep 1;2(1):vbac060. doi: 10.1093/bioadv/vbac060. eCollection 2022.

DOI: 10.1093/bioadv/vbac060
PMID: 36699417
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9710680/

Abstract

MOTIVATION

Protein domains are the basic units of proteins that can fold, function and evolve independently. Partitioning a protein at its domain boundaries plays an important role in protein structure prediction, in understanding biological function, in annotating evolutionary mechanisms and in protein design. Although many methods have been developed over the past two decades to predict domain boundaries from protein sequence, there is still much room for improvement.

RESULTS

In this article, a novel domain boundary prediction tool called Res-Dom was developed, which is based on a deep residual network, bidirectional long short-term memory (Bi-LSTM) and transfer learning. We used deep residual neural networks to extract higher-order residue-related information. We also used a pre-trained protein language model called ESM to extract sequence embedding features, which capture richer sequence context information. To improve the global representation of these deep residual networks, a Bi-LSTM network was designed to account for long-range interactions between residues. Res-Dom was then tested on an independent test set of 342 proteins, where it classified proteins as single-domain or multi-domain with a Matthews correlation coefficient of 0.668, which was 17.6% higher than the second-best compared method. For domain boundaries, the normalized domain overlapping score of Res-Dom was 0.849, which was 5% higher than the second-best compared method. Furthermore, Res-Dom required significantly less time than most of the recently developed state-of-the-art domain prediction methods.
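
The single-domain versus multi-domain result above is reported as a Matthews correlation coefficient. As a rough illustration of what that metric measures, the sketch below computes it for a binary labelling from the standard confusion-matrix formula. The function name, the label encoding (1 = multi-domain, 0 = single-domain) and the toy labels are assumptions made for illustration; they are not taken from the Res-Dom code or its test set.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Assumed encoding (illustrative only): 1 = multi-domain, 0 = single-domain.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the
        # confusion matrix is empty and the ratio is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Toy labels, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
print(matthews_corrcoef(y_true, y_pred))
```

The score ranges from -1 to 1, with 0 corresponding to chance-level agreement, which is why it is often preferred over plain accuracy when the single-domain and multi-domain classes are unbalanced.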

AVAILABILITY AND IMPLEMENTATION

All source code, datasets and the trained model are available at http://isyslab.info/Res-Dom/.
