Deep Robust Framework for Protein Function Prediction Using Variable-Length Protein Sequences.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1648-1659. doi: 10.1109/TCBB.2019.2911609. Epub 2019 Apr 16.

DOI:10.1109/TCBB.2019.2911609
PMID:30998479
Abstract

The order of amino acids in a protein sequence enables the protein to acquire a conformation suitable for performing functions, thereby motivating the need to analyze these sequences for predicting functions. Although machine learning based approaches are fast compared to methods using BLAST, FASTA, etc., they fail to perform well for long protein sequences (with more than 300 amino acids). In this paper, we introduce a novel method for construction of two separate feature sets for protein using bi-directional long short-term memory network based on the analysis of fixed 1) single-sized segments and 2) multi-sized segments. The model trained on the proposed feature set based on multi-sized segments is combined with the model trained using state-of-the-art Multi-label Linear Discriminant Analysis (MLDA) features to further improve the accuracy. Extensive evaluations using separate datasets for biological processes and molecular functions demonstrate not only improved results for long sequences, but also significantly improve the overall accuracy over state-of-the-art method. The single-sized approach produces an improvement of +3.37 percent for biological processes and +5.48 percent for molecular functions over the MLDA based classifier. The corresponding numbers for multi-sized approach are +5.38 and +8.00 percent. Combining the two models, the accuracy further improves to +7.41 and +9.21 percent, respectively.
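The abstract does not give the segment sizes, stride, or padding scheme, so the following is only a minimal sketch of what splitting a variable-length sequence into single-sized and multi-sized segments might look like before the segments are passed to a sequence model. The function names, the tail-handling rule, and the example sizes are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def split_into_segments(sequence: str, size: int, stride: int) -> List[str]:
    """Slide a window of `size` residues over `sequence` in steps of `stride`.

    Assumption (not from the paper): a trailing window is added so that the
    last residues are always covered, and a sequence no longer than `size`
    is returned as one segment.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(sequence) <= size:
        return [sequence]
    last_start = len(sequence) - size
    segments = [sequence[i:i + size] for i in range(0, last_start + 1, stride)]
    if last_start % stride != 0:
        # The regular windows stopped short of the end; cover the tail.
        segments.append(sequence[-size:])
    return segments


def multi_sized_segments(sequence: str, sizes: Sequence[int]) -> Dict[int, List[str]]:
    """Segment the same sequence once per size (non-overlapping windows)."""
    return {size: split_into_segments(sequence, size, stride=size) for size in sizes}


if __name__ == "__main__":
    toy_sequence = "ACDEFGHIKL"  # toy 10-residue sequence, for illustration only
    print(split_into_segments(toy_sequence, size=4, stride=4))
    print(multi_sized_segments(toy_sequence, sizes=(4, 5)))
```

In a pipeline of the kind the abstract describes, each segment would then be encoded and fed to a bi-directional LSTM; that step is omitted here because the abstract gives no architectural details.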

Similar articles

1. Deep Robust Framework for Protein Function Prediction Using Variable-Length Protein Sequences.
IEEE/ACM Trans Comput Biol Bioinform. 2020 Sep-Oct;17(5):1648-1659. doi: 10.1109/TCBB.2019.2911609. Epub 2019 Apr 16.
2. From Protein Sequence to Protein Function via Multi-Label Linear Discriminant Analysis.
IEEE/ACM Trans Comput Biol Bioinform. 2017 May-Jun;14(3):503-513. doi: 10.1109/TCBB.2016.2591529. Epub 2016 Jul 14.
3. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition.
BMC Bioinformatics. 2007 May 22;8 Suppl 4(Suppl 4):S2. doi: 10.1186/1471-2105-8-S4-S2.
4. Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences.
BMC Bioinformatics. 2016 Mar 3;17:116. doi: 10.1186/s12859-016-0959-z.
5. DeepPPSite: A deep learning-based model for analysis and prediction of phosphorylation sites using efficient sequence information.
Anal Biochem. 2021 Jan 1;612:113955. doi: 10.1016/j.ab.2020.113955. Epub 2020 Sep 16.
6. Accurate prediction of multi-label protein subcellular localization through multi-view feature learning with RBRL classifier.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab012.
7. Prediction of nuclear proteins using nuclear translocation signals proposed by probabilistic latent semantic indexing.
BMC Bioinformatics. 2012;13 Suppl 17(Suppl 17):S13. doi: 10.1186/1471-2105-13-S17-S13. Epub 2012 Dec 13.
8. Effect of tokenization on transformers for biological sequences.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae196.
9. RNA-binding protein recognition based on multi-view deep feature and multi-label learning.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa174.
10. An Ensemble Tf-Idf Based Approach to Protein Function Prediction via Sequence Segmentation.
IEEE/ACM Trans Comput Biol Bioinform. 2022 Sep-Oct;19(5):2685-2696. doi: 10.1109/TCBB.2021.3093060. Epub 2022 Oct 10.

Cited by

1. A multimodal model for protein function prediction.
Sci Rep. 2025 Mar 26;15(1):10465. doi: 10.1038/s41598-025-94612-y.
2. Deep learning program to predict protein functions based on sequence information.
MethodsX. 2022 Jan 15;9:101622. doi: 10.1016/j.mex.2022.101622. eCollection 2022.
3. Deep Learning in Protein Structural Modeling and Design.
Patterns (N Y). 2020 Nov 12;1(9):100142. doi: 10.1016/j.patter.2020.100142. eCollection 2020 Dec 11.