Protein function prediction through multi-view multi-label latent tensor reconstruction.

Authors

Armah-Sekum Robert Ebo, Szedmak Sandor, Rousu Juho

Affiliation

Department of Computer Science, Aalto University, Konemiehentie 2, 02150, Espoo, Finland.

Publication

BMC Bioinformatics. 2024 May 2;25(1):174. doi: 10.1186/s12859-024-05789-4.

DOI:10.1186/s12859-024-05789-4
PMID:38698340
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11067221/
Abstract

BACKGROUND

Over the last two decades, high-throughput sequencing technologies have accelerated the pace of protein discovery. However, rigorous experimental functional characterization is limited by time and resources, so the functions of the vast majority of these proteins remain unknown. As a result, there is demand for computational methods that can assign functions to new and previously unannotated proteins accurately, quickly, and at scale. Leveraging the underlying associations among the many features that describe proteins could reveal functional insights into their diverse roles and improve performance on the automatic function prediction task.

RESULTS

We present GO-LTR, a multi-view multi-label prediction model that combines a high-order tensor approximation of the model weights with non-linear activation functions. The model can learn high-order relationships between multiple input views representing the proteins and predict a high-dimensional multi-label output consisting of protein functional categories. We demonstrate that our method is competitive across various performance measures. Experiments show that GO-LTR learns polynomial combinations of different protein features, resulting in improved performance. Additional investigations establish GO-LTR's practical potential for assigning functions to proteins in diverse challenging scenarios: proteins with very low sequence similarity to previously observed sequences, and Gene Ontology terms that are rarely observed or highly specific.
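The abstract describes two things: a high-order tensor approximation of the weights, and predictions built from polynomial combinations of features across views. The Python sketch below illustrates that idea under our own assumptions. It is not the authors' GO-LTR code.

We assume the following:

* Each protein is represented by D views, with x^(d) the feature vector of view d.
* The weight tensor has a rank-R CP (canonical polyadic) factorization, W = sum_r lam_r p_r^(1) ⊗ ... ⊗ p_r^(D) ⊗ q_r.
* A single sigmoid on each output score serves as the non-linearity. The abstract does not say where GO-LTR places its activations.

Contracting W with the outer product of the views reduces to s = sum_r lam_r (prod_d <p_r^(d), x^(d)>) q_r. Each score is therefore a degree-D polynomial in cross-view feature products, and it can be computed without materializing W.

The names `ltr_scores`, `full_tensor_scores`, `predict_proba`, `P`, `Q` and `lam` are hypothetical.

```python
import math
import random
from itertools import product

# Hypothetical sketch, not the GO-LTR implementation.
# Assumed shapes: views[d] has n_d features; P[d][r] has n_d entries (view-d factor of
# rank component r); Q[r] has one entry per GO term; lam[r] is a scalar weight.


def sigmoid(z):
    """Numerically stable logistic function (assumed output non-linearity)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ltr_scores(views, P, Q, lam):
    """Factored form: s_k = sum_r lam[r] * prod_d <P[d][r], x^(d)> * Q[r][k]."""
    n_rank, n_labels = len(Q), len(Q[0])
    scores = [0.0] * n_labels
    for r in range(n_rank):
        # One projection per view; their product is a cross-view multilinear term.
        coeff = lam[r]
        for d, x in enumerate(views):
            coeff *= dot(P[d][r], x)
        for k in range(n_labels):
            scores[k] += coeff * Q[r][k]
    return scores


def full_tensor_scores(views, P, Q, lam):
    """Same scores via the explicit weight tensor W[i_1, ..., i_D, k] (exponential cost)."""
    n_rank, n_labels = len(Q), len(Q[0])
    scores = [0.0] * n_labels
    for idx in product(*(range(len(x)) for x in views)):
        # Monomial x^(1)_{i_1} * ... * x^(D)_{i_D}: one feature taken from every view.
        feat = 1.0
        for d, i in enumerate(idx):
            feat *= views[d][i]
        for k in range(n_labels):
            # Rebuild the single tensor entry W[idx, k] from the CP factors.
            w = 0.0
            for r in range(n_rank):
                term = lam[r] * Q[r][k]
                for d, i in enumerate(idx):
                    term *= P[d][r][i]
                w += term
            scores[k] += w * feat
    return scores


def predict_proba(views, P, Q, lam):
    """Independent per-term probabilities, as in multi-label prediction."""
    return [sigmoid(s) for s in ltr_scores(views, P, Q, lam)]


# Toy demo with random data: 3 views of sizes 4, 3 and 5; rank 2; 6 labels.
random.seed(0)
dims, R, L = [4, 3, 5], 2, 6
views = [[random.gauss(0, 1) for _ in range(n)] for n in dims]
P = [[[random.gauss(0, 1) for _ in range(n)] for _ in range(R)] for n in dims]
Q = [[random.gauss(0, 1) for _ in range(L)] for _ in range(R)]
lam = [1.0] * R

fast = ltr_scores(views, P, Q, lam)
slow = full_tensor_scores(views, P, Q, lam)
probs = predict_proba(views, P, Q, lam)
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9) for a, b in zip(fast, slow)))
print([round(p, 3) for p in probs])
```

The demo checks the factored computation against a brute-force sum over the explicit tensor. The brute-force cost grows with the product of the view dimensions, which is why a low-rank factorization is attractive for high-dimensional views and many labels.

As written, the model contains only monomials that take exactly one feature from every view. Appending a constant 1 to each view vector would also admit lower-degree terms. That is an assumption on our part, not something the abstract states.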

IMPLEMENTATION

The code and data used for training GO-LTR are available at https://github.com/aalto-ics-kepaco/GO-LTR-prediction.

Similar articles

1
Protein function prediction through multi-view multi-label latent tensor reconstruction.
BMC Bioinformatics. 2024 May 2;25(1):174. doi: 10.1186/s12859-024-05789-4.
2
Mutual annotation-based prediction of protein domain functions with Domain2GO.
Protein Sci. 2024 Jun;33(6):e4988. doi: 10.1002/pro.4988.
3
Protein function prediction from protein-protein interaction network using gene ontology based neighborhood analysis and physico-chemical features.
J Bioinform Comput Biol. 2018 Dec;16(6):1850025. doi: 10.1142/S0219720018500257. Epub 2018 Sep 19.
4
Assigning protein function from domain-function associations using DomFun.
BMC Bioinformatics. 2022 Jan 15;23(1):43. doi: 10.1186/s12859-022-04565-6.
5
DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.
Bioinformatics. 2018 Feb 15;34(4):660-668. doi: 10.1093/bioinformatics/btx624.
6
Use of Chou's 5-steps rule to predict the subcellular localization of gram-negative and gram-positive bacterial proteins by multi-label learning based on gene ontology annotation and profile alignment.
J Integr Bioinform. 2020 Jun 29;18(1):51-79. doi: 10.1515/jib-2019-0091.
7
Modeling drug combination effects via latent tensor reconstruction.
Bioinformatics. 2021 Jul 12;37(Suppl_1):i93-i101. doi: 10.1093/bioinformatics/btab308.
8
Incorporating functional inter-relationships into protein function prediction algorithms.
BMC Bioinformatics. 2009 May 12;10:142. doi: 10.1186/1471-2105-10-142.
9
mGOASVM: Multi-label protein subcellular localization based on gene ontology and support vector machines.
BMC Bioinformatics. 2012 Nov 6;13:290. doi: 10.1186/1471-2105-13-290.
10
Accurate prediction of multi-label protein subcellular localization through multi-view feature learning with RBRL classifier.
Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab012.

Cited by

1
Bag-of-words is competitive with sum-of-embeddings language-inspired representations on protein inference.
PLoS One. 2025 Aug 6;20(8):e0325531. doi: 10.1371/journal.pone.0325531. eCollection 2025.
2
Scaling up drug combination surface prediction.
Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf099.
