PlasGO: enhancing GO-based function prediction for plasmid-encoded proteins based on genetic structure.

Authors

Ji Yongxin, Shang Jiayu, Guan Jiaojiao, Zou Wei, Liao Herui, Tang Xubo, Sun Yanni

Affiliations

Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR (HKG), China.

Department of Information Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR (HKG), China.

Publication information

Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae104.

DOI:10.1093/gigascience/giae104
PMID:39704702
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11659980/
Abstract

BACKGROUND

Plasmid, as a mobile genetic element, plays a pivotal role in facilitating the transfer of traits, such as antimicrobial resistance, among the bacterial community. Annotating plasmid-encoded proteins with the widely used Gene Ontology (GO) vocabulary is a fundamental step in various tasks, including plasmid mobility classification. However, GO prediction for plasmid-encoded proteins faces 2 major challenges: the high diversity of functions and the limited availability of high-quality GO annotations.

RESULTS

In this study, we introduce PlasGO, a tool that leverages a hierarchical architecture to predict GO terms for plasmid proteins. PlasGO utilizes a powerful protein language model to learn the local context within protein sentences and a BERT model to capture the global context within plasmid sentences. Additionally, PlasGO allows users to control the precision by incorporating a self-attention confidence weighting mechanism. We rigorously evaluated PlasGO and benchmarked it against 7 state-of-the-art tools in a series of experiments. The experimental results collectively demonstrate that PlasGO has achieved commendable performance. PlasGO significantly expanded the annotations of the plasmid-encoded protein database by assigning high-confidence GO terms to over 95% of previously unannotated proteins, showcasing impressive precision of 0.8229, 0.7941, and 0.8870 for the 3 GO categories, respectively, as measured on the novel protein test set.
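
The precision control described above can be pictured as a confidence cutoff: only GO terms whose confidence score clears a user-chosen threshold are reported, trading coverage for precision. The snippet below is a minimal sketch of that idea only, not PlasGO's actual interface. The record layout, the function name `precision_and_coverage`, and the toy scores and correctness labels are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical record: (protein_id, go_term, confidence, is_correct).
# The values below are made-up toy data, not PlasGO output.
Prediction = Tuple[str, str, float, bool]


def precision_and_coverage(preds: List[Prediction], threshold: float) -> Tuple[float, float]:
    """Keep predictions with confidence >= threshold, then report
    precision among the kept ones and the fraction that was kept."""
    if not preds:
        return 0.0, 0.0
    kept = [p for p in preds if p[2] >= threshold]
    if not kept:
        # Nothing passes the cutoff; precision is reported as 0.0 by convention here.
        return 0.0, 0.0
    precision = sum(1 for p in kept if p[3]) / len(kept)
    coverage = len(kept) / len(preds)
    return precision, coverage


toy_predictions: List[Prediction] = [
    ("p1", "GO:0003677", 0.95, True),
    ("p1", "GO:0005524", 0.40, False),
    ("p2", "GO:0006310", 0.85, True),
    ("p2", "GO:0016020", 0.55, True),
    ("p3", "GO:0003677", 0.30, False),
    ("p3", "GO:0046677", 0.75, False),
]

if __name__ == "__main__":
    for cutoff in (0.0, 0.5, 0.8):
        prec, cov = precision_and_coverage(toy_predictions, cutoff)
        print(f"cutoff={cutoff:.1f} precision={prec:.2f} coverage={cov:.2f}")
```

On this toy data, raising the cutoff increases precision while fewer predictions are kept, which is the kind of trade-off a user-adjustable confidence threshold exposes.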

CONCLUSIONS

PlasGO, a hierarchical tool incorporating protein language models and BERT, significantly expanded plasmid protein annotations by predicting high-confidence GO terms. These annotations have been compiled into a database, which will serve as a valuable contribution to downstream plasmid analysis and research.
