PlasGO: enhancing GO-based function prediction for plasmid-encoded proteins based on genetic structure.

Author information

Ji Yongxin, Shang Jiayu, Guan Jiaojiao, Zou Wei, Liao Herui, Tang Xubo, Sun Yanni

Affiliations

Department of Electrical Engineering, City University of Hong Kong, Kowloon, Hong Kong SAR (HKG), China.

Department of Information Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong SAR (HKG), China.

Publication information

Gigascience. 2024 Jan 2;13. doi: 10.1093/gigascience/giae104.

Abstract

BACKGROUND

Plasmids, as mobile genetic elements, play a pivotal role in the transfer of traits such as antimicrobial resistance within bacterial communities. Annotating plasmid-encoded proteins with the widely used Gene Ontology (GO) vocabulary is a fundamental step in various tasks, including plasmid mobility classification. However, GO prediction for plasmid-encoded proteins faces 2 major challenges: the high diversity of functions and the limited availability of high-quality GO annotations.

RESULTS

In this study, we introduce PlasGO, a tool that uses a hierarchical architecture to predict GO terms for plasmid proteins. PlasGO uses a protein language model to learn the local context within protein sentences and a BERT model to capture the global context within plasmid sentences. It also incorporates a self-attention confidence weighting mechanism that lets users control the precision of its predictions. We evaluated PlasGO and benchmarked it against 7 state-of-the-art tools in a series of experiments, and the results collectively show that PlasGO performs well. PlasGO substantially expanded the annotations of the plasmid-encoded protein database by assigning high-confidence GO terms to over 95% of previously unannotated proteins, with precision of 0.8229, 0.7941, and 0.8870 for the 3 GO categories, respectively, as measured on the novel protein test set.
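To make the idea of user-controlled precision concrete, the following minimal Python sketch shows how a per-prediction confidence score can be used as a cut-off, trading coverage for precision. It is an illustration only and not PlasGO's implementation: the `Prediction` fields, the function names, the thresholds, and the toy proteins and scores are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One predicted GO term for one protein (hypothetical record layout)."""
    protein_id: str     # made-up protein identifier
    go_term: str        # GO accession, e.g. "GO:0003677"
    probability: float  # predicted probability that the term applies
    confidence: float   # separate confidence score in [0, 1]


def filter_predictions(preds, min_probability=0.5, min_confidence=0.0):
    """Keep predictions that pass both the probability and confidence cut-offs."""
    return [
        p for p in preds
        if p.probability >= min_probability and p.confidence >= min_confidence
    ]


def precision(preds, true_annotations):
    """Fraction of kept predictions present in the ground truth.

    Returns None when no predictions were kept, since precision is undefined.
    """
    if not preds:
        return None
    hits = sum((p.protein_id, p.go_term) in true_annotations for p in preds)
    return hits / len(preds)


# Toy data, invented for illustration (not taken from the paper).
preds = [
    Prediction("p1", "GO:0003677", 0.90, 0.95),
    Prediction("p1", "GO:0005524", 0.70, 0.40),
    Prediction("p2", "GO:0006310", 0.80, 0.85),
    Prediction("p2", "GO:0016020", 0.60, 0.30),
    Prediction("p3", "GO:0003677", 0.55, 0.70),
]
truth = {("p1", "GO:0003677"), ("p2", "GO:0006310"), ("p3", "GO:0003677")}

# Without a confidence cut-off every prediction above 0.5 probability is kept.
loose = filter_predictions(preds)
# A stricter confidence cut-off discards the low-confidence predictions.
strict = filter_predictions(preds, min_confidence=0.6)

print(len(loose), precision(loose, truth))
print(len(strict), precision(strict, truth))
```

In this toy setting, raising the confidence cut-off keeps fewer predictions but a larger share of them are correct, which is the trade-off a user-adjustable confidence threshold exposes.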

CONCLUSIONS

PlasGO, a hierarchical tool incorporating protein language models and BERT, significantly expanded plasmid protein annotations by predicting high-confidence GO terms. These annotations have been compiled into a database, which will serve as a valuable contribution to downstream plasmid analysis and research.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3b3b/11659980/daefc10accdb/giae104fig1.jpg
