Highly accurate classification and discovery of microbial protein-coding gene functions using FunGeneTyper: an extensible deep learning framework.

Affiliations

College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China.

Key Laboratory of Coastal Environment and Resources of Zhejiang Province, School of Engineering, Westlake University, Hangzhou, Zhejiang 310030, China.

Publication information

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae319.

DOI:10.1093/bib/bbae319
PMID:39007592
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11247404/
Abstract

High-throughput DNA sequencing technologies decode tremendous amounts of microbial protein-coding gene sequences. However, accurately assigning protein functions to novel gene sequences remains a challenge. To this end, we developed FunGeneTyper, an extensible framework with two new deep learning models (i.e., FunTrans and FunRep), structured databases, and supporting resources for achieving highly accurate (Accuracy > 0.99, F1-score > 0.97) and fine-grained classification of antibiotic resistance genes (ARGs) and virulence factor genes. Using an experimentally confirmed dataset of ARGs comprising remote homologous sequences as the test set, our framework achieves by far the best performance in the discovery of new ARGs from human gut (F1-score: 0.6948), wastewater (0.6072), and soil (0.5445) microbiomes, outperforming state-of-the-art bioinformatics tools as well as sequence alignment-based (F1-score: 0.0556-0.5065) and domain-based (F1-score: 0.2630-0.5224) annotation approaches. Furthermore, our framework is implemented as a lightweight, privacy-preserving, and plug-and-play neural network module, which makes it versatile and accessible to developers and users worldwide. We anticipate widespread use of FunGeneTyper (https://github.com/emblab-westlake/FunGeneTyper) for precise classification of protein-coding gene functions and the discovery of numerous valuable enzymes. This advancement will have a significant impact on various fields, including microbiome research, biotechnology, metagenomics, and bioinformatics.
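
The abstract reports results as Accuracy and F1-score. As a reading aid, the sketch below shows how these two metrics are commonly computed for a multi-class gene classification task. It is not taken from FunGeneTyper: the function names, the class labels, and the choice of macro-averaging are assumptions made for illustration, since the abstract does not state how the F1-score is averaged.

```python
def accuracy(y_true, y_pred):
    """Fraction of sequences whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, where F1 = 2PR / (P + R).

    Macro-averaging is an assumption; the abstract does not specify it.
    """
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for four sequences (illustrative only, not real data).
y_true = ["beta-lactam", "beta-lactam", "tetracycline", "non-ARG"]
y_pred = ["beta-lactam", "tetracycline", "tetracycline", "non-ARG"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

Because F1 combines precision and recall per class, it can sit well below accuracy when some classes are rare or hard to detect, which is one reason the remote-homolog results above are reported as F1-scores.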

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/edc3/11247404/f3f9643f84cb/bbae319f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/edc3/11247404/906a6a04e94b/bbae319ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/edc3/11247404/ce2b31cafa53/bbae319f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/edc3/11247404/33d4ca492476/bbae319f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/edc3/11247404/4d8ec5965044/bbae319f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/edc3/11247404/c97282711a34/bbae319f4.jpg

Similar articles

1
Highly accurate classification and discovery of microbial protein-coding gene functions using FunGeneTyper: an extensible deep learning framework.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae319.
2
ARGNet: using deep neural networks for robust identification and classification of antibiotic resistance genes from sequences.
Microbiome. 2024 May 9;12(1):84. doi: 10.1186/s40168-024-01805-0.
3
DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data.
Microbiome. 2018 Feb 1;6(1):23. doi: 10.1186/s40168-018-0401-z.
4
HMD-ARG: hierarchical multi-task deep learning for annotating antibiotic resistance genes.
Microbiome. 2021 Feb 8;9(1):40. doi: 10.1186/s40168-021-01002-3.
5
Re-purposing software for functional characterization of the microbiome.
Microbiome. 2021 Jan 9;9(1):4. doi: 10.1186/s40168-020-00971-1.
6
DeepVF: a deep learning-based hybrid framework for identifying virulence factors using the stacking strategy.
Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa125.
7
Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network.
PLoS Comput Biol. 2021 Sep 22;17(9):e1009345. doi: 10.1371/journal.pcbi.1009345. eCollection 2021 Sep.
8
ARG-SHINE: improve antibiotic resistance class prediction by integrating sequence homology, functional information and deep convolutional neural network.
NAR Genom Bioinform. 2021 Aug 5;3(3):lqab066. doi: 10.1093/nargab/lqab066. eCollection 2021 Sep.
9
ARGs-OAP v2.0 with an expanded SARG database and Hidden Markov Models for enhancement characterization and quantification of antibiotic resistance genes in environmental metagenomes.
Bioinformatics. 2018 Jul 1;34(13):2263-2270. doi: 10.1093/bioinformatics/bty053.
10
ARGs-OAP: online analysis pipeline for antibiotic resistance genes detection from metagenomic data using an integrated structured ARG-database.
Bioinformatics. 2016 Aug 1;32(15):2346-51. doi: 10.1093/bioinformatics/btw136. Epub 2016 Mar 12.
