High-throughput DNA sequencing technologies decode tremendous amounts of microbial protein-coding gene sequences. However, accurately assigning protein functions to novel gene sequences remains a challenge. To this end, we developed FunGeneTyper, an extensible framework with two new deep learning models (i.e., FunTrans and FunRep), structured databases, and supporting resources for achieving highly accurate (Accuracy > 0.99, F1-score > 0.97) and fine-grained classification of antibiotic resistance genes (ARGs) and virulence factor genes. Using an experimentally confirmed dataset of ARGs comprising remote homologous sequences as the test set, our framework achieves by far the best performance in the discovery of new ARGs from human gut (F1-score: 0.6948), wastewater (F1-score: 0.6072), and soil (F1-score: 0.5445) microbiomes, outperforming state-of-the-art bioinformatics tools as well as sequence alignment-based (F1-score: 0.0556-0.5065) and domain-based (F1-score: 0.2630-0.5224) annotation approaches. Furthermore, our framework is implemented as a lightweight, privacy-preserving, and plug-and-play neural network module, which makes it versatile and accessible to developers and users worldwide. We anticipate widespread use of FunGeneTyper (https://github.com/emblab-westlake/FunGeneTyper) for precise classification of protein-coding gene functions and the discovery of numerous valuable enzymes. This advancement will have a significant impact on various fields, including microbiome research, biotechnology, metagenomics, and bioinformatics.
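For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how accuracy and a macro-averaged F1-score can be computed for a multi-class gene classifier. It is not taken from FunGeneTyper: the function names, the toy labels, and the choice of macro averaging are all assumptions made for illustration, since the abstract does not state how its F1-scores are averaged.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of sequences whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """Return {label: F1} from parallel lists of true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels invented for illustration; they are not data from the paper.
y_true = ["beta-lactam", "beta-lactam", "beta-lactam",
          "tetracycline", "non-ARG", "non-ARG"]
y_pred = ["beta-lactam", "beta-lactam", "non-ARG",
          "tetracycline", "non-ARG", "non-ARG"]

if __name__ == "__main__":
    print("accuracy:", round(accuracy(y_true, y_pred), 4))
    print("macro F1:", round(macro_f1(y_true, y_pred), 4))
```

Because F1 penalizes both missed genes and false alarms within each class, it can sit well below accuracy on a hard test set, which is one plausible reason the abstract reports F1 for the remote-homolog benchmark.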

Highly accurate classification and discovery of microbial protein-coding gene functions using FunGeneTyper: an extensible deep learning framework.
