DepoScope: Accurate phage depolymerase annotation and domain delineation using large language models.

Affiliations

Institute for Integrative Systems Biology (I2SysBio), Universitat de Valencia-CSIC, Paterna, Spain.

KERMIT, Department of Data Analysis and Mathematical Modelling, Ghent University, Ghent, Belgium.

Publication details

PLoS Comput Biol. 2024 Aug 5;20(8):e1011831. doi: 10.1371/journal.pcbi.1011831. eCollection 2024 Aug.

Abstract

Bacteriophages (phages) are viruses that infect bacteria. Many phages produce specific enzymes called depolymerases that break down external polysaccharide structures. Accurately annotating these depolymerases and identifying their domains is challenging because of their inherent sequence diversity. Here, we present DepoScope, a machine learning tool that combines a fine-tuned ESM-2 model with a convolutional neural network to identify depolymerase sequences and precisely delineate their enzymatic domains. To accomplish this, we curated a dataset from the INPHARED phage genome database, created a polysaccharide-degrading domain database, and applied sequential filters to construct a high-quality dataset, which was then used to train DepoScope. Our work is the first approach that combines sequence-level predictions with amino-acid-level predictions for accurate depolymerase detection and functional domain identification. In this way, we believe that DepoScope can greatly enhance our understanding of phage-host interactions at the level of depolymerases.
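The abstract does not say how the sequence-level and amino-acid-level outputs are combined, so the following is only a hypothetical illustration, not DepoScope's code. It assumes a model (for instance, a fine-tuned ESM-2 with a per-residue classification head) has already produced one sequence-level probability and one per-residue "domain" probability. It then shows one simple way to turn the residue scores into domain spans and to merge them with the sequence-level call. The function names, the 0.5 thresholds, the minimum span length of 20 residues, and the combination rule are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DepoCall:
    is_depolymerase: bool
    domains: List[Tuple[int, int]]  # 0-based, end-exclusive residue spans


def delineate_domains(residue_probs: List[float],
                      threshold: float = 0.5,
                      min_len: int = 20) -> List[Tuple[int, int]]:
    """Turn per-residue domain probabilities into contiguous spans.

    Hypothetical rule: residues scoring >= threshold count as domain;
    runs shorter than min_len are discarded as noise.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, p in enumerate(residue_probs):
        if p >= threshold and start is None:
            start = i  # a new run of domain residues begins
        elif p < threshold and start is not None:
            if i - start >= min_len:
                spans.append((start, i))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(residue_probs) - start >= min_len:
        spans.append((start, len(residue_probs)))
    return spans


def call_depolymerase(seq_prob: float,
                      residue_probs: List[float],
                      seq_threshold: float = 0.5,
                      threshold: float = 0.5,
                      min_len: int = 20) -> DepoCall:
    """Combine a sequence-level score with residue-level domain spans.

    Hypothetical combination rule: report a depolymerase only when the
    sequence-level score passes AND at least one domain span is found.
    """
    domains = delineate_domains(residue_probs, threshold, min_len)
    is_depo = seq_prob >= seq_threshold and len(domains) > 0
    return DepoCall(is_depo, domains if is_depo else [])


if __name__ == "__main__":
    # Toy scores: a 50-residue high-scoring region and a short 5-residue blip.
    probs = [0.1] * 30 + [0.9] * 50 + [0.2] * 10 + [0.8] * 5
    print(call_depolymerase(0.93, probs))
    # DepoCall(is_depolymerase=True, domains=[(30, 80)])
```

Requiring agreement between the two levels is one way to suppress isolated residue-level false positives. A real tool might instead weight the two scores or let the residue level override the sequence level.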
