Improving viral annotation with artificial intelligence.

Affiliations

Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, USA.

Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, New York, USA.

Publication information

mBio. 2024 Oct 16;15(10):e0320623. doi: 10.1128/mbio.03206-23. Epub 2024 Sep 4.

DOI: 10.1128/mbio.03206-23
PMID: 39230289
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11481560/

Abstract

Viruses of bacteria, "phages," are fundamental, poorly understood components of microbial community structure and function. Additionally, their dependence on hosts for replication positions phages as unique sensors of ecosystem features and environmental pressures. High-throughput sequencing approaches have begun to give us access to the diversity and range of phage populations in complex microbial community samples, and metagenomics is currently the primary tool with which we study phage populations. The study of phages by metagenomic sequencing, however, is fundamentally limited by viral diversity, which results in the vast majority of viral genomes and metagenome-annotated genomes lacking annotation. To harness bacteriophages for applications in human and environmental health and disease, we need new methods to organize and annotate viral sequence diversity. We recently demonstrated that methods that leverage self-supervised representation learning can supplement statistical sequence representations for remote viral protein homology detection in the ocean virome and propose that consideration of the functional content of viral sequences allows for the identification of similarity in otherwise sequence-diverse viruses and viral-like elements for biological discovery. In this review, we describe the potential and pitfalls of large language models for viral annotation. We describe the need for new approaches to annotate viral sequences in metagenomes, the fundamentals of what protein language models are and how one can use them for sequence annotation, the strengths and weaknesses of these models, and future directions toward developing better models for viral annotation more broadly.
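The abstract describes using self-supervised protein language model (PLM) representations, alongside statistical sequence profiles, to find remote homologs and annotate otherwise unannotated viral proteins. Below is a minimal sketch of embedding-based annotation transfer. It rests on several assumptions: per-residue embeddings are taken to have already been produced by some PLM (no model is called here), each protein is summarized by mean pooling, and a query inherits the label of its most cosine-similar reference only when the similarity clears a cutoff. The function names, the 0.8 cutoff, the reference IDs, and the three-dimensional toy vectors and labels are all hypothetical illustrations. They are not the authors' pipeline and not real model output.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical reference set: protein ID -> (pooled embedding, known annotation).
# Real PLM embeddings have hundreds to thousands of dimensions; these 3-D vectors are toy values.
REFERENCE: Dict[str, Tuple[List[float], str]] = {
    "ref_001": ([0.9, 0.1, 0.0], "terminase large subunit"),
    "ref_002": ([0.0, 0.2, 0.95], "portal protein"),
}


def mean_pool(residue_embeddings: List[List[float]]) -> List[float]:
    """Average per-residue vectors into one fixed-length vector per protein."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[i] for vec in residue_embeddings) / n for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def transfer_annotation(
    query: List[float],
    reference: Dict[str, Tuple[List[float], str]],
    min_sim: float = 0.8,  # hypothetical cutoff; would need calibration on real data
) -> Optional[Tuple[str, str, float]]:
    """Return (reference ID, label, similarity) for the nearest reference,
    or None if even the best match is below min_sim (the protein stays unannotated)."""
    best: Optional[Tuple[str, str, float]] = None
    for ref_id, (vec, label) in reference.items():
        sim = cosine(query, vec)
        if best is None or sim > best[2]:
            best = (ref_id, label, sim)
    if best is None or best[2] < min_sim:
        return None
    return best


if __name__ == "__main__":
    # Toy per-residue embeddings for two unannotated query proteins.
    close_query = mean_pool([[1.0, 0.0, 0.1], [0.8, 0.2, 0.0]])
    novel_query = mean_pool([[0.0, 1.0, 0.0]])
    print(transfer_annotation(close_query, REFERENCE))
    print(transfer_annotation(novel_query, REFERENCE))
```

With these toy values the first query inherits the "terminase large subunit" label (similarity about 0.998). The second query's best match scores about 0.21, so it returns None. Returning no call instead of forcing a label mirrors the situation the abstract describes, where many viral proteins have no sufficiently close annotated neighbor.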

Similar articles

1. Improving viral annotation with artificial intelligence.
   mBio. 2024 Oct 16;15(10):e0320623. doi: 10.1128/mbio.03206-23. Epub 2024 Sep 4.
2. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences.
   Microbiome. 2020 Jun 10;8(1):90. doi: 10.1186/s40168-020-00867-0.
3. Assembly and Annotation of Viral Metagenomes from Short-Read Sequencing Data.
   Methods Mol Biol. 2023;2649:317-337. doi: 10.1007/978-1-0716-3072-3_17.
4. Microbial Diversity and Phage-Host Interactions in the Georgian Coastal Area of the Black Sea Revealed by Whole Genome Metagenomic Sequencing.
   Mar Drugs. 2020 Nov 14;18(11):558. doi: 10.3390/md18110558.
5. Seeker: alignment-free identification of bacteriophage genomes by deep learning.
   Nucleic Acids Res. 2020 Dec 2;48(21):e121. doi: 10.1093/nar/gkaa856.
6. Long-Read Metagenomics Improves the Recovery of Viral Diversity from Complex Natural Marine Samples.
   mSystems. 2022 Jun 28;7(3):e0019222. doi: 10.1128/msystems.00192-22. Epub 2022 Jun 13.
7. Computational approaches to predict bacteriophage-host relationships.
   FEMS Microbiol Rev. 2016 Mar;40(2):258-72. doi: 10.1093/femsre/fuv048. Epub 2015 Dec 9.
8. Isolation of a Host-Confined Phage Metagenome Allows the Detection of Phages Both Capable and Incapable of Plaque Formation.
   Methods Mol Biol. 2023;2555:195-203. doi: 10.1007/978-1-0716-2795-2_14.
9. Fishing for phages in metagenomes: what do we catch, what do we miss?
   Curr Opin Virol. 2021 Aug;49:142-150. doi: 10.1016/j.coviro.2021.05.008. Epub 2021 Jun 15.
10. Genome binning of viral entities from bulk metagenomics data.
   Nat Commun. 2022 Feb 18;13(1):965. doi: 10.1038/s41467-022-28581-5.

Cited by

1. Automated Annotation and Validation of Human Respiratory Virus Sequences using VADR.
   bioRxiv. 2025 Aug 11:2025.08.07.669219. doi: 10.1101/2025.08.07.669219.
2. Fine-Tuning Protein Language Models Unlocks the Potential of Underrepresented Viral Proteomes.
   bioRxiv. 2025 Jun 11:2025.04.17.649224. doi: 10.1101/2025.04.17.649224.
3. Optimizing phage therapy with artificial intelligence: a perspective.
   Front Cell Infect Microbiol. 2025 May 27;15:1611857. doi: 10.3389/fcimb.2025.1611857. eCollection 2025.