vClean: assessing virus sequence contamination in viral genomes.

Authors

Wagatsuma Ryota, Nishikawa Yohei, Hosokawa Masahito, Takeyama Haruko

Affiliations

Department of Life Science and Medical Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, 2-2 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8480, Japan.

Computational Bio Big-Data Open Innovation Laboratory, AIST-Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-0072, Japan.

Publication information

NAR Genom Bioinform. 2025 Jan 7;7(1):lqae185. doi: 10.1093/nargab/lqae185. eCollection 2025 Mar.

DOI:10.1093/nargab/lqae185
PMID:39781513
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11704788/
Abstract

Recent advancements in viral metagenomics and single-virus genomics have improved our ability to obtain the draft genomes of environmental viruses. However, these methods can introduce virus sequence contaminations into viral genomes when short, fragmented partial sequences are present in the assembled contigs. These contaminations can lead to incorrect analyses; however, practical detection tools are lacking. In this study, we introduce vClean, a novel automated tool that detects contaminations in viral genomes. By applying machine learning to the nucleotide sequence features and gene patterns of the input viral genome, vClean could identify contaminations. Specifically, for tailed double-stranded DNA phages, we attempted accurate predictions by defining single-copy-like genes and counting their duplications. We evaluated the performance of vClean using simulated datasets derived from complete reference genomes, achieving a binary accuracy of 0.932. When vClean was applied to 4693 genomes of medium or higher quality derived from public ocean metagenomic data, 1604 genomes (34.2%) were identified as contaminated. We also demonstrated that vClean can detect contamination in single-virus genome data obtained from river water. vClean provides a new benchmark for quality control of environmental viral genomes and has the potential to become an essential tool for environmental viral genome analysis.
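
The abstract mentions counting duplications of single-copy-like genes in tailed dsDNA phages and reports that 1604 of 4693 genomes were flagged. The following is a minimal illustrative sketch of that counting idea only, not vClean's actual implementation: the function name `count_duplicated_markers`, the marker set `SINGLE_COPY_LIKE`, and the example gene labels are all hypothetical, and the machine-learning step described in the abstract is not modelled here.

```python
from collections import Counter


def count_duplicated_markers(gene_hits, single_copy_like):
    """Return the single-copy-like genes that occur more than once.

    gene_hits: list of gene labels annotated on one genome (hypothetical input).
    single_copy_like: set of labels expected to appear at most once per genome.
    """
    counts = Counter(g for g in gene_hits if g in single_copy_like)
    return {gene: n for gene, n in counts.items() if n > 1}


# Hypothetical marker set and annotation, chosen only for illustration.
SINGLE_COPY_LIKE = {"terminase_large", "portal", "major_capsid"}
genome_hits = ["portal", "terminase_large", "tail_fiber", "portal", "major_capsid"]

# A duplicated marker hints that sequence from a second virus may be present.
duplicated = count_duplicated_markers(genome_hits, SINGLE_COPY_LIKE)
print(duplicated)  # {'portal': 2}

# Arithmetic check of the reported proportion: 1604 of 4693 genomes.
contaminated_fraction = 1604 / 4693
print(f"{contaminated_fraction:.1%}")  # 34.2%
```

In this sketch a genome carrying two copies of a gene expected once is treated as a candidate for contamination; according to the abstract, the real tool combines such gene patterns with nucleotide sequence features in a machine-learning model.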


