MArVD2: a machine learning enhanced tool to discriminate between archaeal and bacterial viruses in viral datasets.

Authors

Vik Dean, Bolduc Benjamin, Roux Simon, Sun Christine L, Pratama Akbar Adjie, Krupovic Mart, Sullivan Matthew B

Affiliations

Department of Microbiology, The Ohio State University, Columbus, OH, 43210, USA.

Center of Microbiome Science, The Ohio State University, Columbus, OH, USA.

Publication information

ISME Commun. 2023 Aug 24;3(1):87. doi: 10.1038/s43705-023-00295-9.

Abstract

Our knowledge of viral sequence space has exploded with advancing sequencing technologies and large-scale sampling and analytical efforts. Though archaea are important and abundant prokaryotes in many systems, our knowledge of archaeal viruses outside of extreme environments is limited. This largely stems from the lack of a robust, high-throughput, and systematic way to distinguish between bacterial and archaeal viruses in datasets of curated viruses. Here we upgrade our prior text-based tool (MArVD) via training and testing a random forest machine learning algorithm against a newly curated dataset of archaeal viruses. After optimization, MArVD2 presented a significant improvement over its predecessor in terms of scalability, usability, and flexibility, and will allow user-defined custom training datasets as archaeal virus discovery progresses. Benchmarking showed that a model trained with viral sequences from the hypersaline, marine, and hot spring environments correctly classified 85% of the archaeal viruses with a false detection rate below 2% using a random forest prediction threshold of 80% in a separate benchmarking dataset from the same habitats.
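To make the benchmarking figures concrete, the sketch below shows how a prediction threshold such as 80% turns per-sequence random forest probabilities into archaeal-virus calls, and how recall and a false detection rate can then be computed. It is not MArVD2 code: the function name `evaluate_threshold`, the toy scores, and the definition of false detection rate as false positives divided by all positive calls are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def evaluate_threshold(
    archaeal_probs: Sequence[float],
    is_archaeal: Sequence[bool],
    threshold: float = 0.80,
) -> Tuple[float, float]:
    """Return (recall, false_detection_rate) for calls made at `threshold`.

    Hypothetical helper: a sequence is called archaeal when its predicted
    probability is at least `threshold`. The false detection rate is assumed
    here to be FP / (TP + FP); the abstract does not spell out its formula.
    """
    if len(archaeal_probs) != len(is_archaeal):
        raise ValueError("probabilities and labels must have the same length")

    calls = [p >= threshold for p in archaeal_probs]
    tp = sum(1 for c, t in zip(calls, is_archaeal) if c and t)
    fp = sum(1 for c, t in zip(calls, is_archaeal) if c and not t)
    fn = sum(1 for c, t in zip(calls, is_archaeal) if not c and t)

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return recall, fdr


# Made-up probabilities and labels, purely for illustration.
probs = [0.95, 0.85, 0.60, 0.90, 0.10, 0.30]
labels = [True, True, True, False, False, False]

recall, fdr = evaluate_threshold(probs, labels, threshold=0.80)
print(f"recall={recall:.2f}, false detection rate={fdr:.2f}")
```

Raising the threshold generally trades recall for fewer false detections, which is why a benchmark result is reported together with the threshold used.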

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fda0/10449787/a85e325f4267/43705_2023_295_Fig1_HTML.jpg
