Ponsero Alise J, Hurwitz Bonnie L
Department of Biosystems Engineering, The University of Arizona, Tucson, AZ, United States.
BIO5 Institute, The University of Arizona, Tucson, AZ, United States.
Front Microbiol. 2019 Apr 16;10:806. doi: 10.3389/fmicb.2019.00806. eCollection 2019.
Tools that identify viral sequences in host-associated and environmental metagenomes allow for a better understanding of the genetics and ecology of viruses and their hosts. Recently, new approaches were published that use machine learning to distinguish viral from bacterial signal on the basis of k-mer sequence signatures, with the aim of identifying viral contigs in metagenomes. The promise of these content-based approaches is the ability to discover new viruses that have no or few known relatives. In this perspective paper, we examine the use of the content-based machine learning tool VirFinder for identifying viral sequences in aquatic metagenomes, and we explore the possibility of using ecosystem-focused models targeted to marine metagenomes. We discuss the impact of training set composition on tool performance and the current limitation in retrieving low-abundance viral sequences from metagenomes. We identify potential biases that could arise from machine learning approaches to virus hunting in real-world datasets and suggest possible avenues to overcome them.
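To make the idea of a "k-mer sequence signature" concrete, the sketch below turns a contig into a fixed-length vector of k-mer relative frequencies and then scores that vector with a logistic function. This is an illustrative sketch, not VirFinder's code. The function names `kmer_profile` and `viral_score`, the default `k=4`, and the `weights`/`bias` parameters are hypothetical. In a real content-based classifier the weights would be learned from labelled viral and bacterial training sequences, which is exactly why training set composition affects performance.

```python
import math
from collections import Counter
from itertools import product

ACGT = set("ACGT")


def kmer_profile(seq, k=4):
    """Return relative frequencies of all 4**k DNA k-mers in seq.

    k-mers containing non-ACGT characters (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= ACGT
    )
    total = sum(counts.values())
    # Enumerate every possible k-mer in a fixed order so that profiles from
    # different contigs are directly comparable feature vectors.
    return {
        kmer: (counts[kmer] / total if total else 0.0)
        for kmer in ("".join(p) for p in product("ACGT", repeat=k))
    }


def viral_score(profile, weights, bias=0.0):
    """Logistic score in (0, 1) from a k-mer profile.

    `weights` maps k-mers to hypothetical model coefficients; k-mers
    missing from `weights` contribute nothing.
    """
    z = bias + sum(weights.get(kmer, 0.0) * freq for kmer, freq in profile.items())
    # Numerically stable sigmoid (avoids overflow in math.exp for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

For example, `kmer_profile("ACGTACGTNN", k=2)` yields 16 features, with `"AC"` at 2/7 because the two dinucleotides containing `N` are ignored. With empty weights, `viral_score` returns 0.5, i.e. no evidence either way. Because such a model relies only on sequence composition, it can score contigs with no close relatives in reference databases. The same property means its calls can only be as representative as the composition of the training sequences.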