Department of Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, 999077, China SAR.
Bioinformatics. 2024 Oct 1;40(10). doi: 10.1093/bioinformatics/btae575.
RNA viruses are ubiquitous across a broad spectrum of ecosystems. Beyond their significant implications for public health, they are therefore also key players in ecological processes. High-throughput sequencing has accelerated the discovery of RNA viruses. Nevertheless, many of these viruses lack taxonomic annotation, which hinders functional inference and evolutionary study. In particular, genus-level classification remains difficult because reference data are limited and the boundaries between some closely related genera are ambiguous. We introduce VirTAXA, a robust classification tool that combines remote homology search with tree-based validation to improve the genus-level taxonomic classification of RNA viruses. VirTAXA predicts the genus label of an assembled viral contig and reports the type of evidence supporting each prediction. It achieves accuracy comparable to state-of-the-art methods while assigning genus labels to more sequences. Specifically, on the Global Ocean RNA metatranscriptomic data, VirTAXA assigns genus labels to 18% more contigs than the second-best classification tool. Furthermore, we demonstrate that VirTAXA can be readily extended to other types of viruses.
The source code and data of VirTAXA are available via https://github.com/JudithEllyn/VirTAXA.