Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Victoria, Australia.
Faculty of Information Technology, Monash University, Victoria, Australia.
Sci Rep. 2023 Oct 31;13(1):18662. doi: 10.1038/s41598-023-45461-0.
The emergence of viruses and their variants has made virus taxonomy more important than ever for controlling the spread of disease. Understanding virus taxonomy can also aid the development of effective treatments that target particular virus properties. Alignment-based methods are commonly used for this task, but they are computationally expensive and time-consuming, especially when datasets are large or when new virus variants must be detected quickly. Encoded methods have been developed as an alternative: they do not require prior sequence alignment and produce results faster. However, each encoded method reports its own accuracy, so careful evaluation and comparison of the different encoded methods is essential to identify the most accurate and reliable approach for virus taxonomy classification. This study addresses this issue by providing a comprehensive comparative analysis of the potential of encoded methods for virus classification and phylogenetics. We compared the vectors generated by each encoded method using distance metrics to determine how closely their results match those of alignment-based methods. The results and their validation show that the K-merNV encoded method, followed by CgrDft, performs similarly to state-of-the-art multiple sequence alignment methods. This is the first study to incorporate and compare encoded methods, and it will help future research make more informed decisions when selecting a suitable method for virus taxonomy.
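To illustrate the general idea of comparing encoded sequences by distance, the following is a minimal sketch, not the authors' pipeline. It assumes a plain normalized k-mer frequency encoding (a simplification, not the K-merNV or CgrDft encodings evaluated in the study) and Euclidean distance; the function names `kmer_frequency_vector`, `euclidean`, and `distance_matrix` are hypothetical.

```python
from itertools import product
from math import sqrt


def kmer_frequency_vector(seq, k=3, alphabet="ACGT"):
    """Encode a sequence as normalized k-mer frequencies (alignment-free).

    Assumption: a simple frequency encoding stands in for the encoded
    methods named in the abstract; it is not K-merNV or CgrDft.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers with ambiguous bases such as N
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(seqs, k=3):
    """Pairwise distances between the encoded vectors of all sequences."""
    vectors = [kmer_frequency_vector(s, k) for s in seqs]
    return [[euclidean(u, v) for v in vectors] for u in vectors]
```

A matrix produced this way could then be compared with a distance matrix derived from a multiple sequence alignment, or passed to a tree-building method, which is the kind of comparison the abstract describes.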