Laboratory of Biotechnology (MedBiotech), Rabat Medical & Pharmacy School, Bioinova Research Center, Mohammed V University in Rabat, Rabat, Morocco.
Mohammed VI Center for Research and Innovation (CM6), Rabat, Morocco.
BMC Bioinformatics. 2024 Nov 27;25(1):367. doi: 10.1186/s12859-024-05992-3.
Genomic sequence similarity comparison is a crucial research area in bioinformatics. Multiple Sequence Alignment (MSA) is the basic technique used to identify regions of similarity between sequences. Although MSA tools are widely used and highly accurate, they are often limited by computational complexity and by inaccuracies when handling highly divergent sequences, which has led to the development of alignment-free (AF) algorithms.
This paper presents TreeWave, a novel AF approach to phylogeny inference based on the frequency chaos game representation and discrete wavelet transform of sequences. We validate our method on various genomic datasets, including complete virus genome sequences, bacterial genome sequences, human mitochondrial genome sequences, and rRNA gene sequences. Compared to classical methods, our tool demonstrates a significant reduction in running time, especially when analyzing large datasets. Judged by normalized Robinson-Foulds distances and Baker's Gamma coefficients, the resulting phylogenetic trees show that TreeWave achieves classification accuracy similar to that of classical MSA methods.
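To make the two building blocks named above concrete, here is a minimal sketch of a frequency chaos game representation (FCGR) followed by one level of a 2D discrete wavelet transform. It is not TreeWave's implementation: the corner layout, the choice of k = 3, the Haar wavelet, the use of only the approximation band, and the Euclidean distance are all assumptions made for illustration, and the function names are hypothetical.

```python
import math

# Corner assignment of the chaos game square (this layout is an assumption).
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k=3):
    """Return a 2^k x 2^k grid of normalized k-mer frequencies."""
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(c not in CORNERS for c in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        # The last nucleotide selects the coarsest quadrant, as in the chaos game.
        for c in reversed(kmer):
            bx, by = CORNERS[c]
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y][x] += 1
        total += 1
    if total:
        grid = [[v / total for v in row] for row in grid]
    return grid


def haar_approx_1d(values):
    """Approximation coefficients of one orthonormal Haar step."""
    s = math.sqrt(2)
    return [(values[i] + values[i + 1]) / s for i in range(0, len(values), 2)]


def haar_2d_approx(grid):
    """One-level 2D Haar transform, keeping only the approximation band."""
    rows = [haar_approx_1d(r) for r in grid]
    cols = [haar_approx_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def feature_distance(seq_a, seq_b, k=3):
    """Euclidean distance between flattened wavelet features (an assumed metric)."""
    fa = [v for row in haar_2d_approx(fcgr(seq_a, k)) for v in row]
    fb = [v for row in haar_2d_approx(fcgr(seq_b, k)) for v in row]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fa, fb)))
```

Pairwise values of `feature_distance` over a set of sequences would form a distance matrix that a standard distance-based tree-building method could consume. Because no alignment is computed, the cost per sequence is linear in its length for a fixed k.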
TreeWave is an open-source, user-friendly command-line tool for phylogeny reconstruction. It is faster and more scalable than classical MSA-based methods, prioritizing computational efficiency while maintaining accuracy. TreeWave is freely available at https://github.com/nasmaB/TreeWave.