TreeWave: command line tool for alignment-free phylogeny reconstruction based on graphical representation of DNA sequences and genomic signal processing.

Affiliations

Laboratory of Biotechnology (MedBiotech), Rabat Medical & Pharmacy School, Bioinova Research Center, Mohammed V University in Rabat, Rabat, Morocco.

Mohammed VI Center for Research and Innovation (CM6), Rabat, Morocco.

Publication information

BMC Bioinformatics. 2024 Nov 27;25(1):367. doi: 10.1186/s12859-024-05992-3.

DOI: 10.1186/s12859-024-05992-3
PMID: 39604838
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11600722/
Abstract

BACKGROUND

Genomic sequence similarity comparison is a crucial research area in bioinformatics. Multiple Sequence Alignment (MSA) is the basic technique used to identify regions of similarity between sequences. Although MSA tools are widely used and highly accurate, they are often limited by computational complexity and by inaccuracies when handling highly divergent sequences, which has led to the development of alignment-free (AF) algorithms.

RESULTS

This paper presents TreeWave, a novel AF approach to phylogeny inference based on the frequency chaos game representation and discrete wavelet transform of sequences. We validate our method on various genomic datasets, including complete virus genome sequences, bacterial genome sequences, human mitochondrial genome sequences, and rRNA gene sequences. Compared to classical methods, our tool shows a significant reduction in running time, especially when analyzing large datasets. Measured by normalized Robinson-Foulds distances and Baker's Gamma coefficients, the resulting phylogenetic trees show that TreeWave achieves classification accuracy similar to that of classical MSA methods.
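The two building blocks named above can be illustrated with a minimal sketch. It is not TreeWave's implementation: the corner assignment, the k-mer length `k=3`, the single-level Haar wavelet, the Euclidean distance, and the function names `fcgr`, `haar_approx`, `wavelet_features`, and `feature_distance` are all assumptions chosen for illustration. The abstract does not state which wavelet, decomposition level, or distance the tool uses.

```python
from math import sqrt

# Assumed corner assignment (x bit, y bit); other conventions exist.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k=3):
    """Frequency chaos game representation: a 2^k x 2^k grid of k-mer frequencies."""
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNER for base in kmer):
            continue  # skip k-mers containing ambiguous bases such as N
        x = y = 0
        # In the chaos game the most recent base picks the coarsest quadrant,
        # so the last base of the k-mer supplies the most significant bit.
        for base in reversed(kmer):
            bx, by = CORNER[base]
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y][x] += 1
        total += 1
    if total:
        grid = [[count / total for count in row] for row in grid]
    return grid


def haar_approx(grid):
    """Approximation (low-pass) band of a one-level orthonormal 2D Haar transform."""
    n = len(grid)
    return [
        [(grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 2.0
         for c in range(0, n, 2)]
        for r in range(0, n, 2)
    ]


def wavelet_features(seq, k=3):
    """Flatten the Haar approximation of the FCGR grid into a feature vector."""
    return [value for row in haar_approx(fcgr(seq, k)) for value in row]


def feature_distance(seq_a, seq_b, k=3):
    """Euclidean distance between the wavelet feature vectors of two sequences."""
    fa, fb = wavelet_features(seq_a, k), wavelet_features(seq_b, k)
    return sqrt(sum((a - b) ** 2 for a, b in zip(fa, fb)))
```

Computing `feature_distance` for every pair of sequences gives a distance matrix that a standard distance-based tree-building method could consume. No alignment step is involved, which is the source of the running-time advantage the abstract reports.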

CONCLUSIONS

TreeWave is an open-source, user-friendly command line tool for phylogeny reconstruction. It is a faster and more scalable tool that prioritizes computational efficiency while maintaining accuracy. TreeWave is freely available at https://github.com/nasmaB/TreeWave.
