phyBWT2: phylogeny reconstruction via eBWT positional clustering.

Authors

Guerrini Veronica, Conte Alessio, Grossi Roberto, Liti Gianni, Rosone Giovanna, Tattini Lorenzo

Affiliations

Dipartimento di Informatica, University of Pisa, Pisa, Italy.

CNRS UMR 7284, INSERM U1081, Université Côte d'Azur, Nice, France.

Publication

Algorithms Mol Biol. 2023 Aug 3;18(1):11. doi: 10.1186/s13015-023-00232-4.

DOI:10.1186/s13015-023-00232-4

PMID:37537624

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10399073/

Abstract

BACKGROUND

Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights into the origin and evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring phylogenetic trees from any type of sequencing data, including raw short reads. Yet several tools require pre-processed input data, for example the output of complex computational pipelines based on de novo assembly or of mappings against a reference genome. As sequencing technologies keep becoming cheaper, there is increasing pressure to design methods that analyze their outputs directly. From this viewpoint, there is growing interest in alignment-, assembly-, and reference-free methods that can work on several types of data, including raw reads.

RESULTS

We present phyBWT2, an improved version of phyBWT (Guerrini et al. in 22nd International Workshop on Algorithms in Bioinformatics (WABI) 242:23:1-23:19, 2022). Both tools reconstruct phylogenetic trees directly, bypassing both alignment against a reference genome and de novo assembly. They exploit the combinatorial properties of the extended Burrows-Wheeler Transform (eBWT) and the corresponding eBWT positional clustering framework to detect relevant blocks of longest shared substrings of varying length (unlike k-mer-based approaches, which need to fix the length k a priori). As a result, they provide novel alignment-, assembly-, and reference-free methods that build partition trees without relying on pairwise comparison of sequences, thus avoiding the use of a distance matrix to infer phylogeny. In addition, phyBWT2 outperforms phyBWT in running time, because it reconstructs phylogenetic trees step by step by considering multiple partitions at a time, whereas phyBWT considers just one partition at a time.
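
To make the idea of eBWT positional clustering more concrete, the following is a minimal Python sketch, not the phyBWT2 implementation. It makes several simplifying assumptions: each sequence gets a distinct end-marker (a common multi-string BWT variant), suffixes are sorted naively, and a cluster is taken to be a maximal run of adjacent suffixes whose longest common prefix is at least a threshold `k_min`. The function names (`suffix_table`, `lcp`, `shared_blocks`) and the toy sequences are hypothetical and chosen only for illustration.

```python
def suffix_table(seqs):
    """Sorted rows (suffix, color, preceding symbol) for a string collection.

    Assumptions of this sketch: uppercase nucleotide strings and fewer than
    65 sequences, so that chr(color) is a distinct end-marker smaller than 'A'.
    """
    rows = []
    for color, s in enumerate(seqs):
        t = s + chr(color)  # append a per-sequence end-marker
        for start in range(len(t)):
            # BWT symbol: the character that precedes the suffix ('$' for
            # the suffix that starts at the beginning of the sequence).
            prev = t[start - 1] if start > 0 else "$"
            rows.append((t[start:], color, prev))
    rows.sort(key=lambda r: r[0])  # naive suffix sorting, fine for toy input
    return rows


def lcp(a, b):
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_blocks(seqs, k_min):
    """Clusters of adjacent suffixes with LCP >= k_min and at least two colors.

    Returns (shared substring, sorted colors, BWT symbols) per cluster.
    """
    rows = suffix_table(seqs)
    clusters, current = [], [0]
    for i in range(1, len(rows)):
        if lcp(rows[i - 1][0], rows[i][0]) >= k_min:
            current.append(i)
        else:
            clusters.append(current)
            current = [i]
    clusters.append(current)

    blocks = []
    for c in clusters:
        colors = sorted({rows[i][1] for i in c})
        if len(colors) < 2:
            continue  # substring occurs in a single sequence only
        first, last = rows[c[0]][0], rows[c[-1]][0]
        shared = first[:lcp(first, last)]  # common prefix of the whole run
        bwt = "".join(rows[i][2] for i in c)
        blocks.append((shared, colors, bwt))
    return blocks


if __name__ == "__main__":
    for block in shared_blocks(["GATTACA", "CATTACA", "GGGGGGG"], k_min=4):
        print(block)
    # ('ATTACA', [0, 1], 'GC')
    # ('TACA', [0, 1], 'TT')
    # ('TTACA', [0, 1], 'AA')
```

In this toy run the shared blocks have lengths 6, 4, and 5, so no single k has to be fixed in advance, and every block groups sequences 0 and 1 apart from sequence 2 without computing any pairwise distance. How phyBWT2 actually selects clusters and turns them into partitions of the tree is described in the paper and is not reproduced here.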

CONCLUSIONS

Based on the results of experiments on sequencing data, we conclude that our method can produce trees of quality comparable to the benchmark phylogeny while handling datasets of different types (short reads, contigs, or entire genomes). Overall, the experiments confirm the effectiveness of phyBWT2, which improves on the performance of its predecessor phyBWT while preserving the accuracy of the results.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9200/10399073/bbfc16b58f52/13015_2023_232_Fig1_HTML.jpg
