Suppr超能文献

使用最优树度量匹配的分支长度变换

Branch Length Transforms using Optimal Tree Metric Matching.

作者信息

Arasti Shayesteh, Tabaghi Puoya, Tabatabaee Yasamin, Mirarab Siavash

出版信息

bioRxiv. 2025 Mar 2:2023.11.13.566962. doi: 10.1101/2023.11.13.566962.

Abstract

UNLABELLED

The abundant discordance between evolutionary relationships across the genome has rekindled interest in methods for comparing and averaging trees on a shared leaf set. However, most attempts at comparing and matching trees have focused on tree topology. Comparing branch lengths has been more elusive due to several challenges. Species tree branch lengths can be measured in various units, often different from gene trees. Moreover, rates of evolution change across the genome, the species tree, and specific branches of gene trees. These factors compound the stochasticity of coalescence times and estimation noise, making branch lengths highly heterogeneous across the genome. For many downstream applications in phylogenomic analyses, branch lengths are as important as the topology, and yet, existing tools to compare and combine weighted trees are limited. In this paper, we address the question of matching one tree to another, accounting for their branch lengths. We define a series of computational problems called Topology-Constrained Metric Matching (TCMM) that seek to transform the branch lengths of a query tree based on a reference tree. We show that TCMM problems can be solved in quadratic time and memory using a linear algebraic formulation coupled with dynamic programming preprocessing. While many applications can be imagined for this framework, we explore two applications in this paper: embedding leaves of gene trees in Euclidean space to find outliers potentially indicative of errors and summarizing gene tree branch lengths onto the species tree. In these applications, our method, when paired with existing methods, increases their accuracy at limited computational expense.

CODE AND DATA AVAILABILITY

The software is available at https://github.com/shayesteh99/TCMM.git . Data is available on Github https://github.com/shayesteh99/TCMM-Data.git .

摘要

未标注

全基因组进化关系之间存在大量不一致,这重新引发了人们对在共享叶集上比较和平均树的方法的兴趣。然而,大多数比较和匹配树的尝试都集中在树的拓扑结构上。由于存在几个挑战,比较分支长度一直更加难以捉摸。物种树的分支长度可以用各种单位来衡量,这些单位通常与基因树的不同。此外,进化速率在全基因组、物种树以及基因树的特定分支中都会发生变化。这些因素加剧了合并时间的随机性和估计噪声,使得基因组中分支长度高度异质。在系统发育基因组分析的许多下游应用中,分支长度与拓扑结构同样重要,然而,现有的比较和组合加权树的工具却很有限。在本文中,我们解决了考虑分支长度将一棵树与另一棵树进行匹配的问题。我们定义了一系列称为拓扑约束度量匹配(TCMM)的计算问题,旨在根据参考树变换查询树的分支长度。我们表明,使用线性代数公式结合动态规划预处理,TCMM问题可以在二次时间和内存内解决。虽然可以想象这个框架有许多应用,但我们在本文中探索了两个应用:将基因树的叶嵌入欧几里得空间以找到可能指示错误的异常值,以及将基因树分支长度汇总到物种树上。在这些应用中,我们的方法与现有方法结合使用时,能以有限的计算成本提高其准确性。

代码和数据可用性

软件可在https://github.com/shayesteh99/TCMM.git获取。数据可在Github上的https://github.com/shayesteh99/TCMM-Data.git获取。

相似文献

3
Phylogenomic branch length estimation using quartets.基于四元组的系统发育分支长度估计
Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i185-i193. doi: 10.1093/bioinformatics/btad221.
4
The gene tree delusion.基因树错觉
Mol Phylogenet Evol. 2016 Jan;94(Pt A):1-33. doi: 10.1016/j.ympev.2015.07.018. Epub 2015 Jul 31.
8
QuCo: quartet-based co-estimation of species trees and gene trees.QuCo:基于四重奏的物种树和基因树的联合估计。
Bioinformatics. 2022 Jun 24;38(Suppl 1):i413-i421. doi: 10.1093/bioinformatics/btac265.

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验