Phylogenetic Tree Instability After Taxon Addition: Empirical Frequency, Predictability, and Consequences For Online Inference.

Author information

Collienne Lena, Barker Mary, Suchard Marc A, Matsen Frederick A

Affiliations

Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109, USA.

Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave N, Seattle, WA 98109, USA.

Publication information

Syst Biol. 2025 Feb 10;74(1):101-111. doi: 10.1093/sysbio/syae059.

Abstract

Online phylogenetic inference methods add sequentially arriving sequences to an inferred phylogeny without the need to recompute the entire tree from scratch. Some online method implementations exist already, but there remains concern that additional sequences may change the topological relationship among the original set of taxa. We call such a change in tree topology a lack of stability for the inferred tree. In this article, we analyze the stability of single taxon addition in a Maximum Likelihood framework across 1000 empirical datasets. We find that instability occurs in almost 90% of our examples, although observed topological differences do not always reach significance under the approximately unbiased (AU) test. Changes in tree topology after addition of a taxon rarely occur close to its attachment location, and are more frequently observed in more distant tree locations carrying low bootstrap support. To investigate whether instability is predictable, we hypothesize sources of instability and design summary statistics addressing these hypotheses. Using these summary statistics as input features for machine learning under random forests, we are able to predict instability and can identify the most influential features. In summary, it does not appear that a strict insertion-only online inference method will deliver globally optimal trees, although relaxing insertion strictness by allowing for a small number of final tree rearrangements or accepting slightly suboptimal solutions appears feasible.
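The notion of stability defined in the abstract can be illustrated with a small sketch: prune the newly added taxon from the updated tree and check whether the remaining topology matches the original tree. This is only an illustration of the definition, not the authors' pipeline, which relies on Maximum Likelihood inference and the AU test. The nested-tuple tree representation and the function names `prune`, `splits`, and `is_stable` are assumptions made for this example.

```python
def prune(tree, taxon):
    """Remove `taxon` from a nested-tuple tree and collapse unary nodes."""
    if isinstance(tree, str):
        return None if tree == taxon else tree
    kept = [c for c in (prune(child, taxon) for child in tree) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # suppress the node left with a single child
    return tuple(kept)


def leaves(tree):
    """Return the set of leaf names below `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Nontrivial bipartitions of the tree, read as unrooted.

    Each split is stored as the side that does not contain a fixed
    reference leaf, so the result does not depend on root placement.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if ref in below else below
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def is_stable(original, updated, new_taxon):
    """True if adding `new_taxon` left the original taxa's topology unchanged."""
    return splits(original) == splits(prune(updated, new_taxon))


# Hypothetical five-taxon example; "X" is the newly added sequence.
original = (("A", "B"), ("C", "D"), "E")
stable_update = (("A", "B"), (("C", "D"), "X"), "E")
unstable_update = (("A", "C"), (("B", "D"), "X"), "E")

print(is_stable(original, stable_update, "X"))
print(is_stable(original, unstable_update, "X"))
```

Comparing split sets is an exact topological test, so it flags any difference regardless of whether that difference would be statistically significant, which is the distinction the abstract draws when it mentions the AU test.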

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dfeb/11809580/0ec528c4d4ec/syae059_fig1.jpg
