Predicting Phylogenetic Bootstrap Values via Machine Learning.

Affiliations

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology - Hellas, Heraklion, Crete, Greece.

Publication information

Mol Biol Evol. 2024 Oct 4;41(10). doi: 10.1093/molbev/msae215.

Abstract

Estimating the statistical robustness of the inferred tree(s) constitutes an integral part of most phylogenetic analyses. Commonly, one computes and assigns a branch support value to each inner branch of the inferred phylogeny. The method still most widely used for calculating branch support on trees inferred under maximum likelihood (ML) is the standard, nonparametric Felsenstein bootstrap support (SBS). Due to the high computational cost of the SBS, a plethora of methods have been developed to approximate it, for instance, via the rapid bootstrap (RB) algorithm. There have also been attempts to devise faster, alternative support measures, such as the SH-aLRT (Shimodaira-Hasegawa-like approximate likelihood ratio test) or the UltraFast bootstrap 2 (UFBoot2) method. Those faster alternatives exhibit some limitations, such as the need to assess model violations (UFBoot2) or unstable behavior in the low-support range (SH-aLRT). Here, we present the educated bootstrap guesser (EBG), a machine learning-based tool that predicts SBS branch support values for a given input phylogeny. EBG is on average 9.4 (σ=5.5) times faster than UFBoot2. EBG-based SBS estimates exhibit a median absolute error of 5 when predicting SBS values between 0 and 100. Furthermore, EBG also provides uncertainty measures for all per-branch SBS predictions and thereby allows for a more rigorous and careful interpretation. EBG can, for instance, predict SBS support values on a phylogeny comprising 1,654 SARS-CoV2 genome sequences within 3 h on a mid-range laptop. EBG is available under GNU GPL3.
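To make the reported accuracy metric concrete, the following is a minimal sketch of how a median absolute error between reference SBS values and predicted SBS values could be computed. It is not taken from EBG itself: the function name `median_absolute_error` and the support values are hypothetical and chosen purely for illustration.

```python
from statistics import median


def median_absolute_error(reference_support, predicted_support):
    """Median of |reference - predicted| over all inner branches.

    Both inputs are sequences of support values on the 0-100 scale,
    one entry per inner branch, in the same branch order.
    """
    if len(reference_support) != len(predicted_support):
        raise ValueError("both inputs must cover the same branches")
    if not reference_support:
        raise ValueError("at least one branch is required")
    return median(
        abs(ref - pred)
        for ref, pred in zip(reference_support, predicted_support)
    )


# Hypothetical values for five inner branches (illustration only,
# not results from the paper).
reference = [100, 95, 60, 12, 78]
predicted = [98, 90, 70, 20, 77]

print(median_absolute_error(reference, predicted))  # prints 5
```

The median is less sensitive than the mean to a few branches with large prediction errors, so it describes the typical per-branch deviation.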

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f2f9/11523138/3c463248bb0d/msae215f1.jpg
