FASTRAL: improving scalability of phylogenomic analysis.

Authors

Dibaeinia Payam, Tabe-Bordbar Shayan, Warnow Tandy

Affiliation

Department of Computer Science, University of Illinois, Urbana, IL 61801, USA.

Publication

Bioinformatics. 2021 Aug 25;37(16):2317-2324. doi: 10.1093/bioinformatics/btab093.

Abstract

MOTIVATION

ASTRAL is the current leading method for species tree estimation from phylogenomic datasets (i.e. hundreds to thousands of genes) that addresses gene tree discord resulting from incomplete lineage sorting (ILS). ASTRAL is statistically consistent under the multi-species coalescent (MSC) model, runs in polynomial time, and is able to run on large datasets. Key to ASTRAL's algorithm is the use of dynamic programming to find an optimal solution to the MQSST (maximum quartet support species tree) problem within a constraint space that it computes from the input. Yet ASTRAL can fail to complete within reasonable timeframes on large datasets with many genes and species, because in these cases the constraint space it computes is too large.
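To make the MQSST objective concrete, the sketch below scores a candidate species tree by counting, over all four-taxon subsets, how many gene trees induce the same quartet topology as the candidate. It is a brute-force illustration only, not ASTRAL's dynamic programming. The nested-tuple tree encoding and the function names (`clades`, `quartet_topology`, `quartet_support`) are assumptions made for this example, and every gene tree is assumed to be a tree on leaf labels given as strings.

```python
from itertools import combinations


def clades(tree):
    """Return (leaf set of the tree, leaf sets below every internal node)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def quartet_topology(leaves, tree_clades, quartet):
    """Return the split of four taxa into two pairs induced by the tree.

    Returns None if the tree lacks one of the taxa or leaves the quartet
    unresolved.
    """
    q = frozenset(quartet)
    if not q <= leaves:
        return None
    for clade in tree_clades:
        side = clade & q
        if len(side) == 2:
            return frozenset([side, q - side])
    return None


def quartet_support(candidate, gene_trees):
    """Count (quartet, gene tree) pairs whose topology matches the candidate."""
    cand_leaves, cand_clades = clades(candidate)
    genes = [clades(g) for g in gene_trees]
    total = 0
    for quartet in combinations(sorted(cand_leaves), 4):
        topo = quartet_topology(cand_leaves, cand_clades, quartet)
        if topo is None:
            continue
        for g_leaves, g_clades in genes:
            if quartet_topology(g_leaves, g_clades, quartet) == topo:
                total += 1
    return total


# Hypothetical toy data: one candidate tree and two gene trees on five taxa.
candidate = (("A", "B"), ("C", ("D", "E")))
gene_trees = [
    (("A", "B"), ("C", ("D", "E"))),
    (("A", "C"), ("B", ("D", "E"))),
]
score = quartet_support(candidate, gene_trees)
```

This enumeration visits every four-taxon subset, so it is only practical for tiny inputs; it is meant to show what is being maximized, not how an efficient method computes it.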

RESULTS

Here, we introduce FASTRAL, a phylogenomic estimation method. FASTRAL is based on ASTRAL, but uses a different technique for constructing the constraint space. The technique we use to define the constraint space maintains statistical consistency and runs in polynomial time; thus we prove that FASTRAL is a polynomial time algorithm that is statistically consistent under the MSC. Our performance study on both biological and simulated datasets demonstrates that FASTRAL matches or improves on ASTRAL with respect to species tree topology accuracy (and under high ILS conditions it is statistically significantly more accurate), while being dramatically faster, especially on datasets with large numbers of genes and high ILS, because it uses a significantly smaller constraint space.
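The abstract does not spell out how FASTRAL builds its constraint space, so the following is only a sketch of the general idea under stated assumptions: that a constraint space is a set of bipartitions collected from a list of trees, and that collecting them from fewer trees gives a smaller set. The nested-tuple tree encoding, the function names and the toy trees are hypothetical and are not taken from the FASTRAL code.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree.

    Each bipartition is stored as the side that contains the smallest
    leaf label, so the same split always has the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    result = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            result.add(side)
        return below

    visit(tree)
    return result


def constraint_space(trees):
    """Union of the bipartitions found in a list of trees."""
    space = set()
    for tree in trees:
        space |= bipartitions(tree)
    return space


# Hypothetical toy trees on five taxa.
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
t3 = (("A", "D"), ("B", ("C", "E")))

large_space = constraint_space([t1, t2, t3])
small_space = constraint_space([t1])
```

With these toy trees, the space built from three conflicting trees holds more bipartitions than the space built from one tree, which illustrates why the choice of trees used to define the constraint space affects the running time of a search restricted to it.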

AVAILABILITY AND IMPLEMENTATION

FASTRAL is available in open-source form at https://github.com/PayamDiba/FASTRAL.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

