FASTRAL: improving scalability of phylogenomic analysis.

Authors

Dibaeinia Payam, Tabe-Bordbar Shayan, Warnow Tandy

Affiliation

Department of Computer Science, University of Illinois, Urbana, IL 61801, USA.

Publication

Bioinformatics. 2021 Aug 25;37(16):2317-2324. doi: 10.1093/bioinformatics/btab093.

DOI:10.1093/bioinformatics/btab093
PMID:33576396
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8388037/
Abstract

MOTIVATION

ASTRAL is the current leading method for species tree estimation from phylogenomic datasets (i.e. hundreds to thousands of genes) that addresses gene tree discord resulting from incomplete lineage sorting (ILS). ASTRAL is statistically consistent under the multi-species coalescent (MSC) model, runs in polynomial time, and is able to run on large datasets. Key to ASTRAL's algorithm is the use of dynamic programming to find an optimal solution to the MQSST (maximum quartet support supertree) problem within a constraint space that it computes from the input. Yet ASTRAL can fail to complete within reasonable timeframes on large datasets with many genes and species, because in these cases the constraint space it computes is too large.
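
To make the quartet-support objective concrete, the following is a minimal brute-force sketch in Python. It is not ASTRAL's dynamic programming algorithm; it only scores one candidate species tree by counting, over all four-taxon subsets, how many gene trees induce the same quartet topology. The nested-tuple tree encoding and the function names (`clades`, `quartet_topology`, `quartet_support`) are assumptions made for this illustration.

```python
from itertools import combinations


def clades(tree):
    """Return (leaf set of tree, list of leaf sets of every subtree).

    Trees are nested tuples of taxon names, e.g. (("A", "B"), ("C", "D")).
    This encoding is an assumption of the sketch.
    """
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def quartet_topology(leaves, clade_list, quartet):
    """Return the split (e.g. {{A,B},{C,D}}) the tree induces on four taxa.

    Returns None if the tree lacks one of the taxa or leaves the quartet
    unresolved.
    """
    q = frozenset(quartet)
    if not q <= leaves:
        return None
    for clade in clade_list:
        inside = clade & q
        if len(inside) == 2:
            # A clade holding exactly two of the four taxa separates them
            # from the other two.
            return frozenset([inside, q - inside])
    return None


def quartet_support(species_tree, gene_trees):
    """Count (quartet, gene tree) pairs whose topology matches the species tree."""
    taxa, sp_clades = clades(species_tree)
    indexed = [clades(g) for g in gene_trees]
    score = 0
    for quartet in combinations(sorted(taxa), 4):
        sp_top = quartet_topology(taxa, sp_clades, quartet)
        if sp_top is None:
            continue
        for g_leaves, g_clades in indexed:
            if quartet_topology(g_leaves, g_clades, quartet) == sp_top:
                score += 1
    return score


if __name__ == "__main__":
    species = (("A", "B"), ("C", "D"))
    genes = [
        (("A", "B"), ("C", "D")),    # agrees: AB|CD
        (("A", "C"), ("B", "D")),    # disagrees: AC|BD
        ((("A", "B"), "C"), "D"),    # agrees: AB|CD
    ]
    print(quartet_support(species, genes))
```

Enumerating every quartet takes time proportional to the fourth power of the number of species per gene tree, which is why a practical method needs a smarter search. The sketch is meant only to show what is being maximized.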

RESULTS

Here, we introduce FASTRAL, a phylogenomic estimation method. FASTRAL is based on ASTRAL, but uses a different technique for constructing the constraint space. The technique we use to define the constraint space maintains statistical consistency and is polynomial time; thus we prove that FASTRAL is a polynomial time algorithm that is statistically consistent under the MSC. Our performance study on both biological and simulated datasets demonstrates that FASTRAL matches or improves on ASTRAL with respect to species tree topology accuracy (and under high ILS conditions it is statistically significantly more accurate), while being dramatically faster, especially on datasets with large numbers of genes and high ILS, because it uses a significantly smaller constraint space.
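
As a rough illustration of why the choice of trees matters for the size of the constraint space, the sketch below treats a constraint space as the set of non-trivial bipartitions found in a collection of trees. That is an assumption of this example: the abstract does not say which trees FASTRAL draws its bipartitions from, and the helper names (`collect_clades`, `bipartitions`, `constraint_space`) are hypothetical.

```python
def collect_clades(tree, out):
    """Add the leaf set of every internal node of a nested-tuple tree to `out`.

    Returns the leaf set of `tree` itself.
    """
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def bipartitions(tree):
    """Return the non-trivial bipartitions of a tree, each as a frozenset of two sides."""
    found = set()
    taxa = collect_clades(tree, found)
    result = set()
    for clade in found:
        other = taxa - clade
        # Skip trivial splits that separate fewer than two taxa from the rest.
        if len(clade) >= 2 and len(other) >= 2:
            result.add(frozenset([clade, other]))
    return result


def constraint_space(trees):
    """Union of the bipartitions of all given trees (illustrative definition)."""
    space = set()
    for tree in trees:
        space |= bipartitions(tree)
    return space


if __name__ == "__main__":
    g1 = ((("A", "B"), "C"), ("D", "E"))
    g2 = ((("A", "C"), "B"), ("D", "E"))
    g3 = ((("A", "B"), "D"), ("C", "E"))
    # Discordant gene trees contribute distinct bipartitions, so the space grows.
    print(len(constraint_space([g1, g2, g3])))
    # A single candidate tree yields a much smaller space.
    print(len(constraint_space([g1])))
```

With many discordant gene trees, as expected under high ILS, the union of their bipartitions grows quickly. A space built from a few candidate trees stays small, which is consistent with the speedup reported above.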

AVAILABILITY AND IMPLEMENTATION

FASTRAL is available in open-source form at https://github.com/PayamDiba/FASTRAL.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6382/8388037/d2e5bedc62d1/btab093f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6382/8388037/43d9ce746bdb/btab093f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6382/8388037/58a87cf710ef/btab093f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6382/8388037/9a6d182b1912/btab093f4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6382/8388037/990ad28d4c9f/btab093f5.jpg
