GeneRax: A Tool for Species-Tree-Aware Maximum Likelihood-Based Gene Family Tree Inference under Gene Duplication, Transfer, and Loss.

Affiliations

Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany.

Publication information

Mol Biol Evol. 2020 Sep 1;37(9):2763-2774. doi: 10.1093/molbev/msaa141.

DOI: 10.1093/molbev/msaa141

PMID: 32502238

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8312565/

Abstract

Inferring phylogenetic trees for individual homologous gene families is difficult because alignments are often too short, and thus contain insufficient signal, while substitution models inevitably fail to capture the complexity of the evolutionary processes. To overcome these challenges, species-tree-aware methods also leverage information from a putative species tree. However, only few methods are available that implement a full likelihood framework or account for horizontal gene transfers. Furthermore, these methods often require expensive data preprocessing (e.g., computing bootstrap trees) and rely on approximations and heuristics that limit the degree of tree space exploration. Here, we present GeneRax, the first maximum likelihood species-tree-aware phylogenetic inference software. It simultaneously accounts for substitutions at the sequence level as well as gene level events, such as duplication, transfer, and loss relying on established maximum likelihood optimization algorithms. GeneRax can infer rooted phylogenetic trees for multiple gene families, directly from the per-gene sequence alignments and a rooted, yet undated, species tree. We show that compared with competing tools, on simulated data GeneRax infers trees that are the closest to the true tree in 90% of the simulations in terms of relative Robinson-Foulds distance. On empirical data sets, GeneRax is the fastest among all tested methods when starting from aligned sequences, and it infers trees with the highest likelihood score, based on our model. GeneRax completed tree inferences and reconciliations for 1,099 Cyanobacteria families in 8 min on 512 CPU cores. Thus, its parallelization scheme enables large-scale analyses. GeneRax is available under GNU GPL at https://github.com/BenoitMorel/GeneRax (last accessed June 17, 2020).
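The abstract evaluates accuracy with the relative Robinson-Foulds (RF) distance between an inferred tree and the true tree. The following is a minimal sketch of that metric, not the authors' evaluation code: it assumes trees are written as nested Python tuples of leaf-name strings (a hypothetical representation chosen for brevity), treats them as unrooted, and normalizes the number of differing splits by the total number of nontrivial splits in both trees.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels below a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Collect the nontrivial splits induced by the internal branches of a tree."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - clade
        # Keep only splits with at least two leaves on each side.
        if len(clade) >= 2 and len(other) >= 2:
            # Store the split as an unordered pair so rooting does not matter.
            splits.add(frozenset([clade, other]))
        return clade

    visit(tree)
    return splits


def relative_rf(tree_a, tree_b):
    """Relative RF distance: differing splits divided by all splits in both trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("both trees must have the same leaf set")
    splits_a = bipartitions(tree_a)
    splits_b = bipartitions(tree_b)
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 0.0  # trees too small to contain any nontrivial split
    return len(splits_a ^ splits_b) / total


# Hypothetical five-leaf example: the inferred tree misplaces leaves B and C.
true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred_tree = ((("A", "C"), "B"), ("D", "E"))

print(relative_rf(true_tree, true_tree))      # 0.0
print(relative_rf(true_tree, inferred_tree))  # 0.5
```

A value of 0 means the two trees share every split, and a value of 1 means they share none, so lower values indicate an inferred tree closer to the true one.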

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6159/8312565/fdb6c856186b/msaa141f1.jpg
