Rethinking large-scale phylogenomics with EukPhylo v.1.0, a flexible toolkit to enable phylogeny-informed data curation and analyses of diverse eukaryotic lineages.

Authors

Katz Laura A, Leleu Marie, Ani Godwin, Gawron Rebecca, Cote-L'Heureux Auden

Affiliations

Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA.

Program in Organismic Biology and Evolution, University of Massachusetts Amherst, Amherst, Massachusetts, USA.

Publication information

mBio. 2025 Aug 27:e0177025. doi: 10.1128/mbio.01770-25.

Abstract

Eukaryotic diversity is largely microbial, with macroscopic lineages (plants, animals, and fungi) nesting among a plethora of diverse protists. Our understanding of the evolutionary relationships among eukaryotes is rapidly advancing through 'omics analyses, but phylogenomic analyses are challenging for microeukaryotes, particularly uncultivable lineages, as single-cell sequencing approaches generate a mixture of sequences from hosts, associated microbiomes, and contaminants. Moreover, many analyses of eukaryotic gene families and phylogenies rely on boutique data sets and methods that are challenging for other research groups to replicate. To address these challenges, we present EukPhylo v.1.0, a modular, user-friendly pipeline that enables effective data curation through phylogeny-informed contamination removal, estimation of homologous gene families (GFs), and generation of both multisequence alignments and gene trees. For the GF assignment, we provide the "Hook Database" of ~15,000 ancient GFs, which users can easily replace with a set of gene families of interest. We demonstrate the power of EukPhylo, including a suite of stand-alone utilities, through phylogenomic analyses of 500 conserved GFs sampled from 1,000 diverse species of eukaryotes, bacteria, and archaea. We show improvements in estimates of the eukaryotic tree of life, recovering clades that are well established in the literature, through successive rounds of curation using the EukPhylo contamination loop. The final trees corroborate numerous hypotheses in the literature (e.g., Opisthokonta, Rhizaria, Amoebozoa) while challenging others (e.g., CRuMs, Obazoa, Diaphoretickes). The flexibility and transparency of EukPhylo set new standards for curation of 'omics data for future studies.

Importance

Illuminating the diversity of microbial lineages is essential for estimating the tree of life and characterizing principles of genome evolution. However, analyses of microbial eukaryotes (e.g., flagellates, amoebae) are complicated by both the paucity of reference genomes and the prevalence of contamination (e.g., by symbionts, microbiomes). EukPhylo v.1.0 enables taxon-rich analyses "on the fly" as users can choose optimal gene families for their focal taxa and then use replicable approaches to curate data in estimating both gene and species trees. With multiple entry points and curated data sets from up to 15,000 gene families from 1,000 taxa ready for use, EukPhylo provides a powerful launching point for researchers interested in the evolution of eukaryotes.
