Royal Botanic Gardens, Kew, Richmond, Surrey TW9 3AE, UK.
Centre for Research in Agricultural Genomics, Campus UAB, Edifici CRAG, Bellaterra Cerdanyola del Vallès, 08193 Barcelona, Spain.
Syst Biol. 2022 Feb 10;71(2):301-319. doi: 10.1093/sysbio/syab035.
The tree of life is the fundamental biological roadmap for navigating the evolution and properties of life on Earth, and yet remains largely unknown. Even angiosperms (flowering plants) are fraught with data gaps, despite their critical role in sustaining terrestrial life. Today, high-throughput sequencing promises to significantly deepen our understanding of evolutionary relationships. Here, we describe a comprehensive phylogenomic platform for exploring the angiosperm tree of life, comprising a set of open tools and data based on the 353 nuclear genes targeted by the universal Angiosperms353 sequence capture probes. The primary goals of this article are to (i) document our methods, (ii) describe our first data release, and (iii) present a novel open data portal, the Kew Tree of Life Explorer (https://treeoflife.kew.org). We aim to generate novel target sequence capture data for all genera of flowering plants, exploiting natural history collections such as herbarium specimens, and augment it with mined public data. Our first data release, described here, is the most extensive nuclear phylogenomic data set for angiosperms to date, comprising 3099 samples validated by DNA barcode and phylogenetic tests, representing all 64 orders, 404 families (96%) and 2333 genera (17%). A "first pass" angiosperm tree of life was inferred from the data, which totaled 824,878 sequences, 489,086,049 base pairs, and 532,260 alignment columns, for interactive presentation in the Kew Tree of Life Explorer. This species tree was generated using methods that were rigorous, yet tractable at our scale of operation. Despite limitations pertaining to taxon and gene sampling, gene recovery, models of sequence evolution and paralogy, the tree strongly supports existing taxonomy, while challenging numerous hypothesized relationships among orders and placing many genera for the first time.
The validated data set, species tree and all intermediates are openly accessible via the Kew Tree of Life Explorer and will be updated as further data become available. This major milestone toward a complete tree of life for all flowering plant species opens doors to a highly integrated future for angiosperm phylogenomics through the systematic sequencing of standardized nuclear markers. Our approach has the potential to serve as a much-needed bridge between the growing movement to sequence the genomes of all life on Earth and the vast phylogenomic potential of the world's natural history collections. [Angiosperms; Angiosperms353; genomics; herbariomics; museomics; nuclear phylogenomics; open access; target sequence capture; tree of life.]