Phylogenomic inferences from reference-mapped and de novo assembled short-read sequence data using RADseq sequencing of California white oaks (Quercus section Quercus).

Author information

a Institute of Genomics and Proteomics, University of California, Los Angeles, CA 90095, USA.

b The Morton Arboretum, 4100 Illinois Route 53, Lisle, IL 60532-1293, USA.

Publication information

Genome. 2017 Sep;60(9):743-755. doi: 10.1139/gen-2016-0202. Epub 2017 Mar 29.

Abstract

The emergence of next-generation sequencing has increased by several orders of magnitude the amount of data available for phylogenetics. Reduced representation approaches, such as restriction-site associated DNA sequencing (RADseq), have proven useful for phylogenetic studies of non-model species at a wide range of phylogenetic depths. However, analysis of these datasets is not uniform, and we know little about the potential benefits and drawbacks of de novo assembly versus assembly by mapping to a reference genome. Using RADseq data for 83 oak samples representing 16 taxa, we identified variants via three pipelines: mapping sequence reads to a recently published draft genome of Quercus lobata, and de novo assembly under two sets of locus filters. For each pipeline, we inferred the maximum likelihood phylogeny. All pipelines produced similar trees, with minor shifts in relationships within well-supported clades, despite the fact that they yielded different numbers of loci (68 000 to 111 000 loci) and different degrees of overlap with the reference genome. We conclude that both the reference-aligned and de novo assembly pipelines yield reliable results, and that advantages and disadvantages of these approaches pertain mainly to downstream uses of RADseq data, not to phylogenetic inference per se.