Misconceptions on Missing Data in RAD-seq Phylogenetics with a Deep-scale Example from Flowering Plants.

Authors

Eaton Deren A R, Spriggs Elizabeth L, Park Brian, Donoghue Michael J

Affiliation

Department of Ecology and Evolutionary Biology, Yale University, PO Box 208106, New Haven, CT, 06520, USA.

Publication information

Syst Biol. 2017 May 1;66(3):399-412. doi: 10.1093/sysbio/syw092.

Abstract

Restriction-site associated DNA (RAD) sequencing and related methods rely on the conservation of enzyme recognition sites to isolate homologous DNA fragments for sequencing, with the consequence that mutations disrupting these sites lead to missing information. There is thus a clear expectation for how missing data should be distributed, with fewer loci recovered between more distantly related samples. This observation has led to a related expectation: that RAD-seq data are insufficiently informative for resolving deeper scale phylogenetic relationships. Here we investigate the relationship between missing information among samples at the tips of a tree and information at edges within it. We re-analyze and review the distribution of missing data across ten RAD-seq data sets and carry out simulations to determine expected patterns of missing information. We also present new empirical results for the angiosperm clade Viburnum (Adoxaceae, with a crown age >50 Ma) for which we examine phylogenetic information at different depths in the tree and with varied sequencing effort. The total number of loci, the proportion that are shared, and phylogenetic informativeness varied dramatically across the examined RAD-seq data sets. Insufficient or uneven sequencing coverage accounted for similar proportions of missing data as dropout from mutation-disruption. Simulations reveal that mutation-disruption, which results in phylogenetically distributed missing data, can be distinguished from the more stochastic patterns of missing data caused by low sequencing coverage. In Viburnum, doubling sequencing coverage nearly doubled the number of parsimony informative sites, and increased by >10X the number of loci with data shared across >40 taxa. Our analysis leads to a set of practical recommendations for maximizing phylogenetic information in RAD-seq studies. 
[hierarchical redundancy; phylogenetic informativeness; quartet informativeness; Restriction-site associated DNA (RAD) sequencing; sequencing coverage; Viburnum.]
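
The expectation described in the abstract, that fewer loci are recovered between more distantly related samples, can be illustrated by counting the loci shared by each pair of samples. The following is only a minimal sketch on invented toy data: the function name `shared_loci`, the sample labels, and the locus IDs are all hypothetical and are not taken from the paper or from any RAD-seq assembly software.

```python
from itertools import combinations


def shared_loci(presence):
    """Count loci recovered in both samples, for every pair of samples.

    `presence` maps a sample name to the set of locus IDs recovered for it
    (a hypothetical representation chosen for this sketch).
    """
    return {
        (a, b): len(presence[a] & presence[b])
        for a, b in combinations(sorted(presence), 2)
    }


# Toy data (invented): A and B stand for close relatives, C for a distant one.
presence = {
    "A": {1, 2, 3, 4, 5, 6},
    "B": {1, 2, 3, 4, 5},
    "C": {1, 2, 7},
}

pairs = shared_loci(presence)
for (a, b), n in pairs.items():
    print(f"{a}-{b}: {n} shared loci")
```

In this toy case the closely related pair shares more loci than either pair involving the distant sample. A pairwise count like this cannot by itself separate dropout caused by mutation-disruption from dropout caused by low sequencing coverage; the abstract reports that the two are distinguished by whether the missing data are phylogenetically distributed.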

