
Phylogenetic analysis of RAD-seq data: examining the influence of gene genealogy conflict on analysis of concatenated data.

Authors

Rivers David M, Darwell Clive T, Althoff David M

Affiliation

Department of Biology, Syracuse University, 107 College Place, Syracuse, NY, 13244, USA.

Publication information

Cladistics. 2016 Dec;32(6):672-681. doi: 10.1111/cla.12149. Epub 2016 Jan 8.

Abstract

One of the major issues in phylogenetic analysis is that gene genealogies from different gene regions may not reflect the true species tree or history of speciation. This has led to considerable debate about whether concatenation of loci is the best approach for phylogenetic analysis. Next-generation sequencing techniques such as RAD-seq generate thousands of relatively short sequence reads from across the genomes of the sampled taxa. These data sets are typically concatenated for phylogenetic analysis, leading to data sets that contain millions of base pairs per taxon. The influence of gene region conflict among so many loci on determining the phylogenetic relationships among taxa is unclear. We simulated RAD-seq data by sampling 100 and 500 base pairs from alignments of over 6000 coding regions that each produce one of three highly supported alternative phylogenies of seven species of Drosophila. We conducted phylogenetic analyses on different sets of these regions to vary the sampling of loci with alternative gene trees and to examine the effect on detecting the species tree. Irrespective of sequence length sampled per region and which subset of regions was used, phylogenetic analyses of the concatenated data always recovered the species tree. The results suggest that concatenated alignments of next-generation data that consist of many short sequences are robust to gene tree/species tree conflict when the goal is to determine the phylogenetic relationships among taxa.
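The sampling-and-concatenation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say how the 100 or 500 base pairs were chosen within each region, so the sketch assumes a single contiguous window at a random start position. The function names (`sample_window`, `concatenate`) and the toy alignments are hypothetical, introduced only for illustration.

```python
import random


def sample_window(alignment, length, rng):
    """Return a contiguous window of `length` columns from one locus.

    `alignment` maps taxon name -> aligned sequence (all equal length).
    Assumption: a random contiguous window stands in for a short read.
    """
    n_cols = len(next(iter(alignment.values())))
    if n_cols < length:
        raise ValueError("alignment is shorter than the requested window")
    start = rng.randrange(n_cols - length + 1)
    return {taxon: seq[start:start + length] for taxon, seq in alignment.items()}


def concatenate(loci, length, seed=0):
    """Sample one window per locus and join them into a supermatrix."""
    rng = random.Random(seed)
    taxa = sorted(loci[0])
    parts = {taxon: [] for taxon in taxa}
    for alignment in loci:
        window = sample_window(alignment, length, rng)
        for taxon in taxa:
            parts[taxon].append(window[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data: two loci, three taxa (made-up sequences, not from the study).
loci = [
    {"sp1": "ACGTACGTAC", "sp2": "ACGTTCGTAC", "sp3": "ACGAACGTAC"},
    {"sp1": "GGGTTTCCCA", "sp2": "GGGTTTCCGA", "sp3": "GGCTTTCCCA"},
]
supermatrix = concatenate(loci, length=4, seed=1)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

Each taxon ends up with one sequence whose length is the number of loci times the window length. Varying which loci go into `loci` corresponds to the study's idea of changing the mix of regions that support alternative gene trees before the concatenated alignment is passed to a tree-building program.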
