High-quality genetic mapping with ddRADseq in the non-model tree Quercus rubra.

Authors

Konar Arpita, Choudhury Olivia, Bullis Rebecca, Fiedler Lauren, Kruser Jacqueline M, Stephens Melissa T, Gailing Oliver, Schlarbaum Scott, Coggeshall Mark V, Staton Margaret E, Carlson John E, Emrich Scott, Romero-Severson Jeanne

Affiliations

Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA.

Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA.

Publication information

BMC Genomics. 2017 May 30;18(1):417. doi: 10.1186/s12864-017-3765-8.

Abstract

BACKGROUND

Restriction site-associated DNA sequencing (RADseq) has the potential to be a broadly applicable, low-cost approach for high-quality genetic linkage mapping in forest trees lacking a reference genome. The statistical inference of linear order must be as accurate as possible for the correct ordering of sequence scaffolds and contigs to chromosomal locations. Accurate maps also facilitate the discovery of chromosome segments containing allelic variants conferring resistance to the biotic and abiotic stresses that threaten forest trees worldwide. We used double-digest RADseq (ddRADseq) for genetic mapping in the tree Quercus rubra, with an approach optimized to produce a high-quality map. Our study design also enabled us to model the results we would have obtained with less depth of coverage.

RESULTS

Our sequencing design produced a high sequencing depth in the parents (248×) and a moderate sequencing depth (15×) in the progeny. The digital normalization method of generating a de novo reference and the SAMtools SNP variant caller yielded the most SNP calls (78,725). The major driver of map inflation was the presence of multiple SNPs located within the same sequence (77% of SNPs called). The highest quality map was generated with a low level of missing data (5%) and a genome-wide threshold of 0.025 for deviation from Mendelian expectation. The final map included 849 SNP markers (1.1% of the 78,725 SNPs called). Downsampling the individual FASTQ files to model lower depth of coverage revealed that sequencing the progeny using 96 samples per lane would have yielded too few SNP markers to generate a map, even if we had sequenced the parents at depth 248×.
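The premapping filters named above (a missing-data limit, a threshold for deviation from Mendelian expectation, and the handling of multiple SNPs in one sequence) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The `Snp` record, the function names, the 1:1 testcross expectation, the reading of 0.025 as a p-value cutoff, and the rule of keeping one SNP per sequence are all assumptions made for illustration.

```python
import math
from collections import namedtuple

# Hypothetical record for one SNP: the reference sequence (contig) it lies on,
# its position, and one genotype call per progeny ("A", "B", or None if missing).
Snp = namedtuple("Snp", ["contig", "pos", "calls"])


def missing_fraction(snp):
    """Fraction of progeny with no genotype call at this SNP."""
    return sum(c is None for c in snp.calls) / len(snp.calls)


def segregation_p_value(snp):
    """Chi-square test (1 d.f.) of the observed classes against an assumed 1:1 ratio."""
    a = sum(c == "A" for c in snp.calls)
    b = sum(c == "B" for c in snp.calls)
    n = a + b
    if n == 0:
        return 0.0  # no usable calls: treat as failing the test
    expected = n / 2
    chi2 = (a - expected) ** 2 / expected + (b - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def premap_filter(snps, max_missing=0.05, min_p=0.025):
    """Keep SNPs passing both filters, then at most one SNP per contig.

    Assumption: among SNPs on the same contig, the one with the least
    missing data is retained (first seen wins on ties).
    """
    best_per_contig = {}
    for snp in snps:
        if missing_fraction(snp) > max_missing:
            continue
        if segregation_p_value(snp) < min_p:
            continue
        kept = best_per_contig.get(snp.contig)
        if kept is None or missing_fraction(snp) < missing_fraction(kept):
            best_per_contig[snp.contig] = snp
    return list(best_per_contig.values())


# Toy data for 100 progeny (invented values, not from the study).
example = [
    Snp("c1", 10, ["A"] * 50 + ["B"] * 50),
    Snp("c1", 20, ["A"] * 49 + ["B"] * 49 + [None] * 2),
    Snp("c2", 5, ["A"] * 80 + ["B"] * 20),
    Snp("c3", 7, ["A"] * 45 + ["B"] * 45 + [None] * 10),
]
kept = premap_filter(example)
```

In this toy input, the second SNP on `c1` is dropped as a same-sequence duplicate, the SNP on `c2` for segregation distortion, and the SNP on `c3` for excess missing data.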

CONCLUSIONS

The ddRADseq technology produced enough high-quality SNP markers to make a moderately dense, high-quality map. The success of this project was due to high depth of coverage of the parents, moderate depth of coverage of the progeny, a good framework map, an optimized bioinformatics pipeline, and rigorous premapping filters. The ddRADseq approach is useful for the construction of high-quality genetic maps in organisms lacking a reference genome if the parents and progeny are sequenced at sufficient depth. Technical improvements in reduced representation sequencing (RRS) approaches are needed to reduce the amount of missing data.
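The effect of sequencing depth was modeled in this study by downsampling the individual FASTQ files. A minimal Python sketch of one way to downsample a FASTQ file is shown below. The abstract does not state which tool or method the authors used, so the function name, the per-record random sampling, the `fraction` and `seed` parameters, and the assumption of strict 4-line FASTQ records are all illustrative.

```python
import random


def downsample_fastq(lines, fraction, seed=0):
    """Yield the lines of a random subset of FASTQ records.

    Assumes each record spans exactly 4 lines (header, sequence, '+',
    qualities). Each record is kept independently with probability
    `fraction`, so the retained depth is approximately `fraction` times
    the original depth. A fixed seed makes the subset reproducible.
    """
    rng = random.Random(seed)
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            if rng.random() < fraction:
                yield from record
            record = []


# Toy input: 1000 synthetic records (invented reads, not real data).
toy = []
for i in range(1000):
    toy += [f"@read{i}\n", "ACGT\n", "+\n", "IIII\n"]

subset = list(downsample_fastq(toy, fraction=0.25, seed=1))
```

With real files, the same generator could be fed an open file handle and its output written to a new file. Paired-end data would need the same records kept in both files, which this sketch does not handle.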
