Assembly of Complete Chloroplast Genomes from Non-model Species Based on a K-mer Frequency-Based Selection of Chloroplast Reads from Total DNA Sequences.

Author information

Izan Shairul, Esselink Danny, Visser Richard G F, Smulders Marinus J M, Borm Theo

Affiliations

Plant Breeding, Wageningen University and Research, Wageningen, Netherlands.

Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Malaysia.

Publication information

Front Plant Sci. 2017 Aug 2;8:1271. doi: 10.3389/fpls.2017.01271. eCollection 2017.

Abstract

Whole Genome Shotgun (WGS) sequences of plant species often contain an abundance of reads derived from the chloroplast genome. Up to now, these reads have generally been identified and assembled into chloroplast genomes based on homology to chloroplasts from related species. This re-sequencing approach may select against structural differences between genomes, especially in non-model species for which no close relatives have previously been sequenced. The alternative approach is to assemble the chloroplast genome from total genomic DNA sequences. In this study, we used k-mer frequency tables to identify and extract the chloroplast reads from the WGS reads and assembled them using a highly integrated and automated custom pipeline. Our strategy includes steps aimed at optimizing assemblies and filling gaps left by coverage variation in the WGS dataset. To demonstrate the universality of our approach, we successfully assembled three complete chloroplast genomes from plant species with nuclear genome sizes of 0.9 Gb, 4 Gb and 25 Gb. We also highlight the need to optimize the choice of k and the amount of data used. This new, cost-effective method for short-read assembly will facilitate the study of complete chloroplast genomes with more accurate analyses and inferences, especially in non-model plant genomes.
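The abstract does not describe how reads are picked from the k-mer frequency tables. The general idea can be sketched under one assumption: chloroplast DNA is present at a far higher copy number than nuclear sequence, so k-mers from chloroplast reads occur much more often in the WGS data.

This is a toy illustration, not the authors' pipeline. The function names `count_kmers` and `select_high_copy_reads` are hypothetical, as are the parameters `min_count` (a frequency cutoff) and `min_fraction` (the share of a read's k-mers that must pass the cutoff). Real datasets would need a dedicated k-mer counter and a cutoff chosen from the k-mer frequency histogram.

```python
from collections import Counter
from typing import Iterable, List

# Hypothetical helpers for illustration; not the authors' pipeline.
_ACGT = frozenset("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement,
    so that both strands of the same sequence are counted together."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def kmers(read: str, k: int) -> Iterable[str]:
    """Yield the canonical k-mers of a read, skipping windows with non-ACGT bases."""
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        if set(window) <= _ACGT:
            yield canonical(window)


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Build a k-mer frequency table over all reads."""
    table: Counter = Counter()
    for read in reads:
        table.update(kmers(read, k))
    return table


def select_high_copy_reads(
    reads: List[str], k: int, min_count: int, min_fraction: float = 0.8
) -> List[str]:
    """Keep reads whose k-mers are mostly high-frequency.

    min_count and min_fraction are hypothetical tuning parameters: a k-mer is
    'high-frequency' if it occurs at least min_count times, and a read is kept
    if at least min_fraction of its k-mers are high-frequency.
    """
    table = count_kmers(reads, k)
    selected = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k, or every window contains a non-ACGT base
        high = sum(1 for km in read_kmers if table[km] >= min_count)
        if high / len(read_kmers) >= min_fraction:
            selected.append(read)
    return selected


if __name__ == "__main__":
    # Toy data: one fragment repeated 30 times stands in for high-copy
    # chloroplast reads; two unique reads stand in for nuclear reads.
    cp_read = "ATGGCGTACGTTAGCCTAGG"
    reads = [cp_read] * 30 + ["TTGACCATGCAAGTCCGATA", "GCATTCGGATACCTTGAACG"]
    chosen = select_high_copy_reads(reads, k=11, min_count=20)
    print(len(chosen))  # prints 30
```

The sketch also shows why the choice of k and the amount of data matter.

- **Small k.** With a small k, k-mers from repetitive nuclear sequence can reach chloroplast-like frequencies and pull nuclear reads into the selection.
- **Large k.** A large k yields fewer k-mers per read. Each k-mer is also more likely to contain a sequencing error, which lowers its count.
- **Data volume.** The amount of input data scales the absolute k-mer counts, so a fixed `min_count` would have to be rescaled with it.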

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c9b9/5539191/3d8affc3e2fa/fpls-08-01271-g001.jpg