Virtual Genome Walking across the 32 Gb Ambystoma mexicanum genome; assembling gene models and intronic sequence.

Affiliation

School of Life Sciences, University of Nottingham, Nottingham, NG7 2UH, UK.

Publication information

Sci Rep. 2018 Jan 12;8(1):618. doi: 10.1038/s41598-017-19128-6.

Abstract

Large, repeat-rich genomes present challenges for assembly using short read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA, making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking as it locally assembles whole genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models including upstream and downstream genomic, and intronic, sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR, and aid in future whole genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n . The software pipeline is available from https://github.com/LooseLab/iterassemble .
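The idea of iteratively extending an exon into the surrounding genomic sequence can be illustrated with a toy example. The sketch below is not the iterassemble pipeline: the function name `extend_seed`, the parameter `k`, and the toy sequences are all hypothetical. It assumes error-free reads on a single strand and a region with no repeated k-mers, and it simply grows a seed in both directions using any read that contains the seed's terminal k bases.

```python
def extend_seed(seed, reads, k=6, max_rounds=100):
    """Greedily extend `seed` in both directions using overlapping reads.

    Toy illustration only (hypothetical helper, not the published pipeline):
    assumes error-free, same-strand reads and unique k-mers in the region.
    """
    contig = seed
    for _ in range(max_rounds):
        grew = False

        # Walk right: find reads containing the last k bases of the contig
        # and keep the longest sequence that follows that overlap.
        tail = contig[-k:]
        best = ""
        for read in reads:
            pos = read.find(tail)
            if pos != -1 and len(read) - (pos + k) > len(best):
                best = read[pos + k:]
        if best:
            contig += best
            grew = True

        # Walk left: find reads containing the first k bases of the contig
        # and keep the longest sequence that precedes that overlap.
        head = contig[:k]
        best = ""
        for read in reads:
            pos = read.rfind(head)
            if pos > len(best):
                best = read[:pos]
        if best:
            contig = best + contig
            grew = True

        if not grew:  # no read extends either end, so stop walking
            break
    return contig


# Toy data: a 40-base "genomic region", 12-base reads tiled every 4 bases,
# and an 8-base "exon" taken from the middle as the seed.
genome = "ATGGCCTTAGACGTCAATCGGTACCTGAAGCTTTGCGATA"
reads = [genome[i:i + 12] for i in range(0, len(genome) - 11, 4)]
seed = genome[16:24]

contig = extend_seed(seed, reads, k=6)
print(len(seed), len(contig))
```

A real implementation would also need to handle reverse-complement reads, sequencing errors, and repeats, where several reads can disagree about the next bases; none of that is attempted here.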

