Linked read technology for assembling large complex and polyploid genomes.

Affiliations

Department of Agronomy, Iowa State University, Ames, IA, 50011, USA.

Present address: Roche Sequencing Solutions, 500 S Rosa Road, Madison, WI, 53719, USA.

Publication information

BMC Genomics. 2018 Sep 4;19(1):651. doi: 10.1186/s12864-018-5040-z.

Abstract

BACKGROUND

Short read DNA sequencing technologies have revolutionized genome assembly by providing high-accuracy, high-throughput data at low cost. Yet assembling short read data remains challenging, particularly for large, complex, and polyploid genomes. The linked read strategy has the potential to enhance the value of short reads for genome assembly because all reads originating from a single long molecule of DNA share a common barcode. However, most studies to date that have employed linked reads have focused on human haplotype phasing and genome assembly.
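The barcode idea described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each read is available as a (barcode, read ID) pair; the function name, the barcode strings, and the read IDs are all hypothetical and are not taken from the study or from any particular linked read pipeline.

```python
from collections import defaultdict


def group_reads_by_barcode(reads):
    """Group read IDs by their barcode.

    `reads` is an iterable of (barcode, read_id) pairs. This input format is
    an assumption made for illustration only.
    """
    groups = defaultdict(list)
    for barcode, read_id in reads:
        groups[barcode].append(read_id)
    return dict(groups)


# Hypothetical toy data: three short reads carrying two distinct barcodes.
toy_reads = [("BC01", "r1"), ("BC02", "r2"), ("BC01", "r3")]
groups = group_reads_by_barcode(toy_reads)
print(groups)
```

Reads that end up in the same group are candidates for having come from the same long DNA molecule, which is the long-range information an assembler can use to order and orient short read contigs.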

RESULTS

Here we describe a de novo maize B73 genome assembly generated via linked read technology, which contains ~172,000 scaffolds with an N50 of 89 kb that cover 50% of the genome. Based on comparisons to the B73 reference genome, 91% of linked read contigs are accurately assembled. Because it was possible to identify errors with >76% accuracy using machine learning, it may be possible to identify and potentially correct systematic errors. Complex polyploids represent one of the last grand challenges in genome assembly. Linked read technology was able to successfully resolve the two subgenomes of the recent allopolyploid, proso millet (Panicum miliaceum). Our assembly covers ~83% of the 1 Gb genome and consists of 30,819 scaffolds with an N50 of 912 kb.
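For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of the standard definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are hypothetical; this is not the authors' code, and the numbers are not from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half of the
        # assembly length defines the N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly, lengths in arbitrary units (total = 300).
toy_lengths = [100, 80, 60, 40, 20]
print(n50(toy_lengths))  # 100 + 80 = 180 >= 150, so this prints 80
```

N50 is computed relative to the total length of the assembly itself, so it should be read alongside the fraction of the genome that the assembly covers.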

CONCLUSIONS

Our analysis provides a framework for future de novo genome assemblies using linked reads, and we suggest computational strategies that, if implemented, have the potential to further improve linked read assemblies, particularly for repetitive genomes.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a0ce/6122573/66d2a7efcc94/12864_2018_5040_Fig1_HTML.jpg
