Effect of sequence depth and length in long-read assembly of the maize inbred NC358.

Affiliations

Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, Iowa, 50011, USA.

Department of Genetics, University of Georgia, Athens, Georgia, 30602, USA.

Publication information

Nat Commun. 2020 May 8;11(1):2288. doi: 10.1038/s41467-020-16037-7.

Abstract

Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20× to 75× genomic depth and with N50 subread lengths of 11-21 kb. Assemblies with ≤30× depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20× depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.
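
The abstract characterizes each dataset by two quantities: genomic depth and N50 subread length. As a minimal sketch of how these are conventionally computed (not the authors' pipeline), the Python below uses the standard definitions: depth is total sequenced bases divided by genome size, and N50 is the read length at which reads of that length or longer contain at least half of all sequenced bases. The function names `n50` and `depth`, the toy read lengths, and the toy genome size are illustrative assumptions and do not come from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of read lengths.

    N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def depth(lengths, genome_size):
    """Return mean genomic depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(lengths) / genome_size


if __name__ == "__main__":
    # Hypothetical toy values, chosen only to make the arithmetic easy to follow.
    toy_reads = [2, 3, 4, 5, 6]   # read lengths in bases
    toy_genome_size = 10          # genome size in bases
    print("N50:", n50(toy_reads))
    print("depth:", depth(toy_reads, toy_genome_size))
```

With real data, the read lengths would come from the subread files and the genome size from an independent estimate. Both inputs affect the reported depth, so depth values are comparable only when computed against the same genome size.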

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d383/7211024/c2549d01523c/41467_2020_16037_Fig1_HTML.jpg
