
LDscaff: LD-based scaffolding of de novo genome assemblies.

Affiliations

BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, 518083, China.

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong SAR, 999077, China.

Publication information

BMC Bioinformatics. 2020 Dec 28;21(Suppl 21):570. doi: 10.1186/s12859-020-03895-7.

Abstract

BACKGROUND

Genome assembly is fundamental to de novo genome analysis. Hybrid assembly, which combines several sequencing technologies, increases both contiguity and accuracy. However, such approaches require costly additional sequencing, while the information in millions of existing whole-genome sequencing datasets has not been fully exploited for scaffolding. Genetic recombination patterns in population data, which reflect non-random association among alleles at different loci, can provide physical distance signals to guide scaffolding.
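As a rough illustration of the "non-random association among alleles" mentioned above, the sketch below computes the standard r^2 measure of linkage disequilibrium between two biallelic loci from phased 0/1 haplotypes. The function name `ld_r2` and the input layout are assumptions made for this example; the abstract does not state which LD statistic or data format LDscaff actually uses.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic loci.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per
    phased haplotype (hypothetical input layout for this sketch).
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equal in length")

    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n  # frequency of allele 1 at locus B
    # frequency of haplotypes carrying allele 1 at both loci
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b  # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # A monomorphic locus leaves r^2 undefined; this sketch reports 0.
        return 0.0
    return d * d / denom


# Perfectly associated loci versus independent loci
print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))
print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because LD tends to decay as recombination accumulates between loci, higher r^2 between markers near the ends of two scaffolds would suggest that those ends lie close together, which is the kind of distance signal the abstract refers to.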

RESULTS

In this paper, we propose LDscaff, a method for draft genome assembly that incorporates linkage disequilibrium information from population data. We evaluated its performance on both simulated and real data. We simulated scaffolds by splitting the pig reference genome and then reassembled them, introducing gaps between scaffolds ranging from 0 to 100 KB. The genome misassembly rate is 2.43% when there is no gap. We then applied our method to refine the giant panda and donkey genomes, both of which were assembled purely from NGS data. After LDscaff treatment, the resulting panda assembly has a scaffold N50 of 3.6 MB, 2.5 times larger than the original N50 (1.3 MB). The N50 of the re-assembled donkey assembly improved from 23.8 MB to 32.1 MB.
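The N50 figures quoted above follow the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one scaffold length is required")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First scaffold at which the cumulative length reaches half the assembly
        if running >= half_total:
            return length


# Toy example: joining scaffolds raises N50 without changing total length
before = [4, 3, 3, 2, 2, 2]
after = [7, 5, 4]  # same total length (16) after merging pairs
print(n50(before), n50(after))
```

Joining adjacent scaffolds leaves the total assembly length unchanged but concentrates it in fewer, longer pieces, which is why successful scaffolding shows up as a larger N50.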

CONCLUSIONS

Our method effectively improves assemblies using existing re-sequencing data, and is a potential alternative to existing assemblers that require the collection of new data.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8df8/7768660/ad13c59af68d/12859_2020_3895_Fig1_HTML.jpg
