Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads.

Affiliations

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA.

Heinrich Heine University Düsseldorf, Medical Faculty, Institute for Medical Biometry and Bioinformatics, Düsseldorf, Germany.

Publication information

Nat Biotechnol. 2021 Mar;39(3):302-308. doi: 10.1038/s41587-020-0719-5. Epub 2020 Dec 7.

Abstract

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing with continuous long-read or high-fidelity sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
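The abstract reports three standard assembly metrics: a Phred-scaled quality value (QV), contig N50, and switch error rate. The Python sketch below shows commonly used definitions of each so the numbers are easier to interpret. It is illustrative only. The function names are hypothetical, and the paper's own evaluation tools and exact denominators may differ. For example, QV > 40 corresponds to fewer than 1 base error per 10,000 bp.

```python
import math
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Contig N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered length.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must be non-empty")


def phred_qv(error_rate: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(per-base error rate)."""
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must be in (0, 1)")
    return -10 * math.log10(error_rate)


def switch_error_rate(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous-site pairs whose relative phase
    disagrees with the truth (one common definition; an assumption here).

    Each input lists, per heterozygous site in genomic order, the allele
    (0 or 1) placed on haplotype 1. A site is 'in phase' if predicted equals
    truth and 'flipped' otherwise; a switch is a change of that state
    between consecutive sites.
    """
    if len(predicted) != len(truth) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences of >= 2 sites")
    states = [p == t for p, t in zip(predicted, truth)]
    switches = sum(a != b for a, b in zip(states, states[1:]))
    return switches / (len(states) - 1)


if __name__ == "__main__":
    print(n50([10, 8, 5, 4, 3]))  # 8
    print(phred_qv(1e-4))  # about 40.0
    # One switch between sites 2 and 3, over 5 adjacent pairs.
    print(switch_error_rate([0, 1, 1, 0, 1, 0], [0, 1, 0, 1, 0, 1]))  # 0.2
```

Under this definition, a 0.17% switch error rate means about 1.7 phase switches per 1,000 adjacent heterozygous-site pairs.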

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d51f/7954704/f9d744613909/41587_2020_719_Fig1_HTML.jpg
