Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads.

Affiliations

UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA.

Google Inc, Mountain View, CA, USA.

Publication information

Nat Methods. 2021 Nov;18(11):1322-1332. doi: 10.1038/s41592-021-01299-w. Epub 2021 Nov 1.

Abstract

Long-read sequencing has the potential to transform variant detection by reaching currently difficult-to-map regions and routinely linking together adjacent variations to enable read-based phasing. Third-generation nanopore sequence data have demonstrated a long read length, but current interpretation methods for their novel pore-based signal have unique error profiles, making accurate analysis challenging. Here, we introduce a haplotype-aware variant calling pipeline, PEPPER-Margin-DeepVariant, that produces state-of-the-art variant calling results with nanopore data. We show that our nanopore-based method outperforms the short-read-based single-nucleotide-variant identification method at the whole-genome scale and produces high-quality single-nucleotide variants in segmental duplications and low-mappability regions where short-read-based genotyping fails. We show that our pipeline can provide highly contiguous phase blocks across the genome with nanopore reads, contiguously spanning between 85% and 92% of annotated genes across six samples. We also extend PEPPER-Margin-DeepVariant to PacBio HiFi data, providing an efficient solution with superior performance over the current WhatsHap-DeepVariant standard. Finally, we demonstrate de novo assembly polishing methods that use nanopore and PacBio HiFi reads to produce diploid assemblies with high accuracy (Q35+ nanopore-polished and Q40+ PacBio HiFi-polished).
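The assembly accuracies quoted above (Q35+ and Q40+) can be read as per-base error rates. The following is a minimal sketch, assuming these are Phred-scaled quality values, Q = -10 * log10(p), which is the usual convention for consensus quality; the function names are illustrative and are not part of PEPPER-Margin-DeepVariant.

```python
import math


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality value Q into a per-base error probability."""
    return 10 ** (-q / 10)


def error_rate_to_phred(p: float) -> float:
    """Convert a per-base error probability into a Phred-scaled quality value."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Thresholds taken from the abstract; the Phred interpretation is an assumption.
    for q in (35, 40):
        p = phred_to_error_rate(q)
        print(f"Q{q}: about 1 error per {round(1 / p):,} bases")
```

Under this assumption, Q35 corresponds to roughly one error per 3,000 bases and Q40 to one error per 10,000 bases.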

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6f8d/8571015/36abaaa5f7cb/nihms-1738709-f0001.jpg
