Highly accurate assembly polishing with DeepPolisher.

Authors

Mastoras Mira, Asri Mobin, Brambrink Lucas, Hebbar Prajna, Kolesnikov Alexey, Cook Daniel E, Nattestad Maria, Lucas Julian, Won Taylor S, Chang Pi-Chuan, Carroll Andrew, Paten Benedict, Shafin Kishwar

Affiliations

UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95060, USA.

Google Incorporated, Mountain View, California 94043, USA.

Publication information

Genome Res. 2025 Jul 1;35(7):1595-1608. doi: 10.1101/gr.280149.124.

Abstract

Accurate genome assemblies are essential for biological research, but even the highest-quality assemblies retain errors caused by the technologies used to construct them. Base-level errors are typically fixed with an additional polishing step that uses reads aligned to the draft assembly to identify necessary edits. However, current methods struggle to find a balance between over- and underpolishing. Here, we present an encoder-only transformer model for assembly polishing called DeepPolisher, which predicts corrections to the underlying sequence using Pacific Biosciences (PacBio) HiFi read alignments to a diploid assembly. Our pipeline introduces a method, PHAsing Reads in Areas Of Homozygosity (PHARAOH), which uses ultralong Oxford Nanopore Technologies (ONT) data to ensure alignments are accurately phased and to correctly introduce heterozygous edits in falsely homozygous regions. We demonstrate that the DeepPolisher pipeline can reduce assembly errors by approximately half, mostly driven by reductions in indel errors. We have applied our DeepPolisher-based pipeline to 180 assemblies from the next Human Pangenome Reference Consortium (HPRC) data release, producing an average predicted quality value (QV) improvement of 3.4 (54% error reduction) for the majority of the genome.
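The two figures in the final sentence (a QV gain of 3.4 and a 54% error reduction) can be related with a short calculation. The sketch below assumes that QV is Phred-scaled, i.e. QV = -10 * log10(error rate), which is the usual convention for assembly quality values but is not spelled out in the abstract. The function name is hypothetical and chosen only for illustration.

```python
def error_reduction_from_qv_gain(delta_qv: float) -> float:
    """Return the fraction of errors removed for a given QV gain.

    Assumes Phred scaling: QV = -10 * log10(error_rate), so a gain of
    delta_qv multiplies the error rate by 10 ** (-delta_qv / 10).
    """
    remaining_fraction = 10.0 ** (-delta_qv / 10.0)
    return 1.0 - remaining_fraction


# QV gain reported in the abstract for the HPRC assemblies.
reduction = error_reduction_from_qv_gain(3.4)
print(f"{reduction:.1%}")  # prints 54.3%
```

Under this assumption, a 3.4-point QV gain corresponds to roughly 54% fewer errors, which is consistent with both the reported "54% error reduction" and the statement that the pipeline reduces assembly errors by approximately half.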

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a02/12212083/ae63bce1da34/1595f01.jpg
