Highly accurate assembly polishing with DeepPolisher.

Authors

Mastoras Mira, Asri Mobin, Brambrink Lucas, Hebbar Prajna, Kolesnikov Alexey, Cook Daniel E, Nattestad Maria, Lucas Julian, Won Taylor S, Chang Pi-Chuan, Carroll Andrew, Paten Benedict, Shafin Kishwar

Affiliations

UC Santa Cruz Genomics Institute, University of California, Santa Cruz, California 95060, USA.

Google Incorporated, Mountain View, California 94043, USA.

Publication information

Genome Res. 2025 Jul 1;35(7):1595-1608. doi: 10.1101/gr.280149.124.

DOI:10.1101/gr.280149.124
PMID:40389286
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12212083/
Abstract

Accurate genome assemblies are essential for biological research, but even the highest-quality assemblies retain errors caused by the technologies used to construct them. Base-level errors are typically fixed with an additional polishing step that uses reads aligned to the draft assembly to identify necessary edits. However, current methods struggle to find a balance between over- and underpolishing. Here, we present an encoder-only transformer model for assembly polishing called DeepPolisher, which predicts corrections to the underlying sequence using Pacific Biosciences (PacBio) HiFi read alignments to a diploid assembly. Our pipeline introduces a method, PHAsing Reads in Areas Of Homozygosity (PHARAOH), which uses ultralong Oxford Nanopore Technologies (ONT) data to ensure alignments are accurately phased and to correctly introduce heterozygous edits in falsely homozygous regions. We demonstrate that the DeepPolisher pipeline can reduce assembly errors by approximately half, mostly driven by reductions in indel errors. We have applied our DeepPolisher-based pipeline to 180 assemblies from the next Human Pangenome Reference Consortium (HPRC) data release, producing an average predicted quality value (QV) improvement of 3.4 (54% error reduction) for the majority of the genome.
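
The abstract pairs a QV gain of 3.4 with a 54% error reduction. The two figures can be related with a short calculation, assuming QV follows the standard Phred convention, QV = -10 * log10(error rate). The abstract does not state this formula, so the convention and the helper names below are assumptions made for illustration, not the authors' code.

```python
def error_rate_from_qv(qv: float) -> float:
    """Per-base error rate implied by a Phred-scaled quality value (assumed convention)."""
    return 10 ** (-qv / 10)


def error_reduction_from_qv_gain(delta_qv: float) -> float:
    """Fraction of errors removed when QV rises by delta_qv.

    The ratio of new to old error rate is 10 ** (-delta_qv / 10),
    independent of the starting QV, so the reduction is one minus that ratio.
    """
    return 1 - 10 ** (-delta_qv / 10)


if __name__ == "__main__":
    reduction = error_reduction_from_qv_gain(3.4)
    print(f"QV gain of 3.4 -> {reduction:.1%} fewer errors")
```

Under this convention a 3.4-point QV gain corresponds to roughly 54% fewer errors regardless of the starting quality, which matches the reduction quoted in the abstract.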

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a02/12212083/ae63bce1da34/1595f01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a02/12212083/029855e22a7c/1595f02.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a02/12212083/544f6319958d/1595f03.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a02/12212083/812c5707855d/1595f04.jpg
