
HMMPolish: a coding region polishing tool for TGS-sequenced RNA viruses.

Affiliation

Electrical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, China.

Publication information

Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad264.

DOI: 10.1093/bib/bbad264
PMID: 37478372
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10516367/

Abstract

Access to accurate viral genomes is important for downstream data analysis. Third-generation sequencing (TGS) has recently become a popular platform for virus sequencing because of its long read length. However, its per-base error rate, which is higher than that of next-generation sequencing, can lead to genomes with errors. Polishing tools are thus needed to correct errors either before or after sequence assembly. Despite the promising results of available polishing tools, there is still room to improve error correction performance and thereby produce more accurate genome assemblies. Errors, particularly those in coding regions, can hamper analyses such as lineage identification and variant monitoring. In this work, we developed a novel pipeline, HMMPolish, for correcting (polishing) errors in protein-coding regions of known RNA viruses. This tool can be applied to either raw TGS reads or the assembled sequences of the target virus. By utilizing profile Hidden Markov Models of protein families/domains in known viruses, HMMPolish can correct errors that are ignored by available polishers. We extensively validated HMMPolish on 34 datasets that covered four clinically important viruses: HIV-1, influenza-A, norovirus, and severe acute respiratory syndrome coronavirus 2. These datasets contain reads with different properties, such as sequencing depth and platform (PacBio or Nanopore). The benchmark results against popular/representative polishers show that HMMPolish competes favorably on error correction in coding regions of known RNA viruses.
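To illustrate why uncorrected errors in coding regions are a problem for downstream analysis, the following is a minimal sketch and not part of HMMPolish itself. It assumes the errors of interest include single-base insertions or deletions, which the abstract does not state explicitly, and it uses a short made-up sequence (`REFERENCE`) and a hypothetical helper (`translate`) built on the standard genetic code. It compares how a substitution and a deletion change the translated protein.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 0.

    An incomplete trailing codon is dropped; codons containing
    non-ACGT characters are reported as 'X'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


# Toy coding sequence (hypothetical, for illustration only).
REFERENCE = "ATGGCTGAAACCTGGTAA"

# A substitution at position 4 (C -> A) changes a single codon.
SUBSTITUTED = REFERENCE[:4] + "A" + REFERENCE[5:]

# A deletion at position 4 shifts the reading frame for every later codon.
DELETED = REFERENCE[:4] + REFERENCE[5:]

print(translate(REFERENCE))    # MAETW*
print(translate(SUBSTITUTED))  # MDETW*
print(translate(DELETED))      # MVKPG
```

In this toy case the substitution alters one residue, while the single-base deletion changes every residue after the error and removes the stop codon. Under the stated assumption, this kind of frame disruption is one reason a polisher that is aware of the encoded protein could catch errors that a purely nucleotide-level polisher leaves in place.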


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3859/10516367/c7da16cd9bb9/bbad264f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3859/10516367/f641d1883754/bbad264f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3859/10516367/c43c49b2e7ba/bbad264f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3859/10516367/8542b54c8eaa/bbad264f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3859/10516367/c5195a92d8e6/bbad264f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3859/10516367/b37149652c5e/bbad264f5.jpg
