ARAMIS: From systematic errors of NGS long reads to accurate assemblies.

Affiliation

Centro de Biología Molecular Severo Ochoa (CBMSO) (CSIC-UAM), Madrid, Spain.

Publication

Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab170.

PMID: 34013348
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8574707/
Abstract

NGS long-read sequencing technologies (or third generation), such as Pacific Biosciences (PacBio), have revolutionized the sequencing field over the last decade, improving multiple genomic applications such as de novo genome assembly. However, their error rate, mostly involving insertions and deletions (indels), is currently an important concern that requires special attention. Multiple algorithms can fix these sequencing errors using short reads (such as Illumina), although they require long processing times and some errors may persist. Here, we present Accurate long-Reads Assembly correction Method for Indel errorS (ARAMIS), the first NGS long-read indel correction pipeline that combines several correction tools in a single step using accurate short reads. As a proof of concept, six organisms were selected based on their different GC content, size and genome complexity, and their PacBio-assembled genomes were thoroughly corrected by this pipeline. We found systematic sequencing errors in PacBio long reads affecting homopolymeric regions, and found that the type of indel error introduced during PacBio sequencing is related to the GC content of the organism. Because this is not widely recognized, such errors are present in numerous published studies; they should be resolved, since they may convey incorrect biological information. ARAMIS yields better results with fewer computational resources than other correction tools, and it makes it possible to detect the nature of the indel errors found and their distribution along the genome. The source code of ARAMIS is available at https://github.com/genomics-ngsCBMSO/ARAMIS.git.
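The abstract reports two measurable properties of the errors ARAMIS targets: indels concentrated in homopolymeric runs, and an indel type that is related to genome GC content. The Python sketch below shows one way to tabulate those properties once short-read-supported corrections against an assembly are available. It is not ARAMIS's code and does not reproduce its pipeline; the function names, the VCF-style `(pos, ref, alt)` input in which `ref` is the assembly allele and `alt` the short-read consensus, the left-normalization assumption and the `min_run=3` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def homopolymer_run(seq: str, pos: int) -> Tuple[str, int]:
    """Return (base, length) of the single-base run covering 0-based index pos."""
    base = seq[pos]
    start = end = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    return base, end - start + 1


def classify_indel(assembly: str, pos: int, ref: str, alt: str, min_run: int = 3) -> Dict:
    """Classify one left-normalized, VCF-style indel called against the assembly.

    Hypothetical convention: pos is the 0-based anchor index in `assembly`,
    ref is the assembly allele and alt is the short-read consensus allele.
    """
    if assembly[pos:pos + len(ref)] != ref:
        raise ValueError("ref allele does not match the assembly")
    # A simple indel: one allele is just the shared anchor base.
    if min(len(ref), len(alt)) != 1 or len(ref) == len(alt) or ref[0] != alt[0]:
        raise ValueError("expected a simple indel sharing one anchor base")
    if len(alt) < len(ref):
        kind, changed = "insertion_error", ref[1:]  # assembly carries extra bases
    else:
        kind, changed = "deletion_error", alt[1:]   # assembly is missing bases
    run_len = 0
    nxt = pos + 1
    # A homopolymer indel adds/removes copies of the base forming the run right
    # after the anchor (left normalization places the anchor before the run).
    if len(set(changed)) == 1 and nxt < len(assembly) and assembly[nxt] == changed[0]:
        _, run_len = homopolymer_run(assembly, nxt)
    return {
        "kind": kind,
        "bases": changed,
        "homopolymer_run": run_len,
        "in_homopolymer": run_len >= min_run,
    }


def summarize(assembly: str, calls: List[Tuple[int, str, str]], min_run: int = 3):
    """Return (GC fraction of the assembly, Counter keyed by (kind, in_homopolymer))."""
    counts: Counter = Counter()
    for pos, ref, alt in calls:
        r = classify_indel(assembly, pos, ref, alt, min_run)
        counts[(r["kind"], r["in_homopolymer"])] += 1
    return gc_content(assembly), counts


if __name__ == "__main__":
    # Toy assembly and hypothetical short-read-supported corrections.
    assembly = "ACGAAAATTGCCCCCGTA"
    calls = [(2, "GA", "G"), (9, "G", "GC"), (15, "G", "GT")]
    gc, counts = summarize(assembly, calls)
    print(f"GC content: {gc:.2f}")
    for (kind, in_hp), n in sorted(counts.items()):
        print(kind, "homopolymer" if in_hp else "other", n)
```

Because `ref` is taken from the assembly in this sketch, a shorter `alt` marks an extra base in the long-read assembly (an insertion error) and a longer `alt` marks a missing one. Running the same tally over assemblies with different GC content is one simple way to look at the relationship the abstract describes.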


Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1866/8574707/d951742c36a6/bbab170f1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1866/8574707/d812d9fb3d69/bbab170f2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1866/8574707/7cff60c1a477/bbab170f3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1866/8574707/bcb6f21a35f4/bbab170f4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1866/8574707/d5ef48554fe4/bbab170f5.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1866/8574707/4a20a65c91bf/bbab170f6.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1866/8574707/ee0d79c4123f/bbab170f7.jpg
Figure 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1866/8574707/ba88a40fdca2/bbab170f8.jpg
