AlignGraph2: similar genome-assisted reassembly pipeline for PacBio long reads.

Affiliations

Group of Interdisciplinary Information Sciences, School of Software Engineering, Beijing Jiaotong University, China.

College of Information and Computer Engineering, Northeast Forestry University, China.

Publication information

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab022.

DOI:10.1093/bib/bbab022

PMID:33621981

Abstract

Contigs assembled from third-generation long reads are usually more complete than those assembled from second-generation short reads. However, current algorithms still have difficulty assembling long reads into the ideal complete and accurate genome, or the theoretical best result [1]. With more and more fully sequenced genomes available, long-read contigs can still be improved with the similar genome-assisted reassembly method [2], which was initially proposed for short reads and makes use of a genome closely related (the similar genome) to the genome being sequenced (the target genome). The method aligns the contigs and reads to the similar genome, and then extends and refines the aligned contigs with the aligned reads. Here, we introduce AlignGraph2, a similar genome-assisted reassembly pipeline for PacBio long reads. AlignGraph2 is the second version of our AlignGraph algorithm but is completely redesigned. It accepts either error-prone or HiFi long reads and contains four novel algorithms: a similarity-aware alignment algorithm and an alignment filtration algorithm for aligning the long reads and preassembled contigs to the similar genome, and a reassembly algorithm and a weight-adjusted consensus algorithm for extending and refining the preassembled contigs. In our performance tests on both error-prone and HiFi long reads, AlignGraph2 aligns 5.7-27.2% more long reads and 7.3-56.0% more bases than some current alignment algorithms and is more efficient than or comparable to the others. For contigs assembled with various de novo algorithms and aligned to similar genomes (aligned contigs), AlignGraph2 extends 8.7-94.7% of them (extendable contigs) and yields contigs with a 7.0-249.6% larger N50 value and 5.2-87.7% fewer indels per 100 kbp (extended contigs). Its performance also remains relatively stable as the similarity between the similar genome and the target genome decreases.
The AlignGraph2 software can be downloaded for free from this site: https://github.com/huangs001/AlignGraph2.
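
The abstract reports its gains as relative changes in N50 and in indels per 100 kbp. As a reading aid, the sketch below shows how these two metrics are commonly computed. It is not taken from the AlignGraph2 code base: the function names and the toy contig lengths are hypothetical, chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def indels_per_100kbp(num_indels, aligned_bases):
    """Normalize an indel count to a rate per 100,000 aligned bases."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return num_indels * 100_000 / aligned_bases


def relative_change_percent(before, after):
    """Percent change from `before` to `after` (positive means larger)."""
    return (after - before) / before * 100


# Hypothetical toy assemblies (lengths in bp), NOT data from the paper.
preassembled = [50_000, 40_000, 30_000, 20_000, 10_000]
extended = [90_000, 45_000, 20_000]

n50_before = n50(preassembled)
n50_after = n50(extended)
n50_gain = relative_change_percent(n50_before, n50_after)
```

With these toy inputs, extending and merging contigs raises N50 from 40,000 bp to 90,000 bp, a 125% increase. The ranges in the abstract (for example, 7.0-249.6% larger N50) are relative changes of this kind.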

Similar articles

AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references.

Bioinformatics. 2014 Jun 15;30(12):i319-i328. doi: 10.1093/bioinformatics/btu291.

HALC: High throughput algorithm for long read error correction.

BMC Bioinformatics. 2017 Apr 5;18(1):204. doi: 10.1186/s12859-017-1610-3.

ReMILO: reference assisted misassembly detection algorithm using short and long reads.

Bioinformatics. 2018 Jan 1;34(1):24-32. doi: 10.1093/bioinformatics/btx524.

SLR: a scaffolding algorithm based on long reads and contig classification.

BMC Bioinformatics. 2019 Oct 30;20(1):539. doi: 10.1186/s12859-019-3114-9.

Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads.

PLoS Comput Biol. 2017 Jun 8;13(6):e1005595. doi: 10.1371/journal.pcbi.1005595. eCollection 2017 Jun.

Evaluation and Validation of Assembling Corrected PacBio Long Reads for Microbial Genome Completion via Hybrid Approaches.

PLoS One. 2015 Dec 7;10(12):e0144305. doi: 10.1371/journal.pone.0144305. eCollection 2015.

FLAS: fast and high-throughput algorithm for PacBio long-read self-correction.

Bioinformatics. 2019 Oct 15;35(20):3953-3960. doi: 10.1093/bioinformatics/btz206.

Index suffix-prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression.

Bioinformatics. 2019 Jun 1;35(12):2066-2074. doi: 10.1093/bioinformatics/bty936.

GMcloser: closing gaps in assemblies accurately with a likelihood-based selection of contig or long-read alignments.

Bioinformatics. 2015 Dec 1;31(23):3733-41. doi: 10.1093/bioinformatics/btv465. Epub 2015 Aug 10.

Cited by

Draft genome of the aardaker (Lathyrus tuberosus L.), a tuberous legume.

BMC Genom Data. 2022 Sep 4;23(1):70. doi: 10.1186/s12863-022-01083-5.

Immunoglobulin Classification Based on FC* and GC* Features.

Front Genet. 2022 Jan 24;12:827161. doi: 10.3389/fgene.2021.827161. eCollection 2021.

Application of Sparse Representation in Bioinformatics.

Front Genet. 2021 Dec 15;12:810875. doi: 10.3389/fgene.2021.810875. eCollection 2021.
