LRez: a C++ API and toolkit for analyzing and managing Linked-Reads data.

Author information

Morisse Pierre, Lemaitre Claire, Legeai Fabrice

Affiliations

Univ Rennes, Inria, CNRS, IRISA, Rennes 35000, France.

IGEPP, INRAE, Institut Agro, Univ Rennes, Rennes 35000, France.

Publication information

Bioinform Adv. 2021 Sep 25;1(1):vbab022. doi: 10.1093/bioadv/vbab022. eCollection 2021.

Abstract

MOTIVATION

Linked-Reads technologies combine the high quality and low cost of short-read sequencing with long-range information by tagging reads that originate from the same long DNA molecule with a common barcode. This technology has been employed in a broad range of applications, including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists.

RESULTS

We introduce LRez, a C++ API and toolkit that allows easy management of Linked-Reads data. LRez includes various functionalities: computing the number of common barcodes between genomic regions, extracting barcodes from BAM files, and indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, to improve its performance.
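The following is a minimal, hypothetical sketch in standard C++ of two of the ideas listed above: an index mapping each barcode to the byte offsets of the FASTQ records that carry it (so all reads with a given barcode can be fetched by seeking), and a count of the distinct barcodes shared by two genomic regions. It does not use the LRez API, and it is not necessarily how LRez implements these features; every name here (barcodeFromHeader, indexFastq, queryFastq, countCommonBarcodes) is illustrative. It assumes uncompressed FASTQ with the barcode stored in a 10x Genomics-style "BX:Z:" field of the read header. BAM and gzipped FASTQ, which LRez also handles, would need a BAM/BGZF or gzip-aware reader instead of plain stream seeking.

```cpp
// Hypothetical illustration only: these functions are NOT part of the LRez
// API. They sketch barcode indexing/querying and common-barcode counting in
// plain standard C++, for uncompressed FASTQ.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Assumption: the barcode is stored as "BX:Z:<barcode>" in the read header
// (10x Genomics-style). Returns an empty string if no such field is present.
std::string barcodeFromHeader(const std::string& header) {
    const std::string tag = "BX:Z:";
    std::size_t pos = header.find(tag);
    if (pos == std::string::npos) return "";
    pos += tag.size();
    std::size_t end = header.find_first_of(" \t", pos);
    return header.substr(pos, end == std::string::npos ? std::string::npos : end - pos);
}

// Barcode -> byte offsets of the 4-line FASTQ records carrying that barcode.
using BarcodeIndex = std::unordered_map<std::string, std::vector<std::uint64_t>>;

BarcodeIndex indexFastq(std::istream& in) {
    BarcodeIndex index;
    std::string header, seq, plus, qual;
    while (true) {
        // Offset of the record that is about to be read.
        std::uint64_t offset = static_cast<std::uint64_t>(in.tellg());
        if (!std::getline(in, header)) break;
        if (!std::getline(in, seq) || !std::getline(in, plus) || !std::getline(in, qual)) break;
        std::string bc = barcodeFromHeader(header);
        if (!bc.empty()) index[bc].push_back(offset);
    }
    return index;
}

// Fetch every record carrying `barcode` by seeking to its indexed offsets.
std::vector<std::string> queryFastq(std::istream& in, const BarcodeIndex& index,
                                    const std::string& barcode) {
    std::vector<std::string> records;
    auto it = index.find(barcode);
    if (it == index.end()) return records;
    for (std::uint64_t off : it->second) {
        in.clear();  // reset EOF/fail bits left over from indexing or a previous query
        in.seekg(static_cast<std::streamoff>(off), std::ios::beg);
        std::string record, line;
        for (int i = 0; i < 4 && std::getline(in, line); ++i) record += line + '\n';
        records.push_back(record);
    }
    return records;
}

// Number of distinct barcodes shared by two regions, given the barcodes of
// the reads (or alignments) falling in each region.
std::size_t countCommonBarcodes(const std::vector<std::string>& regionA,
                                const std::vector<std::string>& regionB) {
    std::unordered_set<std::string> inA(regionA.begin(), regionA.end());
    std::unordered_set<std::string> shared;
    for (const auto& bc : regionB)
        if (inA.count(bc)) shared.insert(bc);
    return shared.size();
}
```

For example, indexing an in-memory std::istringstream that holds three records, two of which carry BX:Z:AAAC-1, yields two offsets for that barcode, and queryFastq then returns those two records in file order. Storing offsets rather than the records themselves keeps the index small, at the cost of one seek per fetched read.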

AVAILABILITY AND IMPLEMENTATION

LRez is implemented in C++, supported on Unix-based platforms and available under AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics Advances online.
