Morisse Pierre, Lemaitre Claire, Legeai Fabrice
Univ Rennes, Inria, CNRS, IRISA, Rennes 35000, France.
IGEPP, INRAE, Institut Agro, Univ Rennes, Rennes 35000, France.
Bioinform Adv. 2021 Sep 25;1(1):vbab022. doi: 10.1093/bioadv/vbab022. eCollection 2021.
Linked-Reads technologies combine the high quality and low cost of short-read sequencing with long-range information, by tagging reads that originate from a common long DNA molecule with a shared barcode. These technologies have been employed in a broad range of applications, including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists.
We introduce LRez, a C++ API and toolkit that allows easy management of Linked-Reads data. LRez includes functionalities for computing the number of common barcodes between genomic regions, extracting barcodes from BAM files, and indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve its performance.
LRez is implemented in C++, supported on Unix-based platforms and available under AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module.
Supplementary data are available at Bioinformatics Advances online.