Ultraplex: A rapid, flexible, all-in-one fastq demultiplexer.

Author information

Wilkins Oscar G, Capitanchik Charlotte, Luscombe Nicholas M, Ule Jernej

Affiliations

The Francis Crick Institute, London, UK.

Department of Neuromuscular Diseases, UCL Queen Square Institute of Neurology, London, UK.

Publication information

Wellcome Open Res. 2021 Jun 7;6:141. doi: 10.12688/wellcomeopenres.16791.1. eCollection 2021.

Abstract

The first step of virtually all next-generation sequencing analysis involves splitting the raw sequencing data into separate files using sample-specific barcodes, a process known as "demultiplexing". However, we found that existing software for this purpose was either too inflexible or too computationally intensive for fast, streamlined processing of raw, single-end FASTQ files containing combinatorial barcodes. Here, we introduce a fast and uniquely flexible demultiplexer, named Ultraplex, which splits a raw FASTQ file containing barcodes either at a single end or at both the 5' and 3' ends of reads, trims the sequencing adaptors and low-quality bases, and moves unique molecular identifiers (UMIs) into the read header, allowing subsequent removal of PCR duplicates. Ultraplex can perform such single or combinatorial demultiplexing on both single- and paired-end sequencing data, and can process an entire Illumina HiSeq lane, consisting of nearly 500 million reads, in less than 20 minutes. Ultraplex greatly reduces the computational burden and pipeline complexity of demultiplexing complex sequencing libraries, such as those produced by various CLIP and ribosome profiling protocols, and is also very user-friendly, enabling streamlined, robust data processing. Ultraplex is available on PyPI and Conda and via GitHub.
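To make the per-read steps in the abstract concrete (5' and optional 3' barcode matching, adaptor and quality trimming, and moving the UMI into the header), here is a minimal Python sketch for a single read. It is not Ultraplex's code or command-line interface. The following details are assumptions made for illustration: barcodes are written as patterns in which `N` marks a UMI position; a read is assigned only to a unique best match with at most one mismatch; adaptor detection is exact (full or partial at the 3' end); quality trimming simply drops trailing bases below Phred 20, which is simpler than cutadapt-style trimming; and the `<id>_<UMI>` header format is hypothetical. All function names are hypothetical.

```python
"""Sketch of combinatorial 5'/3' barcode demultiplexing with UMI extraction.

Not Ultraplex's implementation: the N-for-UMI barcode notation, the header
format and every function name below are illustrative assumptions.
"""
from typing import Optional, Sequence, Tuple


def _mismatches(pattern: str, window: str) -> int:
    """Count mismatches at the fixed (non-N) positions of a barcode pattern."""
    return sum(1 for p, s in zip(pattern, window) if p != "N" and p != s)


def match_barcode(seq: str, patterns: Sequence[str], max_mm: int = 1,
                  at_end: bool = False) -> Optional[str]:
    """Return the unique best pattern at the read start (or end), else None."""
    scored = []
    for p in patterns:
        if len(seq) < len(p):
            continue
        window = seq[-len(p):] if at_end else seq[:len(p)]
        scored.append((_mismatches(p, window), p))
    scored.sort()
    if not scored or scored[0][0] > max_mm:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two barcodes are equally close
    return scored[0][1]


def umi_from(pattern: str, window: str) -> str:
    """Collect the bases sitting at the N (UMI) positions of a pattern."""
    return "".join(s for p, s in zip(pattern, window) if p == "N")


def quality_end(qual: str, cutoff: int = 20, offset: int = 33) -> int:
    """Length left after dropping trailing bases below a Phred cutoff."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < cutoff:
        end -= 1
    return end


def adapter_start(seq: str, adapter: str, min_overlap: int = 3) -> int:
    """Index where an exact (full or partial 3') adapter begins, else len(seq)."""
    hit = seq.find(adapter)
    if hit != -1:
        return hit
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def demultiplex_read(read_id: str, seq: str, qual: str,
                     five_prime: Sequence[str], three_prime: Sequence[str],
                     adapter: str, max_mm: int = 1
                     ) -> Optional[Tuple[Tuple[str, Optional[str]], str, str, str]]:
    """Return (sample_key, header, seq, qual) for one read, or None if unassigned."""
    bc5 = match_barcode(seq, five_prime, max_mm)
    if bc5 is None:
        return None
    umi = umi_from(bc5, seq[:len(bc5)])
    # Trim the low-quality tail, then the adapter, then drop the 5' barcode.
    end = quality_end(qual)
    end = adapter_start(seq[:end], adapter)
    insert, insert_q = seq[len(bc5):end], qual[len(bc5):end]
    bc3 = None
    if three_prime:  # combinatorial mode: a 3' barcode must also match
        bc3 = match_barcode(insert, three_prime, max_mm, at_end=True)
        if bc3 is None:
            return None
        umi += umi_from(bc3, insert[-len(bc3):])
        insert, insert_q = insert[:-len(bc3)], insert_q[:-len(bc3)]
    # Hypothetical header format: the UMI is appended for later deduplication.
    return (bc5, bc3), f"{read_id}_{umi}", insert, insert_q


if __name__ == "__main__":
    read = "ACGATGCTT" + "GATTACAGATTACA" + "TAGTAC" + "AGATCGGAAG"
    print(demultiplex_read("read1", read, "I" * len(read),
                           ["NNNATGCNN", "NNNCCTTNN"], ["NNGTAC", "NNCATG"],
                           "AGATCGGAAGAGC"))
```

In the demo read, the 5' pattern `NNNATGCNN` contributes the UMI bases `ACGTT` and the 3' pattern `NNGTAC` contributes `TA`, so the read is assigned to the barcode pair (`NNNATGCNN`, `NNGTAC`) with header `read1_ACGTTTA` and insert `GATTACAGATTACA`. Requiring a unique best match, rather than taking the first barcode within the mismatch limit, avoids silently assigning reads that fall between two barcodes.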
