SeqFu: A Suite of Utilities for the Robust and Reproducible Manipulation of Sequence Files.

Authors

Telatin Andrea, Fariselli Piero, Birolo Giovanni

Affiliations

Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich NR4 7UQ, UK.

Department of Medical Sciences, University of Turin, 10126 Torino, Italy.

Publication information

Bioengineering (Basel). 2021 May 7;8(5):59. doi: 10.3390/bioengineering8050059.

Abstract

Sequence file formats (FASTA and FASTQ) are commonly used in bioinformatics, molecular biology and biochemistry. With the advent of next-generation sequencing (NGS) technologies, the number of FASTQ datasets produced and analyzed has grown exponentially, driving the development of dedicated software to handle, parse, and manipulate such files efficiently. Several bioinformatics packages are available to filter and manipulate FASTA and FASTQ files, yet some essential tasks remain poorly supported, leaving gaps that workflows for analyzing NGS datasets must fill with custom scripts. Such scripts can introduce harmful variability and performance bottlenecks in pivotal steps. Here we present a suite of tools, called SeqFu (Sequence Fastx utilities), that provides a broad range of commands to perform both common and specialist operations with ease and is designed to be easily integrated into high-performance analytical pipelines. SeqFu includes high-performance implementations of algorithms to interleave and deinterleave FASTQ files, merge Illumina lanes, and perform various quality controls (identification of degenerate primers, analysis of length statistics, extraction of portions of the datasets). SeqFu also dereplicates sequences from multiple files while keeping track of their provenance. SeqFu is developed in Nim for high-performance processing, is freely available, and can be installed with the popular package manager Miniconda.
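To make three of the operations named in the abstract concrete (interleaving, deinterleaving, and dereplication with provenance), the following is a minimal Python sketch. It is not SeqFu's Nim implementation and does not reproduce its command-line interface. The function names (`parse_fastq`, `base_name`, `interleave`, `deinterleave`, `dereplicate`), the assumption of strict four-line FASTQ records, and the `/1` and `/2` read-name suffix convention are all illustrative assumptions rather than details taken from the paper.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record type for this sketch: (name, sequence, quality)
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from strict four-line FASTQ text (an assumption)."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return
        if (len(chunk) != 4 or not chunk[0].startswith("@")
                or not chunk[2].startswith("+")):
            raise ValueError(f"malformed FASTQ record: {chunk!r}")
        yield chunk[0][1:], chunk[1], chunk[3]


def base_name(name: str) -> str:
    """Drop any comment and a trailing /1 or /2 so mates can be compared."""
    parts = name.split()
    core = parts[0] if parts else ""
    return core[:-2] if core.endswith(("/1", "/2")) else core


def interleave(r1: Iterable[Record], r2: Iterable[Record]) -> Iterator[Record]:
    """Alternate forward and reverse reads, checking that the pairs match."""
    it2 = iter(r2)
    for fwd in r1:
        rev = next(it2, None)
        if rev is None:
            raise ValueError("reverse file has fewer reads than forward file")
        if base_name(fwd[0]) != base_name(rev[0]):
            raise ValueError(f"unpaired reads: {fwd[0]} / {rev[0]}")
        yield fwd
        yield rev
    if next(it2, None) is not None:
        raise ValueError("reverse file has more reads than forward file")


def deinterleave(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Split an interleaved stream back into forward and reverse lists."""
    fwd: List[Record] = []
    rev: List[Record] = []
    for index, record in enumerate(records):
        (fwd if index % 2 == 0 else rev).append(record)
    if len(fwd) != len(rev):
        raise ValueError("odd number of records in interleaved input")
    return fwd, rev


def dereplicate(files: Dict[str, Iterable[Record]]) -> Dict[str, Dict[str, int]]:
    """Map each distinct sequence to the number of copies seen per input."""
    counts: Dict[str, Dict[str, int]] = {}
    for label, records in files.items():
        for _, sequence, _ in records:
            per_file = counts.setdefault(sequence.upper(), {})
            per_file[label] = per_file.get(label, 0) + 1
    return counts
```

Keeping a per-input count for every distinct sequence is one simple way to preserve provenance after dereplication; checking read names while interleaving catches desynchronized paired files early instead of silently mispairing reads.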

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a032/8148589/7fad5e2bcfa6/bioengineering-08-00059-g001.jpg
