O'Halloran Damien M
Institute for Neuroscience, The George Washington University, 636 Ross Hall, 2300 I St. N.W., Washington, DC, 20052, USA.
Department of Biological Sciences, The George Washington University, 636 Ross Hall, 2300 I St. N.W., Washington, DC, 20052, USA.
BMC Res Notes. 2017 Jul 12;10(1):275. doi: 10.1186/s13104-017-2616-7.
Next-generation sequencing datasets are stored as FASTQ-formatted files. To avoid downstream artefacts, it is critical to implement a robust preprocessing protocol that establishes the integrity and quality of the FASTQ sequence data.
Here I describe fastQ_brew, a package that provides a suite of methods to evaluate sequence data in FASTQ format and efficiently implements a variety of manipulations to filter sequence data by size, quality and/or sequence. fastQ_brew allows for mismatch searches to adapter sequences, left and right end trimming, and removal of duplicate reads as well as reads containing non-designated bases. fastQ_brew also returns summary statistics on the unfiltered and filtered FASTQ data, and offers FASTQ to FASTA conversion, FASTQ reverse complementation, and DNA to RNA conversion.
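To make the operations listed above concrete, the following is a minimal Python sketch of the same kinds of manipulations: length, quality and base-composition filtering, duplicate removal, end trimming, a mismatch-tolerant adapter search, and the FASTA, reverse complement and RNA conversions. It is not the fastQ_brew interface. Every function name and parameter here is hypothetical, and the sketch assumes four-line FASTQ records with Phred+33 quality encoding.

```python
from statistics import mean


def parse_fastq(text):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@'


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 assumed)."""
    return mean(ord(c) - offset for c in qual)


def trim(seq, qual, left=0, right=0):
    """Remove `left` bases from the 5' end and `right` bases from the 3' end."""
    end = len(seq) - right
    return seq[left:end], qual[left:end]


def find_adapter(seq, adapter, max_mismatches=1):
    """Return the first position where `adapter` matches `seq` with at most
    `max_mismatches` substitutions, or -1 if there is no such position."""
    for i in range(len(seq) - len(adapter) + 1):
        window = seq[i:i + len(adapter)]
        if sum(a != b for a, b in zip(window, adapter)) <= max_mismatches:
            return i
    return -1


def filter_reads(records, min_len=1, min_qual=0.0, allowed="ACGT"):
    """Keep reads that are long enough, contain only designated bases,
    meet the mean quality threshold, and have not been seen before."""
    seen = set()
    for name, seq, qual in records:
        if not seq or len(seq) < min_len:
            continue
        if set(seq) - set(allowed):  # non-designated base, e.g. N
            continue
        if mean_phred(qual) < min_qual:
            continue
        if seq in seen:  # duplicate read
            continue
        seen.add(seq)
        yield name, seq, qual


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def dna_to_rna(seq):
    """Replace thymine with uracil."""
    return seq.replace("T", "U")


def to_fasta(records):
    """Render (name, sequence, quality) records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq, _qual in records)
```

Duplicates are detected here by exact sequence identity, and the adapter search counts substitutions only (no insertions or deletions); both are simplifying assumptions of the sketch rather than a description of how fastQ_brew behaves.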
fastQ_brew is open source and freely available to all users at the following webpage: https://github.com/dohalloran/fastQ_brew .