Gao Lei, Wu Cong, Liu Lin
The Key Laboratory of Plant Epigenetics of Guangdong Province, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, P. R. China.
J Bioinform Comput Biol. 2019 Dec;17(6):1950037. doi: 10.1142/S0219720019500379.
Many short-read aligners can map short reads to a reference genome or sequence, and most of them accept a FASTQ file directly as the query input. However, the raw data usually need to be pre-processed first, and few software programs specialize in pre-processing raw data generated by the variety of next-generation sequencing (NGS) technologies. Here, we present AUSPP, a Perl script-based pipeline for pre-processing and automatic mapping of NGS short reads. The pipeline encompasses quality control, adaptor trimming, collapsing of reads, structural RNA removal, length selection, read mapping, and normalized wiggle file creation. It covers the processing from raw data to genome mapping and is therefore a powerful tool for the steps that precede meta-analysis. Most importantly, because AUSPP has default pipeline settings for many types of NGS data, in most cases users only need to provide the raw data and the genome. AUSPP is portable and easy to install, and the source code is freely available at https://github.com/highlei/AUSPP.
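To make three of the pre-processing steps named above more concrete (adaptor trimming, collapsing of reads, and length selection), here is a minimal Python sketch. It is not AUSPP's code, which is written in Perl; the function names, the exact-match trimming rule, and the example adaptor and length bounds are all assumptions chosen for illustration only.

```python
from collections import Counter


def trim_adapter(read, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any.

    Assumption: exact matching only; real trimmers also tolerate mismatches
    and partial adapters at the 3' end.
    """
    idx = read.find(adapter)
    return read if idx == -1 else read[:idx]


def collapse_reads(reads):
    """Merge identical sequences into (sequence, count) pairs.

    Pairs are ordered by decreasing count, then alphabetically.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def select_by_length(collapsed, min_len, max_len):
    """Keep only collapsed sequences whose length lies in [min_len, max_len]."""
    return [(seq, n) for seq, n in collapsed if min_len <= len(seq) <= max_len]


def preprocess(reads, adapter, min_len, max_len):
    """Hypothetical helper chaining trimming, collapsing and length selection."""
    trimmed = [trim_adapter(r, adapter) for r in reads]
    return select_by_length(collapse_reads(trimmed), min_len, max_len)


if __name__ == "__main__":
    raw = [
        "ACGTACGTTGGAATTC",
        "ACGTACGTTGGAATTC",
        "TTTTGGCCTGGAATTC",
        "ACTGGAATTC",
    ]
    # Illustrative adapter and length window, not AUSPP defaults.
    for seq, count in preprocess(raw, "TGGAATTC", 4, 10):
        print(seq, count)
```

Collapsing before mapping means each distinct sequence is aligned once while its count is carried along, which is why it is commonly paired with length selection for small-RNA style data.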