NGS-QCbox and Raspberry for Parallel, Automated and Rapid Quality Control Analysis of Large-Scale Next Generation Sequencing (Illumina) Data.

Author information

Katta Mohan A V S K, Khan Aamir W, Doddamani Dadakhalandar, Thudi Mahendar, Varshney Rajeev K

Affiliations

International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India.

International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India; School of Plant Biology and Institute of Agriculture, The University of Western Australia, Crawley, Australia.

Publication information

PLoS One. 2015 Oct 13;10(10):e0139868. doi: 10.1371/journal.pone.0139868. eCollection 2015.

Abstract

The rapid adoption of next generation sequencing (NGS) approaches has generated huge volumes of data. High-throughput platforms such as Illumina HiSeq produce terabytes of raw data that require quick processing, and quality control is an important step before downstream analyses. To address these needs, we have developed a quality control pipeline, NGS-QCbox, that scales to hundreds or thousands of samples. Raspberry is an in-house tool, written in C using HTSlib (v1.2.1) (http://htslib.org), for computing read- and base-level statistics. It can be used as a stand-alone application and can process both compressed and uncompressed FASTQ files. NGS-QCbox integrates Raspberry with other open-source tools for alignment (Bowtie2), SNP calling (SAMtools) and other utilities (bedtools) to analyze raw NGS data efficiently and in a high-throughput manner. The pipeline runs batches of jobs in parallel using Bpipe (https://github.com/ssadedin/bpipe) and, internally, applies fine-grained task parallelization using OpenMP. It reports read and base statistics along with genome coverage and variants in a user-friendly format. The pipeline presents a simple menu-driven interface and can be run in either quick or complete mode. In quick mode, it is faster than other similar existing QC pipelines and tools. The NGS-QCbox pipeline, the Raspberry tool and associated scripts are available at https://github.com/CEG-ICRISAT/NGS-QCbox and https://github.com/CEG-ICRISAT/Raspberry for rapid quality control analysis of large-scale next generation sequencing (Illumina) data.
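To make "read- and base-level statistics" concrete, the following is a minimal Python sketch of that kind of computation over a compressed or uncompressed FASTQ file. It is not Raspberry's implementation (which the abstract describes as C code built on HTSlib), and the abstract does not list which statistics Raspberry reports. The function names, the chosen metrics (GC fraction, N count, mean quality, fraction of bases at or above Q30) and the Phred+33 quality encoding are all assumptions made for illustration.

```python
import gzip
from collections import Counter


def open_fastq(path):
    """Open a FASTQ file as text, detecting gzip by its two-byte magic number."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def fastq_stats(path, phred_offset=33, q_threshold=30):
    """Return simple read/base statistics for one FASTQ file.

    Assumes four-line records and Phred+33 qualities (hypothetical defaults,
    not taken from the paper).
    """
    reads = 0
    bases = Counter()   # per-base counts across all reads
    qual_sum = 0        # sum of Phred scores over all bases
    high_q = 0          # number of bases with score >= q_threshold

    with open_fastq(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break               # end of file
            if not header.strip():
                continue            # tolerate stray blank lines
            seq = handle.readline().strip()
            handle.readline()       # '+' separator line, ignored
            qual = handle.readline().strip()
            if not header.startswith("@") or len(seq) != len(qual):
                raise ValueError(f"malformed FASTQ record near read {reads + 1}")

            reads += 1
            bases.update(seq.upper())
            scores = [ord(c) - phred_offset for c in qual]
            qual_sum += sum(scores)
            high_q += sum(s >= q_threshold for s in scores)

    total = sum(bases.values())
    return {
        "reads": reads,
        "bases": total,
        "gc_fraction": (bases["G"] + bases["C"]) / total if total else 0.0,
        "n_bases": bases["N"],
        "mean_quality": qual_sum / total if total else 0.0,
        "q30_fraction": high_q / total if total else 0.0,
    }
```

Calling `fastq_stats("sample_R1.fastq.gz")` (a hypothetical file name) returns a dictionary of these values. Because each file is processed independently, a batch of samples could be spread across worker processes, which loosely mirrors the per-sample batch parallelism the abstract attributes to Bpipe.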
