CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China.
Genomics Proteomics Bioinformatics. 2011 Dec;9(6):238-44. doi: 10.1016/S1672-0229(11)60027-2.
The emergence of next-generation sequencing (NGS) technologies has significantly improved sequencing throughput and reduced costs. However, short read lengths, duplicate reads and the massive volume of data make data processing much more difficult and complicated than with first-generation sequencing technology. Although several software packages have been developed to assess data quality, they are either not easily available to users or demand bioinformatics skills and computing resources. Moreover, almost none of the quality assessment software currently available takes sequencing errors into account when assessing duplicates in NGS data. Here, we present a new user-friendly quality assessment software package called BIGpre, which works for both Illumina and 454 platforms. BIGpre contains all the functions of other quality assessment software, such as the correlation between forward and reverse reads, read GC-content distribution, and base N quality. More importantly, BIGpre incorporates associated programs to detect and remove duplicate reads while taking sequencing errors into account, and to trim low-quality reads from raw data. BIGpre is primarily written in Perl and integrates graphical capability from the statistics package R. The package produces both tabular and graphical summaries of data quality for sequencing datasets from Illumina and 454 platforms. Processing hundreds of millions of reads within minutes, it provides immediate diagnostic information that users can act on when preparing sequencing data for downstream analyses. BIGpre is freely available at http://bigpre.sourceforge.net.
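The abstract names three checks without giving detail: read GC content, duplicate detection that tolerates sequencing errors, and trimming of low-quality bases. The following is a minimal Python sketch of what such checks could look like. It is not BIGpre's implementation (BIGpre itself is written in Perl and R, and its actual algorithms are not described here). All function names, the mismatch tolerance `max_mismatches`, and the quality cutoff `min_q` are hypothetical choices made for illustration only.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G/C among called (non-N) bases; 0.0 if no base is called."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def remove_near_duplicates(reads: List[str], max_mismatches: int = 1) -> List[str]:
    """Greedy O(n^2) filter (assumed approach, not BIGpre's): drop a read if an
    already-kept read of the same length differs at <= max_mismatches positions."""
    kept: List[str] = []
    for read in reads:
        is_dup = any(
            len(read) == len(k) and hamming(read, k) <= max_mismatches
            for k in kept
        )
        if not is_dup:
            kept.append(read)
    return kept


def trim_3prime(seq: str, quals: List[int], min_q: int = 20) -> Tuple[str, List[int]]:
    """Trim bases from the 3' end while their quality score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTACGA", "TTTTCCCC", "ACGTACGT"]
    print(remove_near_duplicates(reads))
    print(gc_content("ACGTNN"))
    print(trim_3prime("ACGTAC", [30, 32, 31, 25, 10, 5]))
```

With `max_mismatches=1`, a read that differs from an earlier read by a single base is treated as a duplicate of it, which is the sense in which a sequencing error is "taken into account". An exact-match filter would keep both copies. The pairwise comparison is quadratic in the number of reads, so it only illustrates the idea; it would not scale to the hundreds of millions of reads mentioned in the abstract.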