QC-Chain: fast and holistic quality control method for next-generation sequencing data.

Affiliation

CAS Key Laboratory of Biofuels, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China.

Publication information

PLoS One. 2013;8(4):e60234. doi: 10.1371/journal.pone.0060234. Epub 2013 Apr 2.

Abstract

Next-generation sequencing (NGS) technologies have been widely used in the life sciences. However, several kinds of sequencing artifacts, including low-quality reads and contaminating reads, were found to be quite common in raw sequencing data, where they compromise downstream analysis. Quality control (QC) is therefore essential for raw NGS data. Although a few NGS data QC tools are publicly available, they have two limitations. First, their processing speed cannot keep pace with the rapid growth in data volume. Second, with respect to removing contaminating reads, none of them can identify contaminating sources de novo; instead, they rely heavily on prior information about the contaminating species, which is usually not available in advance. Here we report QC-Chain, a fast, accurate and holistic NGS data quality-control method. It synergistically combines user-friendly tools for (1) quality assessment and trimming of raw reads using Parallel-QC, a fast read-processing tool, and (2) identification, quantification and filtration of unknown contamination to obtain high-quality clean reads. Because it is optimized for parallel computation, its processing speed is significantly higher than that of other QC methods. Experiments on simulated and real NGS data showed that reads with low sequencing quality could be identified and filtered, and that possible contaminating sources could be identified and quantified de novo, accurately and quickly. Comparison between raw and processed reads also showed that the results of subsequent analyses (genome assembly, gene prediction, gene annotation, etc.) based on processed reads improved significantly in completeness and accuracy. With regard to processing speed, QC-Chain achieves a 7- to 8-fold speed-up through parallel computation compared with traditional methods.
QC-Chain is therefore a fast and useful quality control tool for read-quality processing and de novo contamination filtration of NGS reads, and it can significantly facilitate downstream analysis. QC-Chain is publicly available at: http://www.computationalbioenergy.org/qc-chain.html.
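The abstract does not describe how Parallel-QC performs trimming internally, so the following is only a minimal sketch of the general idea it names: quality trimming of raw reads, distributed across worker processes. It assumes 4-line FASTQ records with Phred+33 quality encoding, and the thresholds (`MIN_QUALITY`, `MIN_LENGTH`) and all function names are hypothetical illustrations, not QC-Chain's code or parameters.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Iterable, List, Optional, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding
MIN_QUALITY = 20   # hypothetical: trim trailing bases below Q20
MIN_LENGTH = 30    # hypothetical: discard reads shorter than 30 bp after trimming


def trim_read(read: Read) -> Optional[Read]:
    """Trim low-quality bases from the 3' end; return None if the read becomes too short."""
    header, seq, qual = read
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < MIN_QUALITY:
        end -= 1
    if end < MIN_LENGTH:
        return None
    return header, seq[:end], qual[:end]


def parse_fastq(lines: Iterable[str]) -> List[Read]:
    """Parse simple 4-line FASTQ records (header, sequence, '+', quality)."""
    it = iter(line.rstrip("\n") for line in lines)
    reads: List[Read] = []
    for header in it:
        seq = next(it)
        next(it)  # skip the '+' separator line
        qual = next(it)
        reads.append((header, seq, qual))
    return reads


def trim_parallel(reads: List[Read], workers: int = 4) -> List[Read]:
    """Trim reads, in parallel when workers > 1; output keeps the input order."""
    if workers <= 1:
        results = map(trim_read, reads)
        return [r for r in results if r is not None]
    # Each worker process trims a chunk of reads independently.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(trim_read, reads, chunksize=1000))
    return [r for r in results if r is not None]
```

For example, `trim_parallel(parse_fastq(open("reads.fastq")), workers=8)` would trim a hypothetical file `reads.fastq` using eight processes. Read trimming parallelizes well because each read is processed independently, which is the kind of workload where multi-process or multi-core execution gives near-linear speed-ups.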
