BMC Bioinformatics. 2013;14 Suppl 11(Suppl 11):S3. doi: 10.1186/1471-2105-14-S11-S3. Epub 2013 Sep 13.
High-throughput sequencing (HTS) technologies are spearheading the accelerated development of biomedical research. Processing and summarizing the large amount of data generated by HTS presents a non-trivial challenge to bioinformatics. A commonly adopted standard is to store sequencing reads aligned to a reference genome in SAM (Sequence Alignment/Map) or BAM (Binary Alignment/Map) files. Quality control of SAM/BAM files is a critical checkpoint before downstream analysis. The goal of the current project is to facilitate and standardize this process.
We developed bamchop, a robust program to efficiently summarize key statistical metrics of HTS data stored in BAM files, and to visually present the results in a formatted report. The report documents information about various aspects of HTS data, such as sequencing quality, mapping to a reference genome, sequencing coverage, and base frequency. Bamchop uses the R language and Bioconductor packages to calculate the statistical metrics, and the Sweave utility with its associated LaTeX markup to generate the documentation. Bamchop's efficiency and robustness were tested on BAM files generated by local sequencing facilities and the 1000 Genomes Project. Source code, instructions, and example reports of bamchop are freely available from https://github.com/CBMi-BiG/bamchop.
Bamchop enables biomedical researchers to quickly and rigorously evaluate HTS data by providing a convenient synopsis and user-friendly reports.
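To make the kinds of metrics in such a report concrete, the following is a minimal sketch of how mapping rate, base frequency, and mean base quality can be derived from alignment records. It is not bamchop's implementation, which is written in R with Bioconductor; it is a standard-library Python illustration that reads uncompressed SAM text rather than BAM. The function name `summarize_sam` and the three example reads are hypothetical, and base qualities are assumed to use the Phred+33 encoding defined by the SAM format.

```python
from collections import Counter

# Hypothetical SAM text: one header line plus three made-up reads.
# Each alignment line carries the 11 mandatory tab-separated SAM fields.
SAM_TEXT = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tchr1\t200\t30\t4M\t*\t0\t0\tAACC\t!!II",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGTT\t####",
])


def summarize_sam(text):
    """Return read count, mapped fraction, base frequency, and mean base quality."""
    total = mapped = 0
    bases = Counter()
    qual_sum = qual_n = 0
    for line in text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag, seq, qual = int(fields[1]), fields[9], fields[10]
        total += 1
        if not flag & 0x4:  # FLAG bit 0x4 set means the read is unmapped
            mapped += 1
        if seq != "*":  # "*" means the sequence is not stored
            bases.update(seq.upper())
        if qual != "*":  # "*" means qualities are not stored
            qual_sum += sum(ord(c) - 33 for c in qual)  # Phred+33 decoding
            qual_n += len(qual)
    n_bases = sum(bases.values())
    return {
        "reads": total,
        "mapped_fraction": mapped / total if total else 0.0,
        "base_frequency": {b: c / n_bases for b, c in sorted(bases.items())},
        "mean_base_quality": qual_sum / qual_n if qual_n else 0.0,
    }


summary = summarize_sam(SAM_TEXT)
for key, value in summary.items():
    print(key, value)
```

A real quality-control run would stream records from a BAM file through a dedicated parser and would also compute coverage along the reference, which this sketch leaves out.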