Gondro Cedric, Lee Seung Hwan, Lee Hak Kyo, Porto-Neto Laercio R
The Centre for Genetic Analysis and Applications, University of New England, Armidale, NSW, Australia.
Methods Mol Biol. 2013;1019:129-47. doi: 10.1007/978-1-62703-447-0_5.
This chapter gives an overview of quality control (QC) issues for the SNP-based genotyping methods used in genome-wide association studies. The main metrics for evaluating genotype quality are discussed, followed by a worked example of a QC pipeline that starts with raw data and finishes with a fully filtered dataset ready for downstream analysis. The emphasis is on automating data storage, filtering, and manipulation to ensure data integrity throughout the process, and on how to extract a global summary from these high-dimensional datasets to allow better-informed downstream analytical decisions. All examples are run in the R statistical programming language and are followed by a practical example using a fully automated QC pipeline for the Illumina platform.
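As a rough illustration of the kind of per-SNP filtering such a pipeline automates, the sketch below computes two metrics that are commonly used in SNP QC: call rate and minor allele frequency (MAF). The abstract does not name the chapter's metrics or thresholds, so the choice of metrics, the 0/1/2 genotype coding, the function names `snp_qc_metrics` and `filter_snps`, and the default cutoffs are all assumptions made for illustration. The sketch is written in Python with NumPy rather than the R used in the chapter, and it is not the authors' pipeline.

```python
import numpy as np


def snp_qc_metrics(genotypes):
    """Per-SNP call rate and minor allele frequency.

    genotypes: samples x SNPs array, coded as the number of copies of one
    allele (0, 1 or 2), with np.nan for a missing call. This coding is an
    assumption of the sketch, not something specified in the abstract.
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)

    # Call rate: fraction of samples with a non-missing genotype at each SNP.
    call_rate = n_called / g.shape[0]

    # Frequency of the counted allele among called genotypes (2 alleles each).
    # SNPs with no calls at all give 0/0 = nan, which later fails the filter.
    with np.errstate(invalid="ignore", divide="ignore"):
        freq = np.nansum(g, axis=0) / (2 * n_called)

    maf = np.minimum(freq, 1.0 - freq)
    return call_rate, maf


def filter_snps(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Boolean mask of SNPs passing both (hypothetical) thresholds."""
    call_rate, maf = snp_qc_metrics(genotypes)
    return (call_rate >= min_call_rate) & (maf >= min_maf)


# Toy data: 4 samples x 3 SNPs.
# SNP 1 has one missing call; SNP 2 is monomorphic.
geno = np.array([
    [0, 1,      0],
    [1, 1,      0],
    [2, np.nan, 0],
    [1, 2,      0],
])

keep = filter_snps(geno, min_call_rate=0.9, min_maf=0.05)
filtered = geno[:, keep]  # only SNP 0 survives both filters
```

In this toy dataset the second SNP fails on call rate (0.75) and the third on MAF (0.0), so only the first column is retained. A real pipeline would typically add sample-level checks and further SNP-level tests, and would choose thresholds to suit the dataset.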