Turner Stephen, Armstrong Loren L, Bradford Yuki, Carlson Christopher S, Crawford Dana C, Crenshaw Andrew T, de Andrade Mariza, Doheny Kimberly F, Haines Jonathan L, Hayes Geoffrey, Jarvik Gail, Jiang Lan, Kullo Iftikhar J, Li Rongling, Ling Hua, Manolio Teri A, Matsumoto Martha, McCarty Catherine A, McDavid Andrew N, Mirel Daniel B, Paschall Justin E, Pugh Elizabeth W, Rasmussen Luke V, Wilke Russell A, Zuvich Rebecca L, Ritchie Marylyn D
Center for Human Genetics Research, Department of Molecular Physiology & Biophysics, Vanderbilt University, Nashville, Tennessee, USA.
Curr Protoc Hum Genet. 2011 Jan;Chapter 1:Unit1.19. doi: 10.1002/0471142905.hg0119s68.
Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the electronic MEdical Records and Genomics (eMERGE) network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research.