SAS Institute, Cary, NC 27513, USA.
Pharmacogenomics J. 2010 Aug;10(4):336-46. doi: 10.1038/tpj.2010.36.
The Affymetrix GeneChip Human Mapping 500K array is commonly used for genome-wide association studies (GWASs). Recent findings highlight the importance of accurate genotype calling algorithms in limiting the inflation of Type I and Type II error rates. Differential results due to genotype calling errors can introduce severe bias in case-control association study results. We used data from the Wellcome Trust Case Control Consortium, in which 1,991 individuals with coronary artery disease (CAD) and 1,500 controls from the UK Blood Services (NBS) were genotyped on the Affymetrix 500K array. Different batch sizes and compositions were used in the Bayesian Robust Linear Model with Mahalanobis distance classifier (BRLMM) genotype calling algorithm to assess the batch effect on downstream association analysis. Results show that composition (cases and controls genotyped simultaneously or separately) and size (number of individuals processed by BRLMM at a time) can create 2-3% discordance in quality control and statistical analysis results and may contribute to the lack of reproducibility between GWASs. Changes in batch size are largely responsible for differential single-nucleotide polymorphism results, yet we observe evidence of an interaction between batch size and composition that contributes to discordant lists of significantly associated loci.
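As a rough illustration of what a discordance figure of this kind can mean, the sketch below compares per-SNP outcomes (for example, whether a SNP passes quality control or is declared significant) from two batch configurations and reports the fraction of shared SNPs whose outcome differs. The abstract does not give the exact definition of discordance used in the study, so this definition, the function name `status_discordance`, and the SNP identifiers are assumptions made purely for illustration.

```python
from typing import Dict


def status_discordance(results_a: Dict[str, bool], results_b: Dict[str, bool]) -> float:
    """Fraction of SNPs present in both result sets whose status differs.

    Each dict maps a SNP identifier to a boolean outcome (e.g. passed QC,
    or flagged as significantly associated) under one batch configuration.
    This definition is an illustrative assumption, not the study's own.
    """
    shared = [snp for snp in results_a if snp in results_b]
    if not shared:
        raise ValueError("the two result sets share no SNPs")
    differing = sum(results_a[snp] != results_b[snp] for snp in shared)
    return differing / len(shared)


# Hypothetical outcomes for four SNPs under two batch configurations.
small_batches = {"rs1": True, "rs2": False, "rs3": True, "rs4": True}
large_batches = {"rs1": True, "rs2": True, "rs3": True, "rs4": True}

rate = status_discordance(small_batches, large_batches)
print(f"discordance: {rate:.2%}")
```

Restricting the comparison to SNPs present in both result sets keeps a SNP that is missing from one configuration from being counted as a disagreement; a real analysis would need to decide how to treat such cases.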