Genomics Group, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, FDA, Silver Spring, MD 20903, USA.
Pharmacogenomics J. 2010 Aug;10(4):347-54. doi: 10.1038/tpj.2010.27.
The robustness of genome-wide association study (GWAS) results depends on the genotyping algorithm used to establish the association. This paper is an initial assessment of how the genotyping quality of the Corrected Robust Linear Model with Maximum Likelihood Classification (CRLMM) affects the identification of truly significant genes in a GWAS with a large sample size. Using microarray image data from the Wellcome Trust Case-Control Consortium (WTCCC) for 1991 individuals with coronary artery disease (CAD) and 1500 controls, genetic associations were evaluated under various batch sizes and compositions. The experimental designs used batch sizes of 250, 350, 500 and 2000 samples, with cases and controls distributed across batches in one of three ways: randomized, simply combined (4:3 case-control ratio), or kept separate; the whole set of 3491 samples was evaluated as well. The separate composition could create 2-3% discordance in the single nucleotide polymorphism (SNP) results used for quality control and statistical analysis, and might contribute to the lack of reproducibility between GWAS. CRLMM shows high genotyping accuracy and stability against batch effects. According to the genotypic and allelic tests (P < 5.0 × 10⁻⁷), nine significant signals on chromosome 9 were found consistently in all batch sizes with the combined design. Our findings are critical for optimizing the reproducibility of GWAS and confirm the genetic role in the pathophysiology of CAD.
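The three batch compositions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`chunk`, `make_batches`, `snp_set_discordance`) are hypothetical, the reading of the "combined" design as repeated blocks of 4 cases followed by 3 controls is an assumption, and the set-based discordance measure is an assumed stand-in for whatever comparison the paper actually used.

```python
import random


def chunk(samples, batch_size):
    """Cut an ordered list of sample IDs into consecutive batches."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def make_batches(cases, controls, batch_size, design, seed=0):
    """Assign samples to genotyping batches under one of three designs.

    The design names follow the abstract; the mechanics are assumptions.
    """
    cases, controls = list(cases), list(controls)
    if design == "separate":
        # Cases and controls never share a batch.
        return chunk(cases, batch_size) + chunk(controls, batch_size)
    if design == "randomized":
        # Pool all samples and shuffle before cutting into batches.
        pooled = cases + controls
        random.Random(seed).shuffle(pooled)
        return chunk(pooled, batch_size)
    if design == "combined":
        # Assumed reading: interleave 4 cases with 3 controls so that
        # each batch keeps roughly a 4:3 case-control ratio.
        ordered, i, j = [], 0, 0
        while i < len(cases) or j < len(controls):
            ordered += cases[i:i + 4]
            ordered += controls[j:j + 3]
            i += 4
            j += 3
        return chunk(ordered, batch_size)
    raise ValueError(f"unknown design: {design}")


def snp_set_discordance(snps_a, snps_b):
    """Share of SNPs reported by only one of two analyses (assumed measure)."""
    a, b = set(snps_a), set(snps_b)
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


# Sample counts from the abstract: 1991 CAD cases and 1500 controls.
cases = [f"case{i}" for i in range(1991)]
controls = [f"ctrl{i}" for i in range(1500)]
batches = make_batches(cases, controls, batch_size=350, design="combined")
```

Comparing the SNP lists that pass quality control under `"separate"` and `"combined"` batching with `snp_set_discordance` is one way to put a number on the kind of batch-composition effect the abstract reports.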