Chen Guo-Bo, Lee Sang Hong, Montgomery Grant W, Wray Naomi R, Visscher Peter M, Gearry Richard B, Lawrance Ian C, Andrews Jane M, Bampton Peter, Mahy Gillian, Bell Sally, Walsh Alissa, Connor Susan, Sparrow Miles, Bowdler Lisa M, Simms Lisa A, Krishnaprasad Krupa, Radford-Smith Graham L, Moser Gerhard
Queensland Brain Institute, The University of Queensland, Brisbane, Australia.
School of Environmental and Rural Science, The University of New England, Armidale, Australia.
BMC Med Genet. 2017 Aug 29;18(1):94. doi: 10.1186/s12881-017-0451-2.
Predicting disease risk from genotypes is increasingly being proposed for a variety of diagnostic and prognostic purposes. Genome-wide association studies (GWAS) have identified a large number of genome-wide significant susceptibility loci for Crohn's disease (CD) and ulcerative colitis (UC), two subtypes of inflammatory bowel disease (IBD). Recent studies have shown that prediction models including only loci significantly associated with disease have low predictive power, and that power can be substantially improved by a polygenic approach.
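The contrast between significant-loci-only and polygenic models can be illustrated with a minimal sketch of a p-value-thresholded polygenic score, one of the simplest polygenic scoring schemes. Each individual's score is the sum of risk-allele counts weighted by per-SNP effect sizes from a discovery GWAS, keeping only SNPs whose p-value passes a threshold. A threshold at the conventional genome-wide significance level of 5e-8 mimics a model built from significant loci only, while a lenient threshold admits many more SNPs. The function name `polygenic_score` and all numbers are hypothetical, and this is not the scoring pipeline used in this study.

```python
from typing import Sequence


def polygenic_score(
    genotypes: Sequence[int],
    betas: Sequence[float],
    pvalues: Sequence[float],
    threshold: float = 1.0,
) -> float:
    """Hypothetical helper: effect-size-weighted sum of risk-allele counts.

    genotypes: risk-allele counts (0, 1 or 2) for one individual.
    betas: per-SNP effect sizes (e.g. log odds ratios) from a discovery GWAS.
    pvalues: per-SNP association p-values from the same GWAS.
    threshold: only SNPs with p-value <= threshold contribute to the score.
    """
    if not (len(genotypes) == len(betas) == len(pvalues)):
        raise ValueError("genotypes, betas and pvalues must have equal length")
    return sum(g * b for g, b, p in zip(genotypes, betas, pvalues) if p <= threshold)


if __name__ == "__main__":
    # Made-up data for five SNPs in one individual.
    genotypes = [2, 1, 0, 1, 2]
    betas = [0.40, 0.25, 0.10, 0.05, 0.02]
    pvalues = [1e-12, 3e-9, 1e-4, 0.02, 0.30]

    # Genome-wide significant loci only.
    print(polygenic_score(genotypes, betas, pvalues, threshold=5e-8))
    # Lenient, polygenic threshold.
    print(polygenic_score(genotypes, betas, pvalues, threshold=0.05))
```

In a real application the score would be computed for every individual in a target sample, and the threshold itself would typically be tuned on held-out data.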
We performed a comprehensive analysis of risk prediction models using large case-control cohorts genotyped either for 909,763 SNPs on a GWAS array or for 123,437 SNPs on the custom-designed Immunochip, applying four prediction methods (polygenic score, best linear genomic prediction, elastic-net regularization and a Bayesian mixture model). Within cross-validation, we used the area under the receiver operating characteristic curve (AUC) to assess prediction performance for discovery populations of different sample sizes and with different numbers of SNPs.
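The AUC used above has a simple interpretation: it equals the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The function name `auc` and the example scores are hypothetical; in practice a library routine such as scikit-learn's `roc_auc_score` would usually be used, and in cross-validation the scores would come from individuals held out of model fitting.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Hypothetical helper: AUC from all case-control score comparisons.

    labels: 1 for cases, 0 for controls.
    Each case-control pair counts 1 if the case scores higher,
    0.5 if the scores are tied, and 0 otherwise.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Made-up risk scores: three cases followed by three controls.
    scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(auc(scores, labels))
```

The pairwise loop is quadratic in sample size and is meant only to make the definition concrete; rank-based implementations give the same value far more efficiently. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.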
On average, the Bayesian mixture approach had the best prediction performance. Using cross-validation, we found little difference in prediction performance between the GWAS array and the Immunochip, despite the GWAS array providing ten times greater effective genome-wide coverage. The prediction performance of the Immunochip is largely attributable to the power of the initial GWAS used to select its markers and to its low cost, which enabled larger sample sizes. The predictive ability of the Immunochip-based genomic risk score was replicated in external data, with an AUC of 0.75 for CD and 0.70 for UC. CD patients with higher risk scores showed clinical characteristics typically associated with a more severe disease course, including ileal location and earlier age at diagnosis.
Our analyses demonstrate that the power of genomic risk prediction for IBD derives mainly from strongly associated SNPs with considerable effect sizes. Additional SNPs tagged only by high-density GWAS arrays, and low-frequency or rare variants over-represented in the high-density regions of the Immunochip, contribute little to prediction accuracy. Although a quantitative assessment of an individual's IBD risk is not currently possible, we show that genomic risk scores have sufficient power to stratify IBD risk among individuals at diagnosis.
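Risk stratification by genomic risk score can be sketched under a simple assumption: individuals are split into equal-sized groups by score rank (quartiles by default), and a binary clinical characteristic is then summarized per group. The function names, the use of quartiles and the example data (including the `ileal` flags) are hypothetical illustrations, not the stratification procedure or data of this study.

```python
from typing import List, Optional, Sequence


def quantile_groups(scores: Sequence[float], n_groups: int = 4) -> List[int]:
    """Hypothetical helper: assign each individual a group 0..n_groups-1
    by score rank (0 = lowest scores), with groups as equal in size as possible."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    n = len(scores)
    # Indices of individuals sorted from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        # Integer arithmetic maps ranks 0..n-1 onto groups 0..n_groups-1.
        groups[i] = rank * n_groups // n
    return groups


def proportion_by_group(
    groups: Sequence[int], flags: Sequence[bool], n_groups: int
) -> List[Optional[float]]:
    """Hypothetical helper: fraction of individuals with flag=True in each group
    (None for an empty group)."""
    counts = [0] * n_groups
    positives = [0] * n_groups
    for g, f in zip(groups, flags):
        counts[g] += 1
        if f:
            positives[g] += 1
    return [p / c if c else None for p, c in zip(positives, counts)]


if __name__ == "__main__":
    # Made-up risk scores and a made-up binary characteristic for eight patients.
    scores = [0.1, 0.5, 0.3, 0.9, 0.7, 0.2, 0.8, 0.4]
    ileal = [False, True, False, True, False, False, True, True]
    groups = quantile_groups(scores, n_groups=4)
    print(groups)
    print(proportion_by_group(groups, ileal, n_groups=4))
```

A gradient in the summarized characteristic across score groups is the kind of pattern that stratification looks for; with real data, group sizes and uncertainty in each proportion would also need to be reported.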