Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA.
Am J Hum Genet. 2013 Jun 6;92(6):1008-12. doi: 10.1016/j.ajhg.2013.05.002. Epub 2013 May 23.
We performed risk assessment for Crohn's disease (CD) and ulcerative colitis (UC), the two common forms of inflammatory bowel disease (IBD), using data from the International IBD Genetics Consortium's Immunochip project. This data set contains ~17,000 CD cases, ~13,000 UC cases, and ~22,000 controls from 15 European countries, all typed on the Immunochip. This custom chip provides a more comprehensive catalog of the most promising candidate variants by capturing the remaining common variants and certain rare variants that were missed in the first generation of genome-wide association studies (GWAS). Given this unprecedentedly large sample size and wide variant spectrum, we employed recent machine-learning techniques to build optimal predictive models. In an independent evaluation, our final predictive models achieved areas under the curve (AUCs) of 0.86 for CD and 0.83 for UC. To our knowledge, this is the best prediction performance reported for CD and UC to date.