Multivariate genome-wide association study models to improve prediction of Crohn's disease risk and identification of potential novel variants.

Affiliations

Tecnologico de Monterrey, Escuela de Medicina, Cátedra de Bioinformática, Av. Morones Prieto No. 3000, Colonia Los Doctores, Monterrey Nuevo León, 64710, Mexico.

Graduate Professional Studies, Brandeis University, Waltham, 02453, MA, USA.

Publication information

Comput Biol Med. 2022 Jun;145:105398. doi: 10.1016/j.compbiomed.2022.105398. Epub 2022 Mar 12.

DOI: 10.1016/j.compbiomed.2022.105398
PMID: 35306380
Abstract

BACKGROUND

Crohn's disease (CD) is a type of inflammatory bowel disease (IBD) that affects the gastrointestinal tract and presents with diverse symptoms. To date, genome-wide association studies (GWAS) have discovered more than 140 genetic loci associated with CD across several datasets. Using the usual univariate GWAS methods, researchers have discovered common variants with small effects. Univariate methods assume independence among variants and therefore miss subtle combinatorial signals. Multivariate approaches have improved risk prediction and have complemented univariate methods in elucidating the etiology of complex traits and potential novel associations. However, the current multivariate models for CD have been assessed on three datasets (published from 2006 to 2008) under unrelated methodological settings, and they show a broad performance spectrum. Notably, these multivariate studies do not analyze potential novel variants. Here, we aimed to perform a robust multivariate analysis of a CD dataset different from the one commonly used, and to use the information yielded by the models to determine whether they could provide additional information about potential novel variants of CD.

METHODS

We compared different multivariate methods and models, namely LASSO (least absolute shrinkage and selection operator), XGBoost, random forest (RF), Bootstrap stage-wise model selection (BSWiMS), and LDpred, using a strict random subsampling approach to predict CD risk. We used a recent GWAS dataset from the United Kingdom IBD Genetics Consortium (UKIBDGC), made available in 2017, that had not been used for CD prediction studies. In addition, we assessed the effect of common strategies for increasing and decreasing the number of single-nucleotide polymorphism (SNP) markers (genotype imputation and linkage disequilibrium (LD) clumping, respectively).
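
As a rough illustration of how LD clumping reduces the number of SNP markers, the following is a minimal sketch and not the authors' pipeline. It assumes genotypes are coded as 0/1/2 allele counts, one list per SNP, and that a p-value per SNP is already available. The function names (r_squared, ld_clump) and the r² threshold of 0.5 are hypothetical choices made for this example. Production tools also restrict comparisons to a physical distance window, which is omitted here.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 allele counts)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treat as unlinked
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_clump(genotypes, p_values, r2_threshold=0.5):
    """Greedy clumping (illustrative): visit SNPs from most to least significant,
    keep each surviving SNP as an index SNP, and drop every less significant SNP
    whose r^2 with that index SNP exceeds the threshold."""
    order = sorted(range(len(p_values)), key=lambda j: p_values[j])
    kept, removed = [], set()
    for pos, j in enumerate(order):
        if j in removed:
            continue
        kept.append(j)
        for k in order[pos + 1:]:
            if k not in removed and r_squared(genotypes[j], genotypes[k]) > r2_threshold:
                removed.add(k)
    return kept


# Toy data: snp1 is a copy of snp0 (perfect LD), snp2 is only weakly correlated.
snp0 = [0, 1, 2, 0, 1, 2, 0, 1]
snp1 = list(snp0)
snp2 = [0, 0, 0, 1, 1, 1, 2, 2]
kept = ld_clump([snp0, snp1, snp2], p_values=[0.001, 0.01, 0.05])
print(kept)  # indices of the SNPs retained after clumping
```

In this toy case the duplicated marker is dropped because it carries no information beyond its more significant partner, while the weakly correlated marker is retained.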

RESULTS

We found that the LDpred model without any imputation was the best of all the tested models for predicting CD risk (area under the receiver operating characteristic curve (AUROC) = 0.667 ± 0.024) in this dataset. We validated the best models using a second dataset (National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) IBD Genetics Consortium, which was previously used in CD prediction studies), in which LDpred was also the best method with a similar performance (AUROC = 0.634 ± 0.009). Based on the importance of the variants yielded by the multivariate models, we identified a previously unnoticed region within chromosome 6, tagged by SNP rs4945943; this region was close to the gene MARCKS, which appeared to contribute to CD risk.
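
To make the form of a result such as "AUROC = mean ± SD" concrete, the sketch below shows one way such a figure can be obtained by repeated random subsampling. It is an assumption-laden illustration, not the study's method. The data are simulated, the scoring model is a toy additive score (the per-SNP weight is the difference in mean allele count between cases and controls) standing in for LASSO, LDpred, and the other models, and all function names, split sizes, and allele frequencies are hypothetical.

```python
import random
import statistics


def auroc(scores, labels):
    """AUROC as the probability that a random case outscores a random control
    (ties count as one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def fit_weights(X, y):
    """Toy additive model: mean allele count in cases minus mean in controls, per SNP."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    return [
        sum(r[j] for r in cases) / len(cases) - sum(r[j] for r in controls) / len(controls)
        for j in range(len(X[0]))
    ]


def subsample_auroc(X, y, n_repeats=20, test_fraction=0.3, seed=0):
    """Repeat random train/test splits; return mean and SD of the test-set AUROC."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = int(len(idx) * test_fraction)
    results = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        w = fit_weights([X[i] for i in train], [y[i] for i in train])
        scores = [sum(wj * g for wj, g in zip(w, X[i])) for i in test]
        results.append(auroc(scores, [y[i] for i in test]))
    return statistics.mean(results), statistics.stdev(results)


def simulate(n_per_class=100, n_snps=10, seed=42):
    """Simulated genotypes: the first three SNPs are more frequent in cases."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (1, 0):
        for _ in range(n_per_class):
            row = []
            for j in range(n_snps):
                freq = 0.45 if (label == 1 and j < 3) else 0.30
                row.append((rng.random() < freq) + (rng.random() < freq))
            X.append(row)
            y.append(label)
    return X, y


X, y = simulate()
mean_auc, sd_auc = subsample_auroc(X, y)
print(f"AUROC = {mean_auc:.3f} ± {sd_auc:.3f}")
```

The weights are fitted on the training split only and evaluated on the held-out split in every repeat, so the reported spread reflects variation across splits and is not an optimistic in-sample estimate.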

CONCLUSIONS

This research is the first multivariate prediction analysis applied to the UKIBDGC dataset. Our robust multivariate setting analysis enabled us to identify a potential variant that contributed to the CD risk. Multivariate methods are valuable tools for identifying genes that contribute to disease risk.

Similar articles

1
Multivariate genome-wide association study models to improve prediction of Crohn's disease risk and identification of potential novel variants.
Comput Biol Med. 2022 Jun;145:105398. doi: 10.1016/j.compbiomed.2022.105398. Epub 2022 Mar 12.
2
Performance of risk prediction for inflammatory bowel disease based on genotyping platform and genomic risk score method.
BMC Med Genet. 2017 Aug 29;18(1):94. doi: 10.1186/s12881-017-0451-2.
3
Association between variants of PRDM1 and NDP52 and Crohn's disease, based on exome sequencing and functional studies.
Gastroenterology. 2013 Aug;145(2):339-47. doi: 10.1053/j.gastro.2013.04.040. Epub 2013 Apr 25.
4
Genome-wide association study of Crohn's disease in Koreans revealed three new susceptibility loci and common attributes of genetic susceptibility across ethnic populations.
Gut. 2014 Jan;63(1):80-7. doi: 10.1136/gutjnl-2013-305193. Epub 2013 Jul 14.
5
Independent replication of an association of CNVR7113.6 with Crohn's disease in Caucasians.
Inflamm Bowel Dis. 2012 Feb;18(2):305-11. doi: 10.1002/ibd.21752. Epub 2011 May 10.
6
Strategies for developing prediction models from genome-wide association studies.
Genet Epidemiol. 2013 Dec;37(8):768-77. doi: 10.1002/gepi.21762. Epub 2013 Oct 25.
7
Bayesian analysis of genome-wide inflammatory bowel disease data sets reveals new risk loci.
Eur J Hum Genet. 2018 Feb;26(2):265-274. doi: 10.1038/s41431-017-0041-y. Epub 2017 Dec 4.
8
Regulatory Variants on the Leukocyte Immunoglobulin-Like Receptor Gene Cluster are Associated with Crohn's Disease and Interact with Regulatory Variants for TAP2.
J Crohns Colitis. 2024 Jan 27;18(1):47-53. doi: 10.1093/ecco-jcc/jjad127.
9
Identification of risk loci for Crohn's disease phenotypes using a genome-wide association study.
Gastroenterology. 2015 Apr;148(4):794-805. doi: 10.1053/j.gastro.2014.12.030. Epub 2014 Dec 31.
10
Integrating disease and drug-related phenotypes for improved identification of pharmacogenomic variants.
Pharmacogenomics. 2021 Apr;22(5):251-261. doi: 10.2217/pgs-2020-0130. Epub 2021 Mar 26.

Cited by

1
Genetic Artificial Intelligence in Gastrointestinal Disease: A Systematic Review.
Diagnostics (Basel). 2025 Sep 2;15(17):2227. doi: 10.3390/diagnostics15172227.
2
Persistent Activation of the P2X7 Receptor Underlies Chronic Inflammation and Carcinogenic Changes in the Intestine.
Int J Mol Sci. 2024 Oct 10;25(20):10874. doi: 10.3390/ijms252010874.
3
Salivary Th17 cytokine, human β-defensin 1-3, and salivary scavenger and agglutinin levels in Crohn's disease.
Clin Oral Investig. 2024 Jan 22;28(1):108. doi: 10.1007/s00784-024-05509-5.
4
Applying logistic LASSO regression for the diagnosis of atypical Crohn's disease.
Sci Rep. 2022 Jul 5;12(1):11340. doi: 10.1038/s41598-022-15609-5.