Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data.

Affiliations

Centre de recherche sur l'inflammation UMR 1149, Inserm - Université Paris Diderot, 75018, Paris, France.

Data Team, Département d'informatique de l'ENS, École normale supérieure, CNRS, PSL Research University, 75005, Paris, France.

Publication information

Sci Rep. 2019 Jul 17;9(1):10351. doi: 10.1038/s41598-019-46649-z.

DOI:10.1038/s41598-019-46649-z

PMID:31316157

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC6637191/

Abstract

Crohn Disease (CD) is a complex genetic disorder for which more than 140 genes have been identified using genome-wide association studies (GWAS). However, the genetic architecture of the trait remains largely unknown. The recent development of machine learning (ML) approaches prompted us to apply them to classify healthy and diseased people according to their genomic information. The Immunochip dataset, containing 18,227 CD patients and 34,050 healthy controls enrolled and genotyped by the International Inflammatory Bowel Disease Genetics Consortium (IIBDGC), was re-analyzed using a set of ML methods: penalized logistic regression (LR), gradient boosted trees (GBT) and artificial neural networks (NN). The main score used to compare the methods was the Area Under the ROC Curve (AUC). An analysis of the impact of quality control (QC), imputation and coding methods on LR results showed that QC methods and imputation of missing genotypes may artificially increase the scores. In contrast, neither the patient/control ratio nor marker preselection or coding strategies significantly affected the results. LR methods, including Lasso, Ridge and ElasticNet, provided similar results, with a maximum AUC of 0.80. GBT methods such as XGBoost, LightGBM and CatBoost, together with dense NN with one or more hidden layers, provided similar AUC values, suggesting limited epistatic effects in the genetic architecture of the trait. ML methods detected nearly all the genetic variants previously identified by GWAS among the best predictors, plus additional predictors with lower effects. The robustness and complementarity of the different methods were also studied. Compared to LR, non-linear models such as GBT or NN may provide robust complementary approaches to identify and classify genetic markers.
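Because the AUC is the score used to compare all the methods, a minimal sketch of how it can be computed may help. The code below is an illustration only, not the authors' pipeline: the function name `roc_auc`, the toy labels and the toy scores are all assumptions introduced here. It uses the rank-based (Mann-Whitney) formulation, in which the AUC is the probability that a randomly chosen patient receives a higher score than a randomly chosen control.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    A tie between a case and a control counts as half a win (Mann-Whitney U).
    """
    # Sort indices by score so that ranks can be assigned in ascending order.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each member the average rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one case and one control")

    # U statistic of the cases, normalised by the number of case/control pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data: 1 = CD patient, 0 = healthy control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 case/control pairs are ordered correctly
```

Read this way, an AUC of 0.80 means that a model ranks a randomly drawn patient above a randomly drawn control in about 80% of such pairs, while 0.5 corresponds to random ranking.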

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4063/6637191/47247b63cf85/41598_2019_46649_Fig1_HTML.jpg
