Supervised Machine Learning Classifies Inflammatory Bowel Disease Patients by Subtype Using Whole Exome Sequencing Data.

Affiliations

Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK.

NIHR Southampton Biomedical Research, University Hospital Southampton, Southampton, UK.

Publication details

J Crohns Colitis. 2023 Nov 8;17(10):1672-1680. doi: 10.1093/ecco-jcc/jjad084.

DOI:10.1093/ecco-jcc/jjad084

PMID:37205778

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10637043/

Abstract

BACKGROUND

Inflammatory bowel disease [IBD] is a chronic inflammatory disorder with two main subtypes: Crohn's disease [CD] and ulcerative colitis [UC]. Prompt subtype diagnosis enables the correct treatment to be administered. We aimed to assess whether machine learning [ML] applied to genomic data can classify patients according to IBD subtype.

METHODS

Whole exome sequencing [WES] data from paediatric and adult IBD patients were processed using an in-house bioinformatics pipeline. These data were condensed into GenePy, a per-gene, per-individual genomic burden score. Data were split into training and testing datasets [80/20]. Feature selection with a linear support vector classifier and hyperparameter tuning with Bayesian optimisation were performed on the training data. Random forest, a supervised ML method, was used to classify patients as CD or UC using three gene panels: 1] all available genes; 2] autoimmune genes; 3] 'IBD' genes. ML results were assessed on the testing dataset using area under the receiver operating characteristic curve [AUROC], sensitivity, and specificity.
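The workflow described above can be sketched with scikit-learn. This is an illustrative sketch only, not the authors' code: the GenePy matrix is replaced by synthetic data, the class coding (1 = CD, 0 = UC), the choice of 20 retained features, and all hyperparameter ranges are assumptions, and Bayesian optimisation is swapped for `RandomizedSearchCV` to avoid an extra dependency.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical stand-in for a GenePy matrix: rows = patients, columns = genes.
n_patients, n_genes = 300, 200
X = rng.gamma(shape=1.0, scale=1.0, size=(n_patients, n_genes))
y = rng.integers(0, 2, size=n_patients)   # assumed coding: 1 = CD, 0 = UC
X[y == 1, :5] += 1.0                       # plant a signal in 5 synthetic "genes"

# 80/20 split, stratified so both subtypes appear in each partition.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

pipeline = Pipeline([
    # Feature selection: keep the 20 genes with the largest |coefficient|
    # in an L1-penalised linear support vector classifier.
    ("select", SelectFromModel(
        LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000),
        threshold=-np.inf, max_features=20)),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Hyperparameter tuning on the training data only.
# Random search is used here in place of Bayesian optimisation.
search = RandomizedSearchCV(
    pipeline,
    param_distributions={
        "forest__n_estimators": [100, 200, 400],
        "forest__max_depth": [None, 4, 8],
        "forest__min_samples_leaf": [1, 3, 5],
    },
    n_iter=5, scoring="roc_auc", cv=3, random_state=0)
search.fit(X_train, y_train)

# Evaluate once on the held-out testing data.
proba_cd = search.predict_proba(X_test)[:, 1]
auroc = roc_auc_score(y_test, proba_cd)
tn, fp, fn, tp = confusion_matrix(
    y_test, search.predict(X_test), labels=[0, 1]).ravel()
sensitivity = tp / (tp + fn)   # proportion of CD patients called CD
specificity = tn / (tn + fp)   # proportion of UC patients called UC
print(f"AUROC={auroc:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Placing the feature selector inside the pipeline means it is refitted within every cross-validation fold, so the tuning scores are not inflated by genes chosen with knowledge of the validation samples. The abstract does not say whether the study handled this the same way.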

RESULTS

A total of 906 patients were included in the analysis [600 CD, 306 UC]. Training data included 488 patients, balanced according to the minority class of UC. The autoimmune gene panel generated the best performing ML model [AUROC = 0.68], outperforming the IBD gene panel [AUROC = 0.61]. NOD2 was the top gene for discriminating CD and UC, regardless of the gene panel used. The best indicator of a UC diagnosis was a lack of variation in genes that had high GenePy scores in CD patients.
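The figure of 488 training patients is consistent with an 80/20 split followed by undersampling CD to the size of the UC training group. The sketch below reproduces that arithmetic under stated assumptions: the split is done per class, the test portion of each class is rounded up, and the majority class is undersampled at random. The abstract specifies none of these details, and the function name `split_and_balance` is hypothetical.

```python
import random
from collections import Counter


def split_and_balance(labels, test_percent=20, seed=0):
    """Split each class into train/test, then undersample the training
    classes to the size of the smallest one. Returns index lists."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)

    train, test = {}, {}
    for cls, idx in by_class.items():
        rng.shuffle(idx)
        # Ceiling division in integers: test size is rounded up (assumption).
        n_test = -(-len(idx) * test_percent // 100)
        test[cls], train[cls] = idx[:n_test], idx[n_test:]

    # Balance the training data according to the minority class.
    n_minority = min(len(idx) for idx in train.values())
    balanced = {cls: rng.sample(idx, n_minority) for cls, idx in train.items()}

    train_idx = sorted(i for idx in balanced.values() for i in idx)
    test_idx = sorted(i for idx in test.values() for i in idx)
    return train_idx, test_idx


# Cohort sizes from the abstract: 600 CD and 306 UC.
labels = ["CD"] * 600 + ["UC"] * 306
train_idx, test_idx = split_and_balance(labels)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print(train_counts)  # 244 CD and 244 UC, 488 in total
print(test_counts)   # 120 CD and 62 UC under the rounding assumed here
```

Under these assumptions, 306 UC patients leave 244 for training after 62 are held out, and matching 244 CD patients gives the 488 reported. The testing data are left unbalanced in this sketch, which is also an assumption.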

DISCUSSION

We demonstrate promising classification of patients by subtype using random forest and WES data. Focusing on specific patient subgroups and using larger datasets may improve classification.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40dc/10637043/c6f8de175d18/jjad084_fig1.jpg
