Supervised Machine Learning Classifies Inflammatory Bowel Disease Patients by Subtype Using Whole Exome Sequencing Data.

Affiliations

Department of Human Genetics and Genomic Medicine, University of Southampton, Southampton, UK.

NIHR Southampton Biomedical Research, University Hospital Southampton, Southampton, UK.

Publication information

J Crohns Colitis. 2023 Nov 8;17(10):1672-1680. doi: 10.1093/ecco-jcc/jjad084.

Abstract

BACKGROUND

Inflammatory bowel disease [IBD] is a chronic inflammatory disorder with two main subtypes: Crohn's disease [CD] and ulcerative colitis [UC]. Prompt subtype diagnosis enables the correct treatment to be administered. Using genomic data, we aimed to assess machine learning [ML] to classify patients according to IBD subtype.

METHODS

Whole exome sequencing [WES] data from paediatric/adult IBD patients were processed using an in-house bioinformatics pipeline. These data were condensed into the per-gene, per-individual genomic burden score, GenePy. Data were split into training and testing datasets [80/20]. Feature selection with a linear support vector classifier and hyperparameter tuning with Bayesian optimisation were performed on the training data. The supervised ML method random forest was used to classify patients as CD or UC, using three gene panels: 1] all available genes; 2] autoimmune genes; 3] 'IBD' genes. ML results were assessed on the testing dataset using area under the receiver operating characteristic curve [AUROC], sensitivity, and specificity.
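The three evaluation metrics named above can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the label encoding (1 = CD, 0 = UC), the 0.5 decision threshold, the function names, and the toy scores are all assumptions made for illustration. AUROC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as half.

```python
from typing import Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0  # assumed decision rule
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data only (hypothetical): 1 = CD, 0 = UC, scores = predicted probability of CD.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

area = auroc(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores)
print(f"AUROC={area:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

AUROC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why all three are typically reported together.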

RESULTS

A total of 906 patients were included in analysis [600 CD, 306 UC]. Training data included 488 patients, balanced according to the minority class of UC. The autoimmune gene panel generated the best performing ML model [AUROC = 0.68], outperforming an IBD gene panel [AUROC = 0.61]. NOD2 was the top gene for discriminating CD and UC, regardless of the gene panel used. Lack of variation in genes with high GenePy scores in CD patients was the best classifier of a diagnosis of UC.
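The training-set size of 488 is consistent with balancing both classes to 244 patients each. The sketch below reproduces that arithmetic under two assumptions that the abstract does not state: that the 80/20 split was stratified by subtype (giving 480 CD and 244 UC patients in training, since 306 × 0.8 = 244.8), and that balancing was done by randomly undersampling the majority class. The function name and seed are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def undersample_to_minority(labels: Sequence[str], seed: int = 0) -> List[int]:
    """Return sorted indices of a class-balanced subset via random undersampling."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    # Every class is cut down to the size of the smallest class.
    n_minority = min(len(indices) for indices in by_class.values())
    kept: List[int] = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_minority))
    return sorted(kept)


# Assumed stratified 80% training share: 480 of 600 CD, 244 of 306 UC.
train_labels = ["CD"] * 480 + ["UC"] * 244
balanced = undersample_to_minority(train_labels)
counts = Counter(train_labels[i] for i in balanced)
print(len(balanced), dict(counts))  # 488 {'CD': 244, 'UC': 244}
```

Under these assumptions, 236 CD training patients are discarded, which is the usual cost of undersampling: a balanced classifier at the expense of a smaller training set.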

DISCUSSION

We demonstrate promising classification of patients by subtype using random forest and WES data. Focusing on specific subgroups of patients, with larger datasets, may result in better classification.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/40dc/10637043/c6f8de175d18/jjad084_fig1.jpg