An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn's disease patients.

Authors

Raimondi Daniele, Simm Jaak, Arany Adam, Fariselli Piero, Cleynen Isabelle, Moreau Yves

Affiliations

ESAT-STADIUS, KU Leuven, 3001 Leuven, Belgium.

Department of Medical Sciences, University of Torino, Torino, 10123 Italy.

Publication information

NAR Genom Bioinform. 2020 Feb 21;2(1):lqaa011. doi: 10.1093/nargab/lqaa011. eCollection 2020 Mar.

DOI:10.1093/nargab/lqaa011

PMID:33575557

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7671306/

Abstract

Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solving the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and technical problems need to be addressed. In particular, very few attempts at the in-silico diagnosis of oligo-to-polygenic disorders have been made so far, due to the complexity of the challenge, the relative scarcity of the data and issues such as batch effects and data heterogeneity, which are confounding factors for machine learning (ML) methods. Here, we propose a method for the exome-based in-silico diagnosis of Crohn's disease (CD) patients which addresses many of the current methodological issues. First, we devise a rational ML-friendly feature representation for WES data based on the gene mutational burden concept, which is suitable for datasets with small sample sizes. Second, we propose a Neural Network (NN) with parameter tying and heavy regularization, in order to limit its complexity and thus the risk of over-fitting. We trained and tested our NN on 3 CD case-control datasets, comparing its performance with that of the participants in previous CAGI challenges. We show that, notwithstanding the limited NN complexity, it outperforms the previous approaches. Moreover, we interpret the NN predictions by analyzing the learned patterns at the variant and gene level and investigating the decision process leading to each prediction.
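The two ideas in the abstract, a compact per-gene feature representation and a deliberately small network, can be illustrated with a toy sketch. This is not the authors' implementation: the burden aggregation, the weight sharing across genes, the function names and the toy values below are all assumptions chosen only to show how such a design keeps the number of trainable parameters low.

```python
import numpy as np

def gene_burden(gene_index, variant_weight, n_genes):
    """Sum per-variant weights into one burden value per gene (assumed scheme)."""
    burden = np.zeros(n_genes)
    np.add.at(burden, gene_index, variant_weight)  # accumulates repeated genes
    return burden

def tied_forward(x, w_shared, v_gene, bias):
    """Forward pass of a tiny network whose first-layer weights are shared.

    x        : (n_genes, n_feat) per-gene features for one sample
    w_shared : (n_feat,) weights reused for every gene (hypothetical tying)
    v_gene   : (n_genes,) one output weight per gene
    """
    hidden = np.tanh(x @ w_shared)           # one hidden value per gene
    logit = hidden @ v_gene + bias
    return 1.0 / (1.0 + np.exp(-logit))      # predicted case probability

def l2_penalty(params, lam):
    """L2 regularization term that would be added to a training loss."""
    return lam * sum(float(np.sum(p ** 2)) for p in params)

# Toy sample: 5 variants falling in 3 hypothetical genes.
gene_index = np.array([0, 0, 1, 2, 2])
variant_weight = np.array([1.0, 0.5, 1.0, 0.25, 0.25])
n_genes = 3

burden = gene_burden(gene_index, variant_weight, n_genes)
counts = np.bincount(gene_index, minlength=n_genes)
x = np.stack([burden, counts], axis=1)       # shape (n_genes, 2)

w_shared = np.zeros(x.shape[1])
v_gene = np.zeros(n_genes)
prob = tied_forward(x, w_shared, v_gene, bias=0.0)

# Parameter counts with and without sharing the first-layer weights.
n_tied = w_shared.size + v_gene.size + 1
n_untied = x.size + v_gene.size + 1
```

With shared weights the parameter count grows with the number of genes plus the number of features, rather than with their product, which is the kind of saving that matters when only a few hundred exomes are available.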


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/36cb/7671306/7de7ebe34b86/lqaa011fig1.jpg
