Biological interpretation of deep neural network for phenotype prediction based on gene expression.

Affiliation

IBISC, Univ Evry, Université Paris-Saclay, 23 boulevard de France, 91034, Evry, France.

Publication information

BMC Bioinformatics. 2020 Nov 4;21(1):501. doi: 10.1186/s12859-020-03836-4.

DOI:10.1186/s12859-020-03836-4

PMID:33148191

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7643315/

Abstract

BACKGROUND

Predictive gene signatures are increasingly important for assisting clinical decision-making. Deep learning has great potential for predicting phenotype from gene expression profiles. However, neural networks are viewed as black boxes that provide accurate predictions without any explanation. Demand for interpretable models is growing, especially in the medical field.

RESULTS

We focus on explaining the predictions of a deep neural network model built from gene expression data. The neurons and genes that most influence the predictions are identified and linked to biological knowledge. Our experiments on cancer prediction show that: (1) the deep learning approach outperforms classical machine learning methods on large training sets; (2) our approach produces interpretations more coherent with biology than state-of-the-art approaches; (3) we can provide a comprehensive explanation of the predictions for biologists and physicians.
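The idea of ranking neurons and genes by their influence on a prediction can be illustrated with a minimal sketch. The abstract does not specify the attribution rule, so the code below assumes a generic gradient-times-input scheme on a one-hidden-layer ReLU network; it is not presented as the authors' method. The weights, the three-gene input, and the names `forward` and `gradient_times_input` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, b1, w2, b2):
    """One-hidden-layer network: returns hidden activations and the output logit."""
    h = relu(W1 @ x + b1)
    return h, float(w2 @ h + b2)

def gradient_times_input(x, W1, b1, w2, b2):
    """Attribute the output logit to hidden neurons and to input genes.

    Assumption: plain gradient-times-input, used here as a stand-in for
    whatever relevance rule a real interpretation pipeline would apply.
    """
    h, logit = forward(x, W1, b1, w2, b2)
    neuron_scores = w2 * h              # contribution of each hidden neuron to the logit
    active = (h > 0).astype(float)      # derivative of ReLU at each hidden neuron
    grad_x = W1.T @ (w2 * active)       # d(logit)/d(x) by the chain rule
    gene_scores = grad_x * x            # gradient times input, one score per gene
    return neuron_scores, gene_scores, logit

# Toy, hand-picked parameters (hypothetical): 3 input "genes", 2 hidden neurons.
W1 = np.array([[1.0, 0.0, -1.0],
               [0.0, 2.0,  0.0]])
b1 = np.zeros(2)
w2 = np.array([1.0, -0.5])
b2 = 0.0
x = np.array([2.0, 1.0, 0.5])           # expression values for one sample

neuron_scores, gene_scores, logit = gradient_times_input(x, W1, b1, w2, b2)

# Rank genes from most to least influential by absolute score.
gene_ranking = np.argsort(-np.abs(gene_scores))
```

With zero biases and ReLU activations, the gene scores in this toy case sum to the logit, which is a convenient sanity check. In a real analysis the top-ranked genes would then be compared against biological annotations, as the abstract describes.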

CONCLUSION

We propose an original approach for biological interpretation of deep learning models for phenotype prediction from gene expression data. Since the model can find relationships between the phenotype and gene expression, we may assume that there is a link between the identified genes and the phenotype. The interpretation can, therefore, lead to new biological hypotheses to be investigated by biologists.
