GraphGONet: a self-explaining neural network encapsulating the Gene Ontology graph for phenotype prediction on gene expression.

Affiliation

Computer Science Department, IBISC, Université Paris-Saclay (Univ. Évry), Évry-Courcouronnes 91020, France.

Publication information

Bioinformatics. 2022 Apr 28;38(9):2504-2511. doi: 10.1093/bioinformatics/btac147.

DOI:10.1093/bioinformatics/btac147

PMID:35266505

Abstract

MOTIVATION

Medical care is becoming more and more specific to patients' needs due to the increased availability of omics data. The application to these data of sophisticated machine learning models, in particular deep learning (DL), can improve the field of precision medicine. However, their use in clinics is limited as their predictions are not accompanied by an explanation. The production of accurate and intelligible predictions can benefit from the inclusion of domain knowledge. Therefore, knowledge-based DL models appear to be a promising solution.

RESULTS

In this article, we propose GraphGONet, where the Gene Ontology is encapsulated in the hidden layers of a new self-explaining neural network. Each neuron in the layers represents a biological concept, combining the gene expression profile of a patient and the information from its neighboring neurons. The experiments described in the article confirm that our model not only performs as accurately as the state-of-the-art (non-explainable ones) but also automatically produces stable and intelligible explanations composed of the biological concepts with the highest contribution. This feature allows experts to use our tool in a medical setting.
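The idea of one neuron per biological concept can be illustrated with a minimal sketch. This is not the authors' implementation: the three-term ontology, the gene annotations, the `tanh` activation, the weights, and the contribution rule (output weight times activation) are all assumptions chosen for illustration. It shows only the general pattern of each term combining the expression of its annotated genes with the activations of its neighboring (child) terms, then ranking terms by their contribution to the output.

```python
import math

# Hypothetical toy ontology, listed children before parents (topological order).
TERMS = ["GO:leaf_a", "GO:leaf_b", "GO:root"]
PARENTS = {"GO:leaf_a": ["GO:root"], "GO:leaf_b": ["GO:root"], "GO:root": []}
# Hypothetical annotations: term -> indices of its genes in the expression vector.
ANNOTATIONS = {"GO:leaf_a": [0, 1], "GO:leaf_b": [2], "GO:root": [3]}


def forward(x, gene_w, edge_w, out_w):
    """Return (score, activations) for one patient's expression profile x."""
    children = {t: [c for c in TERMS if t in PARENTS[c]] for t in TERMS}
    h = {}
    for t in TERMS:
        # Signal from the genes annotated to this term.
        z = sum(w * x[g] for w, g in zip(gene_w[t], ANNOTATIONS[t]))
        # Signal from neighboring (child) term neurons, already computed.
        z += sum(edge_w[(c, t)] * h[c] for c in children[t])
        h[t] = math.tanh(z)
    score = sum(out_w[t] * h[t] for t in TERMS)
    return score, h


def explain(h, out_w, k=2):
    """Rank terms by |output weight * activation| (an assumed contribution rule)."""
    contrib = {t: out_w[t] * h[t] for t in h}
    return sorted(contrib, key=lambda t: abs(contrib[t]), reverse=True)[:k]


if __name__ == "__main__":
    x = [1.0, 0.0, 0.0, 0.0]  # made-up expression values for four genes
    gene_w = {"GO:leaf_a": [1.0, 1.0], "GO:leaf_b": [1.0], "GO:root": [1.0]}
    edge_w = {("GO:leaf_a", "GO:root"): 1.0, ("GO:leaf_b", "GO:root"): 1.0}
    out_w = {t: 1.0 for t in TERMS}
    score, h = forward(x, gene_w, edge_w, out_w)
    print(score, explain(h, out_w))
```

Because every hidden unit maps to a named term, the ranked list returned by `explain` is directly readable as a set of biological concepts, which is the property the abstract refers to as self-explaining.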

AVAILABILITY AND IMPLEMENTATION

GraphGONet is freely available at https://forge.ibisc.univ-evry.fr/vbourgeais/GraphGONet.git. The microarray dataset is accessible from the ArrayExpress database under the identifier E-MTAB-3732. The TCGA datasets can be downloaded from the Genomic Data Commons (GDC) data portal.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
