A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning.

Affiliation

Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran.

Publication information

BMC Bioinformatics. 2022 Oct 14;23(1):422. doi: 10.1186/s12859-022-04954-x.

Abstract

BACKGROUND

Selecting and prioritizing candidate disease genes is necessary before conducting laboratory studies, because identifying disease genes from a large number of candidates using laboratory methods is very costly and time-consuming. There are many machine learning-based gene prioritization methods. These methods differ in several respects, including the feature vectors of genes, the datasets used (which have different structures), and the learning model. Important challenges for these methods are creating a suitable feature vector for genes, building an appropriate learning model for data with diverse and non-Euclidean structures, including graphs, and the lack of negative data. Graph neural networks have recently emerged in machine learning and related fields and have demonstrated superior performance on a broad range of problems.

METHODS

In this study, a new semi-supervised learning method based on graph convolutional networks is presented, using a novel way of constructing feature vectors for each gene. In the proposed method, we first construct three feature vectors for each gene using terms from the Gene Ontology (GO) database. Then, we train a graph convolutional network on these vectors using protein-protein interaction (PPI) network data to identify candidate disease genes. Our model learns hidden layer representations that encode both local graph structure and node features. The method is characterized by simultaneously considering the topological information of the biological network (e.g., PPI) and other sources of evidence. Finally, validation experiments are carried out to demonstrate the efficiency of our method.
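To make the pipeline above concrete, the following is a minimal NumPy sketch of a two-layer graph convolutional network applied to a toy PPI graph. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the architecture, so the sketch assumes the commonly used symmetric normalization with self-loops, multi-hot GO term vectors for the three ontologies, and a loss computed only on labeled genes. All gene indices, edges, feature values, labels, and layer sizes are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}: symmetric normalization with self-loops."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_norm, x, w1, w2):
    """Two-layer GCN: softmax(A_norm @ relu(A_norm @ X @ W1) @ W2)."""
    hidden = np.maximum(a_norm @ x @ w1, 0.0)  # mixes neighbor structure with node features
    logits = a_norm @ hidden @ w2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def masked_cross_entropy(probs, labels, mask):
    """Mean negative log-likelihood over labeled nodes only (semi-supervised)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked[mask]).mean())

# Toy PPI graph with 4 genes and hypothetical edges 0-1, 1-2, 2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

# Hypothetical multi-hot GO annotations, one matrix per ontology
# (molecular function, cellular component, biological process).
mf = np.array([[1, 0], [1, 1], [0, 1], [0, 0]], dtype=float)
cc = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)
bp = np.array([[1, 1], [1, 0], [0, 0], [0, 1]], dtype=float)
x = np.hstack([mf, cc, bp])  # assumption: the three vectors are concatenated

rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(x.shape[1], 8))  # untrained weights, for shape only
w2 = rng.normal(scale=0.1, size=(8, 2))

labels = np.array([1, 0, 0, 0])               # 1 = known disease gene
mask = np.array([True, False, False, True])   # only genes 0 and 3 are labeled

a_norm = normalize_adjacency(adj)
probs = gcn_forward(a_norm, x, w1, w2)
loss = masked_cross_entropy(probs, labels, mask)
ranking = np.argsort(-probs[:, 1])  # genes ordered by predicted disease probability
```

In a real setting the weights would be trained by minimizing the masked loss, and the treatment of unlabeled or negative genes would follow the paper's own procedure, which the abstract does not describe.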

RESULTS

Several experiments are performed on 16 diseases to evaluate the proposed method's performance. The experiments demonstrate that the proposed method achieves the best results in terms of precision, area under the ROC curve (AUC), and F1-score when compared with eight state-of-the-art network-based and machine learning-based disease gene prioritization methods.
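For readers unfamiliar with the three reported metrics, the sketch below shows how precision, F1-score, and AUC can be computed from predicted scores. It uses only the standard definitions; the labels, scores, and the 0.5 decision threshold are hypothetical, and the abstract does not state the paper's actual evaluation protocol.

```python
def precision_f1(y_true, y_pred):
    """Return (precision, F1) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, f1

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth and model scores for five candidate genes.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

precision, f1 = precision_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Precision and F1 depend on the chosen threshold, whereas AUC depends only on how the genes are ranked, which is why ranking methods are often compared on all three.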

CONCLUSION

This study shows that the proposed semi-supervised learning method appropriately classifies and ranks candidate disease genes using a graph convolutional network and an innovative method for creating three feature vectors for genes, based on the molecular function, cellular component, and biological process terms from GO data.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea2c/9563530/968ba26daa0e/12859_2022_4954_Fig1_HTML.jpg
