A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning.

Affiliation

Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran.

Publication information

BMC Bioinformatics. 2022 Oct 14;23(1):422. doi: 10.1186/s12859-022-04954-x.

DOI:10.1186/s12859-022-04954-x

PMID:36241966

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9563530/

Abstract

BACKGROUND

Candidate disease genes must be selected and prioritized before laboratory studies are conducted, because identifying disease genes from a large pool of candidates with laboratory methods is very costly and time-consuming. Many machine learning-based gene prioritization methods exist. They differ in several respects, including the feature vectors used for genes, the structure of the datasets, and the learning model. Their main challenges are constructing a suitable feature vector for genes, building an appropriate learning model over data with diverse and non-Euclidean structures such as graphs, and coping with the lack of negative data. Graph neural networks have recently emerged in machine learning and related fields and have demonstrated superior performance on a broad range of problems.

METHODS

In this study, we present a new semi-supervised learning method based on graph convolutional networks, together with a novel way of constructing feature vectors for each gene. First, we construct three feature vectors for each gene using terms from the Gene Ontology (GO) database. Then, we train a graph convolutional network on these vectors using protein-protein interaction (PPI) network data to identify candidate disease genes. The model learns hidden-layer representations that encode both the local graph structure and the features of the nodes. The method therefore considers the topological information of the biological network (e.g., PPI) simultaneously with other sources of evidence. Finally, we validate the method to demonstrate its efficiency.
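The propagation idea behind a graph convolutional network can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the widely used symmetric renormalization D^{-1/2}(A + I)D^{-1/2}, a two-layer network, and a cross-entropy loss restricted to labeled genes. The toy graph, feature sizes, weights, labels, and all function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_norm, x, w1, w2):
    """Two-layer GCN: softmax(A_norm @ relu(A_norm @ X @ W1) @ W2)."""
    hidden = np.maximum(a_norm @ x @ w1, 0.0)             # mixes neighbor features
    logits = a_norm @ hidden @ w2
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def masked_cross_entropy(probs, labels, mask):
    """Average cross-entropy over labeled nodes only (the semi-supervised part)."""
    picked = probs[mask, labels[mask]]
    return float(-np.log(picked + 1e-12).mean())

# Hypothetical toy PPI graph: 4 genes connected in a chain 0-1-2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x = rng.random((4, 5))                      # stand-in for GO-derived gene features
w1 = rng.normal(scale=0.1, size=(5, 8))     # untrained, randomly initialized weights
w2 = rng.normal(scale=0.1, size=(8, 2))
labels = np.array([1, 0, 0, 0])             # 1 = disease gene; only masked entries are used
mask = np.array([True, False, False, True]) # genes 0 and 3 are labeled, 1 and 2 are not

a_norm = normalize_adjacency(adj)
probs = gcn_forward(a_norm, x, w1, w2)
loss = masked_cross_entropy(probs, labels, mask)
```

Because each layer multiplies by the normalized adjacency, unlabeled genes still shape the representations of their labeled neighbors, which is how graph structure and node features are combined in this kind of model.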

RESULTS

Several experiments are performed on 16 diseases to evaluate the proposed method's performance. The experiments demonstrate that the proposed method achieves the best results in terms of precision, area under the ROC curve (AUC), and F1-score when compared with eight state-of-the-art network-based and machine learning-based disease gene prioritization methods.
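For readers unfamiliar with the three reported metrics, the following pure-Python sketch shows how precision, F1-score, and AUC are commonly computed for a ranked list of genes. The labels, scores, and the 0.5 decision threshold are made-up illustrative values, not data from the study, and the function names are hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = disease gene)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative values only.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed threshold of 0.5

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Precision and F1 depend on the chosen threshold, whereas AUC depends only on the ordering of the scores, which is why it suits ranking tasks such as gene prioritization.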

CONCLUSION

This study shows that the proposed semi-supervised learning method appropriately classifies and ranks candidate disease genes using a graph convolutional network and an innovative method to create three feature vectors for genes based on the molecular function, cellular component, and biological process terms from GO data.
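The abstract does not specify how the three GO-based feature vectors are encoded. As one plausible illustration only, the sketch below builds a binary (multi-hot) vector per gene for each of the three GO sub-ontologies. The encoding, the gene names `geneA` and `geneB`, their annotations, and the function name are all assumptions; the authors' construction may differ.

```python
from collections import defaultdict

NAMESPACES = ("molecular_function", "cellular_component", "biological_process")

def build_go_vectors(annotations):
    """Build one multi-hot vector per gene and GO namespace.

    annotations: iterable of (gene, namespace, go_term) tuples.
    Returns (vocab, vectors), where vectors[gene][namespace] is a 0/1 list
    aligned with the sorted term list vocab[namespace].
    """
    terms = {ns: set() for ns in NAMESPACES}
    per_gene = defaultdict(lambda: {ns: set() for ns in NAMESPACES})
    for gene, ns, term in annotations:
        terms[ns].add(term)
        per_gene[gene][ns].add(term)
    vocab = {ns: sorted(terms[ns]) for ns in NAMESPACES}
    vectors = {
        gene: {ns: [1 if t in owned[ns] else 0 for t in vocab[ns]] for ns in NAMESPACES}
        for gene, owned in per_gene.items()
    }
    return vocab, vectors

# Hypothetical annotations for two placeholder genes.
annotations = [
    ("geneA", "molecular_function", "GO:0003677"),
    ("geneA", "cellular_component", "GO:0005634"),
    ("geneA", "biological_process", "GO:0006915"),
    ("geneB", "molecular_function", "GO:0005515"),
    ("geneB", "cellular_component", "GO:0005634"),
    ("geneB", "biological_process", "GO:0006355"),
]
vocab, vectors = build_go_vectors(annotations)
```

Keeping the three namespaces separate yields three vectors per gene, matching the molecular function, cellular component, and biological process split described above.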

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ea2c/9563530/968ba26daa0e/12859_2022_4954_Fig1_HTML.jpg

Similar articles

A unified deep semi-supervised graph learning scheme based on nodes re-weighting and manifold regularization.

Neural Netw. 2023 Jan;158:188-196. doi: 10.1016/j.neunet.2022.11.017. Epub 2022 Nov 19.

Deep semi-supervised learning via dynamic anchor graph embedding in latent space.

Neural Netw. 2022 Feb;146:350-360. doi: 10.1016/j.neunet.2021.11.026. Epub 2021 Dec 1.

MGLNN: Semi-supervised learning via Multiple Graph Cooperative Learning Neural Networks.

Neural Netw. 2022 Sep;153:204-214. doi: 10.1016/j.neunet.2022.05.024. Epub 2022 Jun 3.

Semi-supervised graph convolutional networks for the domain adaptive recognition of thyroid nodules in cross-device ultrasound images.

Med Phys. 2023 Dec;50(12):7806-7821. doi: 10.1002/mp.16384. Epub 2023 Apr 6.

Heterogeneous graph convolutional network for multi-view semi-supervised classification.

Neural Netw. 2024 Oct;178:106438. doi: 10.1016/j.neunet.2024.106438. Epub 2024 Jun 7.

Graph-Based Disease Prediction in Neuroimaging: Investigating the Impact of Feature Selection.

Adv Exp Med Biol. 2023;1424:223-230. doi: 10.1007/978-3-031-31982-2_24.

Graph Convolution Networks with manifold regularization for semi-supervised learning.

Neural Netw. 2020 Jul;127:160-167. doi: 10.1016/j.neunet.2020.04.016. Epub 2020 Apr 23.

SSCI: Self-Supervised Deep Learning Improves Network Structure for Cancer Driver Gene Identification.

Int J Mol Sci. 2024 Sep 26;25(19):10351. doi: 10.3390/ijms251910351.

Predicting functions of maize proteins using graph convolutional network.

BMC Bioinformatics. 2020 Dec 16;21(Suppl 16):420. doi: 10.1186/s12859-020-03745-6.

Cited by

A Systematic Review of the Application of Graph Neural Networks to Extract Candidate Genes and Biological Associations.

Am J Med Genet B Neuropsychiatr Genet. 2025 Sep;198(6):3-18. doi: 10.1002/ajmg.b.33031. Epub 2025 May 2.

Integration of multi-source gene interaction networks and omics data with graph attention networks to identify novel disease genes.

Bioinformatics. 2025 Apr 23. doi: 10.1093/bioinformatics/btaf181.

MNMO: discover driver genes from a multi-omics data based-multi-layer network.

Bioinformatics. 2025 Mar 29;41(4). doi: 10.1093/bioinformatics/btaf134.

Genetic Foundations of Nellore Traits: A Gene Prioritization and Functional Analyses of Genome-Wide Association Study Results.

Genes (Basel). 2024 Aug 27;15(9):1131. doi: 10.3390/genes15091131.

Ensemble decision of local similarity indices on the biological network for disease related gene prediction.

PeerJ. 2024 Sep 5;12:e17975. doi: 10.7717/peerj.17975. eCollection 2024.

TransGCN: a semi-supervised graph convolution network-based framework to infer protein translocations in spatio-temporal proteomics.

Brief Bioinform. 2024 Jan 22;25(2). doi: 10.1093/bib/bbae055.

Recent Advances in Deep Learning for Protein-Protein Interaction Analysis: A Comprehensive Review.

Molecules. 2023 Jul 2;28(13):5169. doi: 10.3390/molecules28135169.

References

Graph convolutional networks: a comprehensive review.

Comput Soc Netw. 2019;6(1):11. doi: 10.1186/s40649-019-0069-y. Epub 2019 Nov 10.

TLGP: a flexible transfer learning algorithm for gene prioritization based on heterogeneous source domain.

BMC Bioinformatics. 2021 Aug 25;22(Suppl 9):274. doi: 10.1186/s12859-021-04190-9.

Prioritization of disease genes from GWAS using ensemble-based positive-unlabeled learning.

Eur J Hum Genet. 2021 Oct;29(10):1527-1535. doi: 10.1038/s41431-021-00930-w. Epub 2021 Jul 19.

Disease gene prediction with privileged information and heteroscedastic dropout.

Bioinformatics. 2021 Jul 12;37(Suppl_1):i410-i417. doi: 10.1093/bioinformatics/btab310.

Prioritizing Cancer Genes Based on an Improved Random Walk Method.

Front Genet. 2020 Apr 28;11:377. doi: 10.3389/fgene.2020.00377. eCollection 2020.

A random walk-based method to identify driver genes by integrating the subcellular localization and variation frequency into bipartite graph.

BMC Bioinformatics. 2019 May 14;20(1):238. doi: 10.1186/s12859-019-2847-9.

DeepChemStable: Chemical Stability Prediction with an Attention-Based Graph Convolution Network.

J Chem Inf Model. 2019 Mar 25;59(3):1044-1049. doi: 10.1021/acs.jcim.8b00672. Epub 2019 Feb 21.

Phenotype-driven gene prioritization for rare diseases using graph convolution on heterogeneous networks.

BMC Med Genomics. 2018 Jul 6;11(1):57. doi: 10.1186/s12920-018-0372-8.

C-PUGP: A cluster-based positive unlabeled learning method for disease gene prediction and prioritization.

Comput Biol Chem. 2018 Oct;76:23-31. doi: 10.1016/j.compbiolchem.2018.05.022. Epub 2018 Jun 1.

Network-based integration of multi-omics data for prioritizing cancer genes.

Bioinformatics. 2018 Jul 15;34(14):2441-2448. doi: 10.1093/bioinformatics/bty148.
