Integrating node embeddings and biological annotations for genes to predict disease-gene associations.

Authors

Ata Sezin Kircali, Ou-Yang Le, Fang Yuan, Kwoh Chee-Keong, Wu Min, Li Xiao-Li

Affiliations

Department of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore.

Department of Electronic Engineering, College of Information Engineering, Shenzhen University, China.

Publication information

BMC Syst Biol. 2018 Dec 31;12(Suppl 9):138. doi: 10.1186/s12918-018-0662-y.

DOI: 10.1186/s12918-018-0662-y
PMID: 30598097
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6311944/
Abstract

BACKGROUND

Predicting disease-causative genes (or simply, disease genes) plays a critical role in understanding the genetic basis of human diseases and in guiding disease treatment. Various computational methods have been proposed for disease gene prediction, and the increasing availability of biological information for genes provides strong motivation to leverage these data sources and extract useful information for accurate disease gene prediction.

RESULTS

We present an integrative framework called N2VKO to predict disease genes. First, we learn node embeddings for genes from a protein-protein interaction (PPI) network by adapting the well-known representation learning method node2vec. Second, we combine the learned node embeddings with various biological annotations as a rich feature representation for genes, and then build binary classification models for disease gene prediction. Finally, because the data for disease gene prediction are usually imbalanced (i.e. the number of causative genes for a specific disease is much smaller than the number of non-causative genes), we address this serious imbalance by applying oversampling techniques to improve prediction performance. Comprehensive experiments demonstrate that the proposed N2VKO significantly outperforms four state-of-the-art methods for disease gene prediction across seven diseases.
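As an illustration of the first two steps, the following minimal sketch generates node2vec-style biased random walks on a toy PPI graph and concatenates an embedding with an annotation vector. It is not the authors' implementation: the gene names, the graph, the helper names `biased_walk` and `gene_features`, and the parameter values are all hypothetical, and the skip-gram training that turns walks into embeddings is omitted.

```python
import random


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one node2vec-style second-order random walk.

    graph maps each node to the set of its neighbours (unweighted, undirected).
    p is the return parameter and q the in-out parameter; both must be > 0.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])
        if not nbrs:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay within distance 1 of prev
            else:
                weights.append(1.0 / q)  # move further away from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def gene_features(embedding, annotations):
    """Concatenate a node embedding with an annotation vector for one gene."""
    return list(embedding) + list(annotations)


# Hypothetical toy PPI network; the gene names are placeholders.
ppi = {
    "geneA": {"geneB", "geneC"},
    "geneB": {"geneA", "geneC", "geneD"},
    "geneC": {"geneA", "geneB"},
    "geneD": {"geneB", "geneE"},
    "geneE": {"geneD"},
}

rng = random.Random(0)
walks = [biased_walk(ppi, node, length=6, p=1.0, q=0.5, rng=rng)
         for node in sorted(ppi)]

# Placeholder values standing in for a learned embedding and binary annotations.
features = gene_features([0.1, 0.2], [1, 0])
```

In a full pipeline, the walks would be passed to a skip-gram model to obtain one embedding per gene, and the concatenated feature vectors would then be fed to a binary classifier.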

CONCLUSIONS

In this study, we show that node embeddings learned from PPI networks work well for disease gene prediction, and that integrating node embeddings with other biological annotations further improves the performance of classification models. Moreover, oversampling techniques for imbalance correction further enhance the prediction performance. In addition, a literature search of the predicted disease genes also supports the effectiveness of the proposed N2VKO framework for disease gene prediction.
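The idea behind imbalance correction can be sketched with plain random oversampling, which duplicates minority-class samples until both classes are the same size. This is an assumed stand-in chosen for brevity, not necessarily the oversampling technique used in N2VKO; the function name `random_oversample` and the toy data are hypothetical.

```python
import random


def random_oversample(features, labels, rng=None):
    """Balance a binary dataset by resampling the minority class with replacement.

    labels must contain only 0 (non-causative) and 1 (causative).
    The original samples are kept in order; duplicates are appended at the end.
    """
    rng = rng or random.Random()
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present to oversample")
    # Draw as many extra minority indices as needed to match the majority size.
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    return [features[i] for i in idx], [labels[i] for i in idx]


# Hypothetical toy data: 2 disease genes and 6 non-disease genes.
X = [[0.9, 1], [0.8, 1], [0.1, 0], [0.2, 0], [0.3, 0], [0.2, 1], [0.1, 1], [0.4, 0]]
y = [1, 1, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = random_oversample(X, y, rng=random.Random(0))
```

Oversampling is usually applied to the training split only, so that duplicated samples do not leak into the data used for evaluation.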

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1911/6311944/ac057ed34394/12918_2018_662_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1911/6311944/d5829b7c7aae/12918_2018_662_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1911/6311944/0fd2c21f253a/12918_2018_662_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1911/6311944/d76d6deca3d9/12918_2018_662_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1911/6311944/201dee95f1a0/12918_2018_662_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1911/6311944/120c234da5f6/12918_2018_662_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1911/6311944/19f6d754a69f/12918_2018_662_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1911/6311944/5e4bfee66ef7/12918_2018_662_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1911/6311944/b01706ae6b90/12918_2018_662_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1911/6311944/8ae9118feeb1/12918_2018_662_Fig10_HTML.jpg