Constructing a gene semantic similarity network for the inference of disease genes.

Author information

Jiang Rui, Gan Mingxin, He Peng

Affiliation

MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China.

Publication information

BMC Syst Biol. 2011;5 Suppl 2(Suppl 2):S2. doi: 10.1186/1752-0509-5-S2-S2. Epub 2011 Dec 14.

DOI: 10.1186/1752-0509-5-S2-S2

PMID: 22784573

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3287482/

Abstract

MOTIVATION

The inference of genes that are truly associated with inherited human diseases from a set of candidates resulting from genetic linkage studies has been one of the most challenging tasks in human genetics. Although several computational approaches have been proposed to prioritize candidate genes relying on protein-protein interaction (PPI) networks, these methods can usually cover fewer than half of known human genes.

RESULTS

We propose to rely on the biological process domain of the gene ontology to construct a gene semantic similarity network and then use the network to infer disease genes. We show that the constructed network covers about 50% more genes than a typical PPI network. By analyzing the gene semantic similarity network with the PPI network, we show that gene pairs tend to have higher semantic similarity scores if the corresponding proteins are closer to each other in the PPI network. By analyzing the gene semantic similarity network with a phenotype similarity network, we show that semantic similarity scores of genes associated with similar diseases are significantly different from those of genes selected at random, and that genes with higher semantic similarity scores tend to be associated with diseases with higher phenotype similarity scores. We further use the gene semantic similarity network with a random walk with restart model to infer disease genes. Through a series of large-scale leave-one-out cross-validation experiments, we show that the gene semantic similarity network can achieve not only higher coverage but also higher accuracy than the PPI network in the inference of disease genes.
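The random walk with restart step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `random_walk_with_restart` and `rank_candidates`, the restart probability of 0.5, the convergence tolerance, and the four-gene toy network with its weights are all assumptions made purely for illustration. The sketch treats the network as a symmetric weighted graph whose edge weights stand in for semantic similarity scores, starts the walker at genes already known to be associated with a disease (the seeds), and ranks the remaining genes by their steady-state visiting probability.

```python
from typing import Dict, List, Set

# gene -> {neighbouring gene: similarity weight}; assumed symmetric
Network = Dict[str, Dict[str, float]]


def random_walk_with_restart(
    network: Network,
    seeds: Set[str],
    restart: float = 0.5,   # illustrative value, not taken from the paper
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence."""
    if not seeds or not seeds <= set(network):
        raise ValueError("seeds must be a non-empty subset of the network genes")
    genes = sorted(network)
    # Total outgoing weight per gene, used to normalise transition probabilities.
    out_weight = {u: sum(network[u].values()) for u in genes}
    # The walker starts (and restarts) uniformly over the seed genes.
    p0 = {g: (1.0 / len(seeds) if g in seeds else 0.0) for g in genes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {g: restart * p0[g] for g in genes}
        for u in genes:
            if out_weight[u] == 0.0:
                continue  # isolated gene: nothing to propagate
            for v, w in network[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[g] - p[g]) for g in genes)
        p = nxt
        if delta < tol:
            break
    return p


def rank_candidates(scores: Dict[str, float], seeds: Set[str]) -> List[str]:
    """Return non-seed genes ordered from highest to lowest score."""
    candidates = [g for g in scores if g not in seeds]
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


# Hypothetical toy network; weights stand in for semantic similarity scores.
toy_network: Network = {
    "A": {"B": 0.9, "C": 0.1},
    "B": {"A": 0.9, "C": 0.5},
    "C": {"A": 0.1, "B": 0.5, "D": 0.8},
    "D": {"C": 0.8},
}
toy_seeds = {"A"}
toy_scores = random_walk_with_restart(toy_network, toy_seeds)
toy_ranking = rank_candidates(toy_scores, toy_seeds)
```

In a leave-one-out setting of the kind the abstract describes, one known disease gene would be held out of the seed set and the evaluation would check how highly it ranks among the candidates; the details of that protocol are not given in the abstract and are not reproduced here.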
