Integrating multiple protein-protein interaction networks to prioritize disease genes: a Bayesian regression approach.

Affiliation

MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China.

Publication information

BMC Bioinformatics. 2011 Feb 15;12 Suppl 1(Suppl 1):S11. doi: 10.1186/1471-2105-12-S1-S11.

DOI:10.1186/1471-2105-12-S1-S11

PMID:21342540

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3044265/

Abstract

BACKGROUND

The identification of genes responsible for human inherited diseases is one of the most challenging tasks in human genetics. Recent studies based on phenotype similarity and gene proximity have demonstrated great success in prioritizing candidate genes for human diseases. However, most of these methods rely on a single protein-protein interaction (PPI) network to calculate similarities between genes, which greatly restricts their scope of application. Meanwhile, independently constructed and maintained PPI networks usually differ widely in coverage and quality, making the selection of a suitable PPI network unavoidable but difficult.

METHODS

We adopt a linear model to explain similarities between disease phenotypes using gene proximities that are quantified by diffusion kernels of one or more PPI networks. We solve this model via a Bayesian approach and derive an analytic form for the Bayes factor, which naturally measures the strength of association between a query disease and a candidate gene and can therefore be used as a score to prioritize candidate genes. This method is intrinsically capable of integrating multiple PPI networks.
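As a rough illustration of the gene-proximity step, the sketch below computes a diffusion kernel for two toy networks and stacks the resulting proximities as one regression feature column per network. It assumes the standard definition K = exp(-beta * L), with L the graph Laplacian; the value of beta, the toy networks, the phenotype-to-gene lists, and the helper names `diffusion_kernel` and `proximity_features` are all hypothetical. It does not reproduce the Bayes factor described above.

```python
import numpy as np


def diffusion_kernel(adjacency, beta=0.5):
    """Return K = exp(-beta * L), where L = D - A is the graph Laplacian.

    beta=0.5 is an arbitrary illustrative value, not one taken from the paper.
    """
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so the matrix exponential follows from its eigendecomposition.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T


def proximity_features(kernels, gene, phenotype_genes):
    """Build one row per phenotype and one column per network.

    Each entry sums the kernel proximity between `gene` and the genes
    already associated with that phenotype (hypothetical feature design).
    """
    return np.array([
        [sum(K[gene, g] for g in genes) for K in kernels]
        for genes in phenotype_genes
    ])


# Toy network 1: a path g0 - g1 - g2 - g3 (hypothetical data).
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]])
# Toy network 2: a star centred on g0 (hypothetical data).
star = np.array([[0, 1, 1, 1],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0]])

K_path = diffusion_kernel(path)
K_star = diffusion_kernel(star)

# Phenotype 0 is linked to gene 1; phenotype 1 is linked to genes 2 and 3.
phenotype_genes = [[1], [2, 3]]
X = proximity_features([K_path, K_star], gene=0, phenotype_genes=phenotype_genes)
print(X.shape)  # (2, 2): two phenotypes, two networks
```

In a linear model of this kind, a matrix such as `X` would serve as the explanatory variables for the phenotype-similarity vector of the query disease, and adding a network only adds a column.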

RESULTS

We show that gene proximities calculated from PPI networks imply phenotype similarities. We demonstrate the effectiveness of the Bayesian regression approach on five PPI networks via large scale leave-one-out cross-validation experiments and summarize the results in terms of the mean rank ratio of known disease genes and the area under the receiver operating characteristic curve (AUC). We further show the capability of our approach in integrating multiple PPI networks.
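The two summary measures can be sketched as follows, assuming each leave-one-out run ranks one held-out disease gene against a set of control genes by descending score. The rank ratio is taken here as the rank of the held-out gene divided by the number of candidates, and the per-run AUC as the fraction of controls ranked below it; the paper's exact protocol may differ, and the scores and gene names are made up for illustration.

```python
def rank_of_true_gene(scores, true_gene):
    """1-based rank of the held-out gene by descending score (ties share the average rank)."""
    s = scores[true_gene]
    higher = sum(1 for v in scores.values() if v > s)
    tied = sum(1 for v in scores.values() if v == s)  # includes the held-out gene itself
    return higher + (tied + 1) / 2


def evaluate(runs):
    """Return (mean rank ratio, mean AUC) over leave-one-out validation runs."""
    ratios, aucs = [], []
    for scores, true_gene in runs:
        n = len(scores)
        rank = rank_of_true_gene(scores, true_gene)
        ratios.append(rank / n)             # lower is better
        aucs.append((n - rank) / (n - 1))   # fraction of controls ranked below
    return sum(ratios) / len(ratios), sum(aucs) / len(aucs)


# Two hypothetical validation runs: (candidate scores, held-out disease gene).
runs = [
    ({"g1": 0.9, "g2": 0.1, "g3": 0.4, "g4": 0.2}, "g1"),  # held-out gene ranked 1st of 4
    ({"g1": 0.3, "g2": 0.8, "g3": 0.5, "g4": 0.1}, "g3"),  # held-out gene ranked 2nd of 4
]
mean_rank_ratio, mean_auc = evaluate(runs)
print(mean_rank_ratio, mean_auc)
```

A lower mean rank ratio and a higher AUC both indicate that known disease genes tend to be placed near the top of the candidate list.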

CONCLUSIONS

The Bayesian regression approach can achieve much higher performance than the existing CIPHER approach and the ordinary linear regression method. The integration of multiple PPI networks can greatly improve the scope of application of the proposed method in the inference of disease genes.
