Biomine: predicting links between biological entities using network models of heterogeneous databases.

Affiliation

Biocomputing Platforms Ltd, Innopoli 2, Tekniikantie 14, FI-02150 Espoo, Finland.

Publication information

BMC Bioinformatics. 2012 Jun 6;13:119. doi: 10.1186/1471-2105-13-119.

PMID:22672646

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3505483/

Abstract

BACKGROUND

Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases.

RESULTS

Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes.
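The abstract says edges are weighted by type, reliability and informativeness, and that predictions come from a proximity measure on the integrated graph. The sketch below illustrates one plausible measure of that kind, the best-path probability (the highest product of edge weights over any path). It assumes, for illustration only, that an edge's weight is the product of a per-type weight, a reliability score and an informativeness score, all in (0, 1]. The names `TYPE_WEIGHT`, `edge_weight`, `best_path_probability` and the edge-type labels are hypothetical and are not the Biomine API or its actual parameter values.

```python
import heapq
import math

# Hypothetical per-type weights (assumption; not the paper's values).
TYPE_WEIGHT = {"interacts_with": 0.9, "associated_with": 0.8, "annotated_with": 0.6}


def edge_weight(edge_type, reliability, informativeness):
    """Assumed combination: product of type weight, reliability and informativeness."""
    return TYPE_WEIGHT[edge_type] * reliability * informativeness


def build_graph(edges):
    """Undirected adjacency list {node: [(neighbour, weight), ...]}."""
    graph = {}
    for u, v, edge_type, reliability, informativeness in edges:
        w = edge_weight(edge_type, reliability, informativeness)
        if w <= 0:
            continue  # zero-weight edges carry no evidence and would break -log(w)
        graph.setdefault(u, []).append((v, w))
        graph.setdefault(v, []).append((u, w))
    return graph


def best_path_probability(graph, source, target):
    """Highest product of edge weights (each in (0, 1]) over all source-target paths.

    Maximising a product of such weights equals minimising the sum of -log(w),
    so Dijkstra's algorithm applies. Returns 0.0 if no path exists.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-d)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, w in graph.get(node, []):
            nd = d - math.log(w)
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return 0.0


def rank_candidates(graph, pairs):
    """Order candidate links by proximity, most likely first."""
    return sorted(pairs, key=lambda p: best_path_probability(graph, *p), reverse=True)


# Toy example with made-up nodes and scores, for illustration only.
example_edges = [
    ("P1", "P2", "interacts_with", 1.0, 1.0),
    ("P2", "P3", "interacts_with", 1.0, 1.0),
    ("P1", "GO:1", "annotated_with", 1.0, 0.5),
    ("P4", "GO:1", "annotated_with", 1.0, 0.5),
]
g = build_graph(example_edges)
print(rank_candidates(g, [("P1", "P4"), ("P1", "P3")]))  # [('P1', 'P3'), ('P1', 'P4')]
```

Working in -log space turns "most probable path" into an ordinary shortest-path problem. In this framing, the edge-type weighting that the abstract optimizes would correspond to tuning the values in `TYPE_WEIGHT` to maximize link-prediction accuracy.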

CONCLUSIONS

The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
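When no reference set of disease genes is available, the abstract frames prioritization as finding a subset of candidate genes that cluster together in the graph. The sketch below shows one simple way to approximate that idea. It uses a hypothetical greedy heuristic, not the authors' method: seed with the closest pair of candidates, then repeatedly add the candidate with the highest mean proximity to those already chosen. `prioritize_by_cluster`, `toy_proximity` and the proximity values are illustrative assumptions. In practice the `proximity` argument could be any graph proximity measure, such as a best-path probability.

```python
from itertools import combinations


def prioritize_by_cluster(candidates, proximity, k):
    """Greedily pick k candidates that are mutually close in the graph.

    Seeds with the closest pair, then repeatedly adds the candidate with the
    highest mean proximity to the genes already chosen. Greedy selection does
    not guarantee the most cohesive k-subset.
    """
    if not 2 <= k <= len(candidates):
        raise ValueError("k must be between 2 and the number of candidates")
    seed = max(combinations(candidates, 2), key=lambda pair: proximity(*pair))
    chosen = list(seed)
    remaining = [c for c in candidates if c not in chosen]
    while len(chosen) < k:
        best = max(remaining, key=lambda c: sum(proximity(c, s) for s in chosen) / len(chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy symmetric proximity table (made-up values); unlisted pairs default to 0.05.
PROX = {
    frozenset(("G1", "G2")): 0.8,
    frozenset(("G1", "G3")): 0.7,
    frozenset(("G2", "G3")): 0.6,
}


def toy_proximity(a, b):
    return PROX.get(frozenset((a, b)), 0.05)


print(prioritize_by_cluster(["G1", "G2", "G3", "G4", "G5"], toy_proximity, 3))  # ['G1', 'G2', 'G3']
```

The genes that end up in the cohesive subset are ranked ahead of the rest. Candidates that sit far from every other candidate in the graph (G4 and G5 here) are left out.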
