Integrated bio-entity network: a system for biological knowledge discovery.

Affiliation

Department of Statistics, Florida State University, Tallahassee, Florida, United States of America.

Publication information

PLoS One. 2011;6(6):e21474. doi: 10.1371/journal.pone.0021474. Epub 2011 Jun 27.

Abstract

A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulating at an increasing speed, the information on bio-entity relationships is archived in different forms in scattered places. Most of this information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates the study of biological systems using integrative approaches, but also allows new knowledge to be discovered in an automatic and systematic way. In this study, we performed a large-scale integration of bio-entity relationship information from two kinds of sources: databases containing manually annotated, structured information, and automatic information extraction from unstructured text in scientific literature. The relationship information we integrated includes protein-protein interactions, protein/gene regulations, protein-small molecule interactions, protein-GO relationships, protein-pathway relationships, and pathway-disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and the edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses, that is, indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs.
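
The abstract does not give the details of BFSP or MPP, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each edge carries a probability in (0, 1], that probabilities multiply along a path, and that the most probable path can be found with Dijkstra's algorithm on negative log probabilities. The entity names, edge values, function names, and the `min_prob` and `max_depth` parameters are all hypothetical.

```python
import heapq
import math
from collections import defaultdict, deque

# Hypothetical toy network: vertices are bio-entities, edges carry an assumed
# probability. None of these names or numbers come from the paper.
EDGES = [
    ("protein:A", "protein:B", 0.9),
    ("protein:B", "pathway:P", 0.8),
    ("pathway:P", "disease:D", 0.7),
    ("protein:A", "molecule:M", 0.4),
    ("molecule:M", "disease:D", 0.5),
]


def build_graph(edges):
    """Build an undirected adjacency map: vertex -> {neighbor: probability}."""
    graph = defaultdict(dict)
    for u, v, p in edges:
        graph[u][v] = p
        graph[v][u] = p
    return graph


def most_probable_path(graph, source, target):
    """Dijkstra on -log(p): the smallest summed cost is the largest product of p."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == target:
            break
        if cost > best.get(u, math.inf):
            continue  # stale heap entry
        for v, p in graph[u].items():
            new_cost = cost - math.log(p)
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-best[target])


def bfs_with_pruning(graph, source, min_prob=0.3, max_depth=3):
    """Breadth-first enumeration of simple paths from source, pruning any path
    whose probability product drops below min_prob or that exceeds max_depth edges.
    Only indirect relationships (two or more edges) are returned."""
    results = []
    queue = deque([([source], 1.0)])
    while queue:
        path, prob = queue.popleft()
        if len(path) - 1 >= max_depth:
            continue
        for v, p in graph[path[-1]].items():
            if v in path:
                continue  # keep paths simple (no repeated vertices)
            new_prob = prob * p
            if new_prob < min_prob:
                continue  # prune low-probability branches
            new_path = path + [v]
            if len(new_path) > 2:
                results.append((new_path, new_prob))
            queue.append((new_path, new_prob))
    return results


graph = build_graph(EDGES)
path, prob = most_probable_path(graph, "protein:A", "disease:D")
hypotheses = bfs_with_pruning(graph, "protein:A")
```

In this toy network, the route through `protein:B` and `pathway:P` is preferred over the shorter route through `molecule:M`, because its probability product is higher. The pruning threshold trades recall for a smaller search space, and a real network would need a principled way to assign the edge probabilities.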

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6c03/3124513/0bc65daed3ab/pone.0021474.g001.jpg
