Multi-way association extraction and visualization from biological text documents using hyper-graphs: applications to genetic association studies for diseases.

Author information

Department of Computer and Information Science, Indiana University Purdue University Indianapolis, 723 West Michigan Street SL 280J, Indianapolis, IN 46202, USA.

Publication information

Artif Intell Med. 2010 Jul;49(3):145-54. doi: 10.1016/j.artmed.2010.03.002. Epub 2010 Apr 9.

Abstract

OBJECTIVES

Biological research literature, like the literature of many other domains of human endeavor, is a rich, ever-growing source of knowledge. An important form of such biological knowledge consists of associations among biological entities such as genes, proteins, diseases, drugs and chemicals. There has been a considerable amount of recent research on extracting various kinds of binary associations (e.g., gene-gene, gene-protein, protein-protein) using different text mining approaches. However, an important aspect of such associations (e.g., "gene A activates protein B") is identifying the context in which they occur (e.g., "gene A activates protein B in the context of disease C in organ D under the influence of chemical E"). Such contexts can be represented appropriately by a multi-way relationship involving more than two objects (e.g., objects A, B, C, D, E) rather than the usual binary relationship (objects A and B).

METHODS

Such multi-way relations naturally lead to a hyper-graph representation of the knowledge rather than a binary graph. Hyper-graph based multi-way knowledge extraction from biological text literature is a computationally difficult problem (due to its combinatorial nature) that has not received much attention from the bioinformatics research community. In this paper, we describe and compare two different approaches to such multi-way hyper-graph extraction: one based on an exhaustive enumeration of all multi-way hyper-edges and the other based on an extension of the well-known A Priori algorithm for structured data to the case of unstructured textual data. We also present a representative graph based approach to visualizing these genetic association hyper-graphs.
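The contrast between the two extraction strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes entity recognition has already been done, so that each document is reduced to a set of entity names, and it treats a hyper-edge as any entity set that co-occurs in at least `min_support` documents. The function names, the `min_support` and `max_size` parameters, and the sample entities are all hypothetical.

```python
from itertools import combinations


def exhaustive_hyperedges(docs, min_support, max_size):
    """Count every entity subset of size 2..max_size found in each document."""
    counts = {}
    for entities in docs:
        for k in range(2, max_size + 1):
            for combo in combinations(sorted(entities), k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {edge: n for edge, n in counts.items() if n >= min_support}


def apriori_hyperedges(docs, min_support, max_size):
    """Level-wise search: only grow sets whose smaller subsets are all frequent."""
    def support(candidate):
        # Number of documents that mention every entity in the candidate set.
        return sum(1 for entities in docs if candidate <= entities)

    # Level 1: frequent single entities (not reported as hyper-edges).
    items = {e for entities in docs for e in entities}
    level = {}
    for item in items:
        single = frozenset([item])
        n = support(single)
        if n >= min_support:
            level[single] = n

    frequent = {}
    size = 1
    while level and size < max_size:
        size += 1
        prev = set(level)
        # Join step: merge frequent sets from the previous level.
        candidates = {a | b for a in prev for b in prev if len(a | b) == size}
        # Prune step: drop candidates with any infrequent (size-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, size - 1))
        }
        level = {}
        for c in candidates:
            n = support(c)
            if n >= min_support:
                level[c] = n
        frequent.update(level)
    return frequent


# Hypothetical toy corpus: one set of recognized entities per document.
docs = [
    {"geneA", "proteinB", "diseaseC"},
    {"geneA", "proteinB", "diseaseC", "chemE"},
    {"geneA", "proteinB"},
    {"geneA", "chemE"},
]
print(apriori_hyperedges(docs, min_support=2, max_size=3))
```

On this toy corpus both functions return the same hyper-edges, but the level-wise version never counts triples such as {geneA, proteinB, chemE}, because one of their pairs is already known to be infrequent. That pruning is the source of the savings over exhaustive enumeration in this sketch.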

RESULTS

Two case studies are conducted for two biomedical problems (related to lung cancer and colorectal cancer, respectively), illustrating that the latter approach (the text-based A Priori method) identifies the same hyper-edges as the former approach (the exhaustive method), but at a much lower computational cost. The extracted hyper-relations are presented in the paper as cognition-rich representative graphs, representing the corresponding hyper-graphs.
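The abstract does not spell out how the representative graphs are constructed. As a rough illustration only, the sketch below uses the standard graph-theoretic notion of the representative (line) graph of a hyper-graph, in which each hyper-edge becomes a node and two nodes are linked when their hyper-edges share at least one entity. The paper's own construction may differ, and the function name and sample entities are hypothetical.

```python
from itertools import combinations


def representative_graph(hyperedges):
    """Build a line-graph view: one node per hyper-edge, linked if they overlap."""
    nodes = [frozenset(edge) for edge in hyperedges]
    links = {}
    for i, j in combinations(range(len(nodes)), 2):
        shared = nodes[i] & nodes[j]
        if shared:
            # Label each link with the entities the two hyper-edges share.
            links[(i, j)] = shared
    return nodes, links


# Hypothetical hyper-edges, e.g. as produced by an extraction step.
nodes, links = representative_graph([
    {"geneA", "proteinB", "diseaseC"},
    {"geneA", "chemE"},
    {"geneX", "geneY"},
])
print(links)
```

The result is an ordinary binary graph, so it can be drawn with any standard graph layout tool while the link labels preserve which entities tie the multi-way associations together.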

CONCLUSIONS

The text-based A Priori algorithm is a practical, useful method to extract hyper-graphs representing multi-way associations among biological objects. These hyper-graphs and their visualization using representative graphs can provide important contextual information for understanding gene-gene associations relevant to specific diseases.
