Learning ontological rules to extract multiple relations of genic interactions from text.

Affiliation

LIPN, Université Paris 13/CNRS UMR7030, Laboratoire d'Informatique Paris-Nord, Institut Galilée, Université Paris 13, Villetaneuse, France.

Publication information

Int J Med Inform. 2009 Dec;78(12):e31-8. doi: 10.1016/j.ijmedinf.2009.03.005. Epub 2009 Apr 23.

Abstract

INTRODUCTION

Information extraction (IE) systems have been proposed in recent years to extract genic interactions from bibliographical resources. They are limited to a single interaction relation and face a trade-off between recall and precision: they focus either on specific interactions (favoring precision) or on general, unspecified interactions between biological entities (favoring recall). Yet biologists need to process more complex data from the literature to study biological pathways. An ontology is an adequate formal representation for modeling this sophisticated knowledge. However, tightly integrating IE systems with ontologies remains an open research issue, all the more so for complex ontologies that go beyond hierarchies.

METHOD

We propose a rich ontological model of genic interactions and show how it can be used within an IE system. The ontology is seen as a language specifying a normalized representation of text. First, IE is performed by extracting instances from the output of natural language processing (NLP) modules. Then, deductive inference is carried out over the ontology language, and new instances are derived from previously extracted ones. Inference rules are learnt with an inductive logic programming (ILP) algorithm, using the ontology as the hypothesis language and its instantiation on an annotated corpus as the example language. Learning is cast in a multi-class setting to handle the multiple ontological relations.
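The deductive step can be illustrated with a naive forward-chaining loop: Horn rules (the kind of clause an ILP learner outputs) are applied to the extracted instances until no new instance appears. This is a minimal sketch under stated assumptions, not the authors' system: instances are encoded as binary-relation tuples, the relation names (`agent`, `target`, `event`, `activates`, `interacts`) and gene names are hypothetical, and the rules are hand-written rather than learnt.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# A ground fact is (relation, arg1, arg2), e.g. an instance produced by extraction.
Fact = Tuple[str, str, str]
# Rule atoms use the same shape; arguments starting with "?" are variables.
Atom = Tuple[str, str, str]
Rule = Tuple[List[Atom], Atom]  # (body atoms, head atom)


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(atom: Atom, fact: Fact, binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `atom` unifies with the ground `fact`, or return None."""
    if atom[0] != fact[0]:
        return None
    extended = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            # Bind the variable, or check that it agrees with an earlier binding.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def substitute(atom: Atom, binding: Dict[str, str]) -> Fact:
    # Assumes range-restricted rules: every head variable also occurs in the body.
    return (atom[0], binding.get(atom[1], atom[1]), binding.get(atom[2], atom[2]))


def forward_chain(facts: Iterable[Fact], rules: List[Rule]) -> Set[Fact]:
    """Apply Horn rules to the facts until no new fact is derived (naive fixpoint)."""
    known: Set[Fact] = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Join the body atoms left to right, keeping every consistent binding.
            bindings: List[Dict[str, str]] = [{}]
            for atom in body:
                bindings = [
                    extended
                    for b in bindings
                    for f in known
                    if (extended := match(atom, f, b)) is not None
                ]
            for b in bindings:
                derived = substitute(head, b)
                if derived not in known:
                    known.add(derived)
                    changed = True
    return known


# Hypothetical relation names and placeholder genes, for illustration only.
extracted = {
    ("agent", "e1", "geneA"),
    ("target", "e1", "geneB"),
    ("event", "e1", "activation"),
}
rules: List[Rule] = [
    ([("agent", "?e", "?a"), ("target", "?e", "?t"), ("event", "?e", "activation")],
     ("activates", "?a", "?t")),
    ([("activates", "?a", "?t")], ("interacts", "?a", "?t")),
]
result = forward_chain(extracted, rules)
```

Because the second rule fires on a fact produced by the first, the loop shows how derived instances can feed further inferences. A real system would more likely rely on a Prolog engine or indexed joins rather than rescanning every fact on each pass.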

RESULTS

We validated our approach on an annotated corpus of gene transcription regulations in the Bacillus subtilis bacterium. We achieve a global recall of 89.3% and a precision of 89.6%, with high scores for the ten semantic relations defined in the ontology.
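The abstract does not state how the global scores are aggregated over the ten relations. Assuming "global" means micro-averaging (pooling true positives, false positives and false negatives across relations before dividing), the computation can be sketched as follows; the relation names and counts are hypothetical, not data from the paper.

```python
from typing import Dict, Tuple

# Per-relation counts: (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def micro_average(per_relation: Dict[str, Counts]) -> Tuple[float, float]:
    """Pool the counts of all relations, then compute precision and recall once."""
    tp = sum(c[0] for c in per_relation.values())
    fp = sum(c[1] for c in per_relation.values())
    fn = sum(c[2] for c in per_relation.values())
    return precision_recall(tp, fp, fn)


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical counts for two made-up relations, not data from the paper.
counts = {"relation_a": (8, 2, 0), "relation_b": (1, 0, 1)}
global_p, global_r = micro_average(counts)  # 9/11 and 9/10
```

Micro-averaging weights each relation by its number of instances, so frequent relations dominate the global figures; calling `precision_recall` on each entry shows how rarer relations fare. Plugging the reported global figures into `f_measure(0.896, 0.893)` gives about 0.894; the abstract itself reports no F-score.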
