A hybrid method for relation extraction from biomedical literature.

Author information

Huang Minlie, Zhu Xiaoyan, Li Ming

Affiliation

State Key Laboratory of Intelligent Technology and Systems (LITS), Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China.

Publication information

Int J Med Inform. 2006 Jun;75(6):443-55. doi: 10.1016/j.ijmedinf.2005.06.010. Epub 2005 Aug 10.

Abstract

PURPOSE

In recent years, there has been growing interest in extracting entities and relations from biomedical literature. Many systems and approaches have been proposed for extracting biological relations, but none of them has achieved satisfactory results. Existing methods are either parsing-based or pattern-based: they either cannot cope with the grammatical complexity of biomedical text or are too complicated to adapt. Appositive and coordinative structures, among other constructions, are extremely common in biomedical text, particularly in full-text articles, yet most researchers have not addressed them.

METHODS

In this paper, we propose a new hybrid approach that combines shallow parsing and pattern matching to extract relations between proteins from biomedical research papers. Appositive and coordinative structures are first interpreted from the shallow parsing analysis under both syntactic and semantic constraints. Long sentences are then split into sub-sentences, from which relations are extracted by a greedy pattern-matching algorithm using automatically generated patterns.
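The two METHODS steps can be illustrated in miniature with the sketch below. It is not the authors' implementation and makes several assumptions: the shallow parser is assumed to have already produced labelled chunks (the labels `NP`, `VP`, and `APPOS` are hypothetical), protein names are assumed to be recognized already, and a single hand-written pattern stands in for the automatically generated ones. The matcher `greedy_match` is a simplified stand-in, not the paper's algorithm: for each pattern element it takes the earliest token that fits, allowing at most a hypothetical `max_gap` intervening tokens. The protein names `ProtA`, `ProtB`, and `ProtC` are placeholders.

```python
from itertools import product

PROTEIN = "<P>"  # hypothetical placeholder for a protein slot in a pattern
SEPARATORS = {",", "and", "or"}


def conjuncts(tokens, proteins):
    """Return the protein names of a coordinated NP such as 'A , B and C'."""
    found = [t for t in tokens if t in proteins]
    # Only treat the NP as coordinative if every other token is a separator.
    if found and all(t in proteins or t in SEPARATORS for t in tokens):
        return found
    return []


def split_sentence(chunks, proteins):
    """Drop appositives and expand coordinated NPs into sub-sentences."""
    options = []
    for label, tokens in chunks:
        if label == "APPOS":
            continue  # the appositive is removed; the NP it describes stays
        names = conjuncts(tokens, proteins) if label == "NP" else []
        options.append([[n] for n in names] if len(names) > 1 else [tokens])
    # One sub-sentence per combination of conjunct choices.
    return [[tok for part in combo for tok in part] for combo in product(*options)]


def greedy_match(tokens, pattern, proteins, max_gap=3):
    """Match pattern elements left to right, taking the earliest fit each time."""
    slots, start = [], 0
    for k, elem in enumerate(pattern):
        # The first element may occur anywhere; later ones within max_gap tokens.
        end = len(tokens) if k == 0 else min(len(tokens), start + max_gap + 1)
        for j in range(start, end):
            tok = tokens[j]
            if (elem == PROTEIN and tok in proteins) or tok == elem:
                break
        else:
            return None  # this element could not be matched
        if elem == PROTEIN:
            slots.append(tok)
        start = j + 1
    return tuple(slots)


def extract(chunks, proteins, patterns):
    """Apply every (verb, pattern) pair to every sub-sentence."""
    relations = set()
    for sub in split_sentence(chunks, proteins):
        for verb, pattern in patterns:
            slots = greedy_match(sub, pattern, proteins)
            if slots and len(slots) == 2 and slots[0] != slots[1]:
                relations.add((slots[0], verb, slots[1]))
    return sorted(relations)


# Toy input: "ProtA and ProtB , two kinases , bind to ProtC".
proteins = {"ProtA", "ProtB", "ProtC"}
chunks = [
    ("NP", ["ProtA", "and", "ProtB"]),
    ("APPOS", [",", "two", "kinases", ","]),
    ("VP", ["bind", "to"]),
    ("NP", ["ProtC"]),
]
patterns = [("bind", [PROTEIN, "bind", "to", PROTEIN])]
print(extract(chunks, proteins, patterns))
# [('ProtA', 'bind', 'ProtC'), ('ProtB', 'bind', 'ProtC')]
```

Removing the appositive and splitting the coordinated subject before matching lets one simple two-slot pattern yield a separate relation for each conjunct.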

RESULTS

We evaluated the approach on extracting protein-protein interactions from full-text biomedical articles. It achieved an average F-score of 80% on individual verbs and 66% on all verbs. Shallow parsing analysis improves pattern matching considerably: compared with a traditional pattern-matching algorithm, our approach improves both precision and F-score by about 7%. Compared with other systems, its performance is comparable to the best. A demo system is available at http://spies.cs.tsinghua.edu.cn.
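The abstract does not state how precision, recall, and F-score are computed. Assuming the conventional balanced F-score over extracted interactions, with $TP$, $FP$, and $FN$ denoting the counts of correct, spurious, and missed interactions, the measures would be:

$$
P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F = \frac{2PR}{P + R}
$$

Because $F$ is the harmonic mean of $P$ and $R$, it always lies closer to the smaller of the two.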
