Integration of gene normalization stages and co-reference resolution using a Markov logic network.

Affiliation

Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan, ROC.

Publication information

Bioinformatics. 2011 Sep 15;27(18):2586-94. doi: 10.1093/bioinformatics/btr358. Epub 2011 Jun 17.

Abstract

MOTIVATION

Gene normalization (GN) is the task of normalizing a textual gene mention to a unique gene database ID. Traditional top-performing GN systems usually need to consider several constraints when making normalization decisions, such as filtering out false positives or disambiguating ambiguous gene mentions, in order to improve performance. However, these constraints are usually applied in separate stages, so no stage can interactively use the input or output of the others. In this article, we propose a novel approach that employs a Markov logic network (MLN) to model the constraints used in the GN task. First, we show how various constraints can be formulated and combined in an MLN. Second, we are the first to apply the two main concepts of co-reference resolution, namely discourse salience in centering theory and transitivity, to GN models. Furthermore, to make our results more relevant to developers of information extraction applications, we adopt the instance-based precision/recall/F-measure (PRF) in addition to the article-wide PRF to assess system performance.
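As a rough illustration of how soft constraints are combined in an MLN, the sketch below scores every joint assignment of IDs to mentions with the standard log-linear form P(x) = exp(Σ_i w_i n_i(x)) / Z, where n_i(x) counts the true groundings of formula i in world x. Everything specific here is hypothetical: the mentions, candidate IDs (G1, G2), weights, and formula set are invented for illustration and are not the formulas or learned weights from this article. Discourse salience is not modeled, and brute-force enumeration stands in for the learned weights and approximate inference a real system would use.

```python
import itertools
import math

# Hypothetical toy abstract: two gene mentions and their candidate database IDs.
CANDIDATES = {
    "m1": ["G1"],        # e.g. an unambiguous full gene name
    "m2": ["G1", "G2"],  # e.g. an ambiguous abbreviation
}
ALL_IDS = ["G1", "G2"]
# Hypothetical local evidence weights for norm(mention, id), e.g. from string similarity.
EVIDENCE = {("m1", "G1"): 2.0, ("m2", "G1"): 0.5, ("m2", "G2"): 0.8}
COREF = [("m1", "m2")]   # hypothetical co-reference link: m2 refers back to m1
W_FILTER = -1.0          # hypothetical weight for filtering a mention out (no ID)


def score(world, w_trans):
    """Weighted count of satisfied ground formulas, i.e. sum_i w_i * n_i(world)."""
    s = 0.0
    for m, g in world.items():
        # Soft evidence formula norm(m, g), or the "filtered as false positive" option.
        s += W_FILTER if g is None else EVIDENCE[(m, g)]
    # Transitivity: coref(a, b) AND norm(a, g) => norm(b, g), grounded over every ID g.
    for a, b in COREF:
        for g in ALL_IDS:
            if world[a] != g or world[b] == g:  # implication holds for this grounding
                s += w_trans
    return s


def distribution(w_trans):
    """Enumerate every world and normalise: P(world) = exp(score) / Z."""
    mentions = sorted(CANDIDATES)
    options = [CANDIDATES[m] + [None] for m in mentions]
    worlds = [dict(zip(mentions, combo)) for combo in itertools.product(*options)]
    weights = [math.exp(score(w, w_trans)) for w in worlds]
    z = sum(weights)
    return [(w, x / z) for w, x in zip(worlds, weights)]


def map_world(w_trans):
    """Most probable joint assignment (MAP inference by brute force)."""
    return max(distribution(w_trans), key=lambda pair: pair[1])[0]


print(map_world(w_trans=0.0))  # evidence alone: m2 -> G2
print(map_world(w_trans=1.5))  # with transitivity: m2 -> G1
```

With the transitivity weight at 0, m2 follows its slightly stronger local evidence for G2; at 1.5, the co-reference link to the unambiguous m1 outweighs that evidence and m2 is normalized to G1. Because every formula is soft, filtering, disambiguation, and transitivity trade off inside one joint score instead of being applied in a fixed stage order.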

RESULTS

Experimental results show that our system outperforms the baseline and state-of-the-art systems under both evaluation schemes. Further analysis reveals several unexplored challenges in the GN task.
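The article-wide and instance-based PRF named in the Motivation differ in what counts as a single prediction. The minimal sketch below assumes that article-wide PRF scores the set of unique (article, gene ID) pairs, while instance-based PRF scores each (article, mention span, gene ID) triple with exact span matching. The article's precise matching rules may differ, and the article name, spans, and IDs are hypothetical.

```python
# Hypothetical gold and predicted annotations: article -> list of (mention span, gene ID).
GOLD = {"A1": [((0, 4), "G1"), ((20, 24), "G1"), ((40, 45), "G2")]}
PRED = {"A1": [((0, 4), "G1"), ((20, 24), "G2"), ((40, 45), "G2")]}


def prf(gold_set, pred_set):
    """Precision, recall and F-measure of predicted items against gold items."""
    tp = len(gold_set & pred_set)
    p = tp / len(pred_set) if pred_set else 0.0
    r = tp / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def article_wide_prf(gold, pred):
    """Each unique (article, gene ID) pair counts once, however many mentions it has."""
    def as_pairs(ann):
        return {(art, gid) for art, items in ann.items() for _, gid in items}
    return prf(as_pairs(gold), as_pairs(pred))


def instance_prf(gold, pred):
    """Each mention is scored separately: (article, span, gene ID) must match exactly."""
    def as_triples(ann):
        return {(art, span, gid) for art, items in ann.items() for span, gid in items}
    return prf(as_triples(gold), as_triples(pred))


print(article_wide_prf(GOLD, PRED))  # (1.0, 1.0, 1.0)
print(instance_prf(GOLD, PRED))      # mention (20, 24) is wrong: P = R = F = 2/3
```

In this toy case the wrong ID on the second mention is invisible to the article-wide score, because G1 is still found elsewhere in the same article; only the instance-based score penalizes it. An extraction application that needs an ID for every mention would care about exactly this kind of error.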

CONTACT

hongjie@iis.sinica.edu.tw

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
