Computer Science Department, University Carlos III of Madrid, Leganés, 28921, Spain.
BMC Bioinformatics. 2010 Apr 16;11 Suppl 2(Suppl 2):S1. doi: 10.1186/1471-2105-11-S2-S1.
Drug-drug interactions are frequently reported in the growing body of biomedical literature. Information Extraction (IE) techniques have been devised as a useful instrument to manage this knowledge. Nevertheless, IE at the sentence level has a limited effect because texts frequently refer back to entities mentioned earlier in the discourse, a phenomenon known as 'anaphora'. DrugNerAR, a drug anaphora resolution system, is presented to address the problem of co-referring expressions in pharmacological literature. This development is part of a larger study on automatic drug-drug interaction extraction.
The system applies a set of linguistic rules drawn from Centering Theory to the analysis provided by a biomedical syntactic parser. Semantic information provided by the Unified Medical Language System (UMLS) is also integrated to improve the recognition and resolution of nominal drug anaphors. In addition, a corpus has been developed to analyze the phenomenon and evaluate the approach. Each possible type of anaphoric expression was examined to determine the most effective way of resolving it.
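The combination of syntactic salience and semantic filtering described above can be illustrated with a minimal sketch. It is not the DrugNerAR rule set, which the abstract does not spell out: the `Mention` fields, the `ROLE_RANK` ordering (subject before object before other phrases, a common Centering-style preference), the number-agreement check, and the two-sentence window are all assumptions made for illustration. The `is_drug` flag stands in for a semantic lookup such as one based on UMLS.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical salience order inspired by Centering Theory:
# subjects are preferred over objects, objects over other phrases.
ROLE_RANK = {"subject": 0, "object": 1, "other": 2}


@dataclass
class Mention:
    text: str
    sentence: int   # index of the sentence containing the mention
    role: str       # "subject", "object", or "other"
    is_drug: bool   # assumed result of a semantic lexicon lookup
    plural: bool


def resolve(anaphor: Mention, candidates: List[Mention],
            window: int = 2) -> Optional[Mention]:
    """Pick an antecedent for a nominal drug anaphor (illustrative only).

    `candidates` is assumed to hold mentions that precede the anaphor.
    """
    eligible = [
        c for c in candidates
        if c is not anaphor
        and c.is_drug                                   # semantic filter
        and c.plural == anaphor.plural                  # number agreement
        and 0 <= anaphor.sentence - c.sentence <= window
    ]
    if not eligible:
        return None
    # Prefer the nearest sentence, then the most salient grammatical role.
    return min(
        eligible,
        key=lambda c: (anaphor.sentence - c.sentence,
                       ROLE_RANK.get(c.role, 2)),
    )


# Usage with made-up mentions from a preceding sentence.
candidates = [
    Mention("aspirin", 0, "subject", True, False),
    Mention("warfarin", 0, "object", True, False),
    Mention("patients", 0, "other", False, True),
]
anaphor = Mention("this drug", 1, "subject", True, False)
print(resolve(anaphor, candidates).text)  # aspirin
```

In this sketch the semantic filter removes non-drug phrases before any ranking takes place, so salience only decides between candidates that are already plausible antecedents.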
An F-score of 0.76 in anaphora resolution was achieved, significantly outperforming the baseline by almost 73%. This ad hoc baseline was developed to check the results, as there is no previous work on anaphora resolution in pharmacological documents. The results obtained resemble those found in semantically related domains.
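For readers unfamiliar with the metric, the sketch below computes the balanced F-score as the harmonic mean of precision and recall, assuming that is the variant reported here. The abstract gives no counts, so the counts in the usage line are invented for illustration and are not the study's data.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 3 correct links, 1 spurious link, 1 missed link.
print(f_score(tp=3, fp=1, fn=1))  # 0.75
```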
The present approach shows very promising results in accounting for anaphoric expressions in pharmacological texts. DrugNerAR obtains results similar to those of other approaches to anaphora resolution in the biomedical domain but, unlike them, it focuses on documents describing drug interactions. Centering Theory has proved effective for selecting antecedents in anaphora resolution. A key component in the success of this framework is the analysis provided by the MMTx program and the DrugNer system, which makes it possible to deal with the complexity of pharmacological language. The positive results of the resolver are expected to increase the performance of our future drug-drug interaction extraction system.