Applying semantic-based probabilistic context-free grammar to medical language processing--a preliminary study on parsing medication sentences.

Affiliation

Department of Biomedical Informatics, Vanderbilt University, School of Medicine, Nashville, TN 37232, USA.

Publication information

J Biomed Inform. 2011 Dec;44(6):1068-75. doi: 10.1016/j.jbi.2011.08.009. Epub 2011 Aug 12.

Abstract

Semantic-based sublanguage grammars have been shown to be an efficient method for medical language processing. However, given the complexity of the medical domain, parsers using such grammars inevitably encounter ambiguous sentences, which can be interpreted by different groups of production rules and consequently yield two or more parse trees. One possible solution, which has not been extensively explored, is to augment the productions in medical sublanguage grammars with probabilities to resolve the ambiguity. In this study, we associated probabilities with production rules in a semantic-based grammar for medication findings and evaluated its performance in reducing parsing ambiguity. Using the existing data set from the 2009 i2b2 NLP (Natural Language Processing) challenge for medication extraction, we developed a semantic-based CFG (Context-Free Grammar) for parsing medication sentences and manually created a Treebank of 4564 medication sentences from discharge summaries. Using the Treebank, we derived a semantic-based PCFG (Probabilistic Context-Free Grammar) for parsing medication sentences. Our evaluation using 10-fold cross-validation showed that the PCFG parser dramatically improved parsing performance compared to the CFG parser.
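
The step of deriving a PCFG from a Treebank can be illustrated with a minimal sketch. It is not the authors' grammar or code: the semantic labels (`MED`, `DRUG`, `DOSE`, `FREQ`), the four toy trees, and all function names are assumptions made for illustration. The sketch uses the standard relative-frequency estimate, P(A → β) = count(A → β) / count(A), and then ranks two competing parse trees of the same sentence by the product of their rule probabilities.

```python
from collections import Counter

# A tree is a nested tuple (label, child, child, ...).
# A child that is a plain string is a token (leaf).

def productions(tree):
    """Yield (lhs, rhs) for every internal node of a nested-tuple tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield label, rhs
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(A -> beta) = count(A -> beta) / count(A)."""
    rule_counts = Counter(p for tree in treebank for p in productions(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

def tree_probability(tree, probs):
    """Probability of a parse tree = product of the probabilities of its rules."""
    p = 1.0
    for rule in productions(tree):
        p *= probs.get(rule, 0.0)  # a rule never seen in the treebank scores 0
    return p

# Hypothetical toy treebank with made-up semantic categories.
treebank = [
    ("MED", ("DRUG", "aspirin"), ("DOSE", "one"), ("FREQ", "daily")),
    ("MED", ("DRUG", "metformin"), ("DOSE", "one"), ("FREQ", "daily")),
    ("MED", ("DRUG", "aspirin"), ("FREQ", "one", "daily")),
    ("MED", ("DRUG", "metformin"), ("DOSE", "one")),
]
probs = estimate_pcfg(treebank)

# Two candidate parses of the same tokens: "aspirin one daily".
candidates = [
    ("MED", ("DRUG", "aspirin"), ("DOSE", "one"), ("FREQ", "daily")),
    ("MED", ("DRUG", "aspirin"), ("FREQ", "one", "daily")),
]
best = max(candidates, key=lambda t: tree_probability(t, probs))
print(best)
```

A plain CFG parser would return both candidate trees with no way to choose between them; with the estimated probabilities, the parse whose rules were more frequent in the Treebank is preferred. A practical parser would compute the best tree directly (for example with a probabilistic CKY or Viterbi-style algorithm) and would need smoothing for unseen rules, both of which this sketch omits.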
