
Domain adaptation for semantic role labeling in the biomedical domain.

Affiliation

NUS Graduate School for Integrative Sciences and Engineering, Singapore 117456, Singapore.

Publication information

Bioinformatics. 2010 Apr 15;26(8):1098-104. doi: 10.1093/bioinformatics/btq075. Epub 2010 Feb 23.

Abstract

MOTIVATION

Semantic role labeling (SRL) is a natural language processing (NLP) task that extracts a shallow meaning representation from free text sentences. Several efforts to create SRL systems for the biomedical domain have been made during the last few years. However, state-of-the-art SRL relies on manually annotated training instances, which are rare and expensive to prepare. In this article, we address SRL for the biomedical domain as a domain adaptation problem to leverage existing SRL resources from the newswire domain.

RESULTS

We evaluate three recently proposed domain adaptation algorithms for SRL. Our results show that domain adaptation can significantly reduce the cost of developing an SRL system for the biomedical domain: with domain adaptation, our system achieves 97% of the performance with as few as 60 annotated target-domain abstracts.
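
To make the domain adaptation framing concrete, here is a minimal sketch of feature augmentation, the "frustratingly easy" technique of Daumé III (2007). It is one widely used way to train a feature-based classifier on a large source domain (here, newswire) plus a small target domain (here, biomedical abstracts). The abstract does not name the three algorithms that were evaluated, so do not read this as the authors' method. The feature names, role labels, and the helpers `augment` and `build_training_set` are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Tuple

# A sparse feature vector: feature name -> value.
Features = Dict[str, float]
# A labeled instance: (features, semantic role label such as "A0" or "A1").
Instance = Tuple[Features, str]


def augment(features: Features, domain: str) -> Features:
    """Feature augmentation: give every feature a shared copy and a domain-specific copy.

    In vector form this maps x -> <x, x, 0> for source instances and
    x -> <x, 0, x> for target instances.
    """
    if domain not in ("source", "target"):
        raise ValueError(f"unknown domain: {domain!r}")
    augmented: Features = {}
    for name, value in features.items():
        augmented["shared:" + name] = value   # weight learned from both domains
        augmented[domain + ":" + name] = value  # weight learned from this domain only
    return augmented


def build_training_set(source: List[Instance], target: List[Instance]) -> List[Instance]:
    """Pool newswire (source) and biomedical (target) instances in the augmented space."""
    pooled = [(augment(x, "source"), y) for x, y in source]
    pooled += [(augment(x, "target"), y) for x, y in target]
    return pooled


if __name__ == "__main__":
    # Hypothetical features for one argument candidate in each domain.
    newswire = [({"head=company": 1.0, "voice=active": 1.0}, "A0")]
    biomedical = [({"head=protein": 1.0, "voice=passive": 1.0}, "A1")]
    for feats, label in build_training_set(newswire, biomedical):
        print(label, sorted(feats))
```

You can pass the pooled instances to any standard linear classifier. The weights on the `shared:` features capture behavior that transfers from newswire. The `target:` copies let the small biomedical sample override that behavior wherever the two domains differ.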

AVAILABILITY

BioKIT, our system for SRL in the biomedical domain as described in this article, is implemented in Python and C and runs under the Linux operating system. BioKIT can be downloaded at http://nlp.comp.nus.edu.sg/software. The domain adaptation software is available for download at http://www.mysmu.edu/faculty/jingjiang/software/DALR.html. The BioProp corpus is available from the Linguistic Data Consortium at http://www.ldc.upenn.edu.
