NUS Graduate School for Integrative Sciences and Engineering, Singapore 117456, Singapore.
Bioinformatics. 2010 Apr 15;26(8):1098-104. doi: 10.1093/bioinformatics/btq075. Epub 2010 Feb 23.
MOTIVATION: Semantic role labeling (SRL) is a natural language processing (NLP) task that extracts a shallow meaning representation from free text sentences. Several efforts to create SRL systems for the biomedical domain have been made during the last few years. However, state-of-the-art SRL relies on manually annotated training instances, which are rare and expensive to prepare. In this article, we address SRL for the biomedical domain as a domain adaptation problem to leverage existing SRL resources from the newswire domain.
RESULTS: We evaluate the performance of three recently proposed domain adaptation algorithms for SRL. Our results show that by using domain adaptation, the cost of developing an SRL system for the biomedical domain can be reduced significantly. Using domain adaptation, our system can achieve 97% of the performance with as little as 60 annotated target domain abstracts.
AVAILABILITY: Our BioKIT system that performs SRL in the biomedical domain as described in this article is implemented in Python and C and operates under the Linux operating system. BioKIT can be downloaded at http://nlp.comp.nus.edu.sg/software. The domain adaptation software is available for download at http://www.mysmu.edu/faculty/jingjiang/software/DALR.html. The BioProp corpus is available from the Linguistic Data Consortium http://www.ldc.upenn.edu.