Tsai Richard Tzong-Han, Dai Hong-Jie, Huang Chi-Hsin, Hsu Wen-Lian
Department of Computer Science & Engineering, Yuan Ze University, Chung-Li, Taiwan, R.O.C.
BMC Bioinformatics. 2008 Dec 12;9 Suppl 12(Suppl 12):S18. doi: 10.1186/1471-2105-9-S12-S18.
BACKGROUND: Semantic role labeling (SRL) is an important text analysis technique. In SRL, a sentence is represented by one or more predicate-argument structures (PAS). Each PAS consists of a predicate (a verb) and several arguments (noun phrases, adverbial phrases, etc.) that carry different semantic roles: main arguments (agent or patient) and adjunct arguments (time, manner, or location). PropBank is the most widely used PAS corpus and annotation format in the newswire domain. In the biomedical field, however, more detailed and restrictive PAS annotation formats such as PASBio are popular. Because no annotated PASBio corpus exists, no publicly available machine-learning (ML) SRL system based on PASBio has been developed. In previous work, we constructed BioProp, a biomedical corpus based on the PropBank standard, and used it to develop an ML-based SRL system, BIOSMILE. In this paper, we aim to build a system that converts BIOSMILE's BioProp annotation output to PASBio annotation. Our system consists of BIOSMILE combined with a rule-based BioProp-to-PASBio converter and an additional semi-automatic rule generator.

RESULTS: Our first experiment evaluated the rule-based converter's performance independently of BIOSMILE's performance. The converter achieved an F-score of 85.29%. The second experiment evaluated the combined system (BIOSMILE + rule-based converter), which achieved an F-score of 69.08% on PASBio's 29 verbs.

CONCLUSION: Our approach allows PAS conversion between BioProp and PASBio annotation using BIOSMILE together with our newly developed semi-automatic rule generator and rule-based converter. Our system can match the performance of other state-of-the-art domain-specific ML-based SRL systems and can be easily customized for PASBio application development.
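The idea of a rule-based converter scored by F-score can be illustrated with a minimal sketch. The abstract does not describe the rule format, so everything here is an assumption: the `RULES` table, its entries, the role labels chosen for the verb "express", and the helper names `convert` and `f_score` are hypothetical and are not the authors' rules or implementation.

```python
from typing import Dict, List, Tuple

# An argument is a (role label, text span) pair.
Argument = Tuple[str, str]

# Hypothetical rule table: (predicate, BioProp role) -> PASBio role.
# These entries are invented for illustration only.
RULES: Dict[Tuple[str, str], str] = {
    ("express", "Arg1"): "Arg1",
    ("express", "ArgM-LOC"): "Arg3",
}


def convert(predicate: str, args: List[Argument]) -> List[Argument]:
    """Relabel arguments via the rule table; drop arguments with no rule."""
    converted: List[Argument] = []
    for role, span in args:
        target = RULES.get((predicate, role))
        if target is not None:
            converted.append((target, span))
    return converted


def f_score(predicted: List[Argument], gold: List[Argument]) -> float:
    """Balanced F-score over exactly matching (role, span) pairs."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy input standing in for an SRL system's BioProp-style output.
bioprop_args = [
    ("Arg1", "the gene"),
    ("ArgM-LOC", "in liver cells"),
    ("ArgM-TMP", "after 24 h"),
]
pasbio_args = convert("express", bioprop_args)
```

In this sketch an argument without a matching rule is simply dropped, which lowers recall rather than guessing a label; a real converter might instead fall back to a default mapping.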