Department of Computing and Information Systems, University of Melbourne, VIC 3010, Australia.
BMC Med Inform Decis Mak. 2012 Apr 30;12 Suppl 1(Suppl 1):S4. doi: 10.1186/1472-6947-12-S1-S4.
This work describes a system for identifying event mentions in bio-molecular research abstracts that are either speculative (e.g. analysis of IkappaBalpha phosphorylation, where it is not specified whether phosphorylation did or did not occur) or negated (e.g. inhibition of IkappaBalpha phosphorylation, where phosphorylation did not occur). The data comes from a standard dataset created for the BioNLP 2009 Shared Task. The system uses a machine-learning approach, where the features used for classification are a combination of shallow features derived from the words of the sentences and more complex features based on the semantic outputs produced by a deep parser.
To detect event modification, we use a Maximum Entropy learner with features extracted from the data relative to the trigger words of the events. The shallow features are bag-of-words features based on a small sliding context window of 3-4 tokens on either side of the trigger word. The deep parser features are derived from parses produced by the English Resource Grammar and the RASP parser. The outputs of these parsers are converted into the Minimal Recursion Semantics formalism, and from this, we extract features motivated by linguistics and the data itself. All of these features are combined to create training or test data for the machine learning algorithm.
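The shallow features described above can be illustrated with a minimal sketch. It assumes whitespace tokenisation, a window of 3 tokens, and lowercasing; the function name `window_features`, the `bow=` and `trigger=` feature prefixes, and the inclusion of the trigger word itself as a feature are hypothetical choices made for illustration, not details taken from the system.

```python
from collections import Counter


def window_features(tokens, trigger_index, window=3):
    """Bag-of-words features from up to `window` tokens on each side of the trigger.

    Assumption: features are lowercased token counts; the described system
    may normalise or weight them differently.
    """
    start = max(0, trigger_index - window)
    left = tokens[start:trigger_index]
    right = tokens[trigger_index + 1:trigger_index + 1 + window]

    feats = Counter()
    for tok in left + right:
        feats["bow=" + tok.lower()] += 1
    # Hypothetical extra feature: the trigger word itself.
    feats["trigger=" + tokens[trigger_index].lower()] = 1
    return dict(feats)


# Example phrase from the abstract; "phosphorylation" is the event trigger.
tokens = "inhibition of IkappaBalpha phosphorylation".split()
features = window_features(tokens, trigger_index=3, window=3)
```

Feature dictionaries of this kind, merged with the parser-derived features, would form one training or test instance per event for a Maximum Entropy (multinomial logistic regression) classifier.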
On the test data, our methods produce an absolute increase of approximately 4% in F-score for the detection of event modification, compared with a baseline that uses only the shallow bag-of-words features.
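To make the metric concrete, the sketch below computes a balanced F-score from true-positive, false-positive, and false-negative counts, and the absolute difference between two scores. It assumes the standard F1 definition (the harmonic mean of precision and recall); the function names are hypothetical, and no figures from the evaluation are reproduced here.

```python
def f_score(tp, fp, fn):
    """Balanced F-score (F1) from counts; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def absolute_gain(baseline_f, system_f):
    """Absolute (percentage-point) difference, not a relative improvement."""
    return system_f - baseline_f
```

As a purely hypothetical illustration, moving from an F-score of 0.50 to 0.54 is an absolute gain of 4 points, which corresponds to a relative gain of 8%.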
Our results indicate that grammar-based techniques can enhance the accuracy of methods for detecting event modification.