School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu Province, China.
Artif Intell Med. 2011 Nov;53(3):205-13. doi: 10.1016/j.artmed.2011.08.002. Epub 2011 Sep 25.
Biomedical event extraction is concerned with identifying, from the literature, events that describe changes in the state of biomolecules. Compared with the protein-protein interaction (PPI) extraction task, which often involves only the extraction of binary relations between two proteins, biomedical event extraction is much harder because it must handle complex events consisting of embedded or hierarchical relations among proteins, events, and their textual triggers. In this paper, we propose an information extraction system based on the hidden vector state (HVS) model, called HVS-BioEvent, for biomedical event extraction, and investigate its capability in extracting complex events.
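To illustrate the contrast between a binary PPI and an embedded event, the following is a minimal sketch rather than the data model used by HVS-BioEvent. The class names, the `depth` helper, and the example sentence are assumptions made for illustration; the event type names follow the BioNLP'09 shared task conventions.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Protein:
    name: str


@dataclass
class Event:
    event_type: str                      # e.g. "Gene_expression", "Negative_regulation"
    trigger: str                         # the word in the text that signals the event
    theme: Union[Protein, "Event"]       # what the event acts on (may itself be an event)
    cause: Optional[Union[Protein, "Event"]] = None


def depth(arg: Union[Protein, Event]) -> int:
    """Nesting depth: 0 for a protein, 1 + the deepest argument for an event."""
    if isinstance(arg, Protein):
        return 0
    depths = [depth(arg.theme)]
    if arg.cause is not None:
        depths.append(depth(arg.cause))
    return 1 + max(depths)


# A binary PPI is just a flat pair of proteins.
ppi = (Protein("A"), Protein("B"))

# Hypothetical sentence: "X inhibits the expression of Y".
# The regulation event takes another event as its theme.
expression = Event("Gene_expression", "expression", theme=Protein("Y"))
neg_reg = Event("Negative_regulation", "inhibits", theme=expression, cause=Protein("X"))

print(depth(expression), depth(neg_reg))
```

A flat pair cannot represent `neg_reg`, because one of its arguments is itself an event with its own trigger and theme.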
The HVS model has previously been employed for extracting PPIs. In HVS-BioEvent, we propose an automated way to generate abstract annotations for HVS training, and further propose novel machine learning approaches for identifying event trigger words and for extracting biomedical events from the HVS parse results.
Our proposed system achieves an F-score of 49.57% on the corpus used in the BioNLP'09 shared task, only 2.38 percentage points lower than the best-performing system in that task, by UTurku. Nevertheless, HVS-BioEvent outperforms UTurku's system on complex event extraction, achieving 36.57% vs. 30.52% for regulation events and 40.61% vs. 38.99% for negative regulation events.
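For readers unfamiliar with the metric, the sketch below assumes the reported F-score is the usual balanced F-score (the harmonic mean of precision and recall); the abstract does not state the precision and recall values, so the inputs to `f_score` here are purely hypothetical. The gap arithmetic uses only the figures reported above and assumes the 2.38 gap is in percentage points.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision/recall, only to show how the metric behaves:
# a large imbalance pulls the F-score toward the smaller value.
balanced = f_score(0.5, 0.5)
skewed = f_score(0.8, 0.2)

# Figures from the abstract: overall F-score and its gap to the best system.
hvs_overall = 49.57
gap_to_best = 2.38
best_overall = round(hvs_overall + gap_to_best, 2)  # implied best-system F-score

# Differences on complex events, in percentage points (HVS-BioEvent minus UTurku).
regulation_gain = round(36.57 - 30.52, 2)
negative_regulation_gain = round(40.61 - 38.99, 2)

print(balanced, skewed, best_overall, regulation_gain, negative_regulation_gain)
```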
The results suggest that the HVS model, with its hierarchical hidden state structure, is indeed more suitable for complex event extraction, since it can naturally model embedded structural context in sentences.
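The idea of a hierarchical hidden state can be sketched as follows. This is a simplified illustration that assumes the commonly described form of the HVS model, in which each hidden state is a stack of semantic labels and each word triggers a pop of some number of labels followed by a push of one new label. The label names, the example sentence, and the pop counts are hypothetical, and the probabilistic part of the model is omitted.

```python
from typing import List, Tuple

State = Tuple[str, ...]  # a vector state: stack of semantic labels, root first


def transition(state: State, n_pop: int, new_label: str) -> State:
    """Pop n_pop labels from the top of the stack, then push one new label."""
    if not 0 <= n_pop <= len(state):
        raise ValueError("cannot pop more labels than the stack holds")
    return state[: len(state) - n_pop] + (new_label,)


# Hypothetical parse of "X inhibits expression Y" (function words dropped).
# Each step is (word, labels to pop, label to push).
steps = [
    ("X", 0, "PROTEIN"),
    ("inhibits", 1, "NEG_REG"),
    ("expression", 0, "GENE_EXPR"),
    ("Y", 0, "PROTEIN"),
]

state: State = ("SS",)  # sentence-start root label
states: List[State] = []
for word, n_pop, label in steps:
    state = transition(state, n_pop, label)
    states.append(state)
    print(f"{word:<12}{state}")
```

In this sketch the final stack keeps `PROTEIN` nested under `GENE_EXPR`, which is itself nested under `NEG_REG`, so the embedded context of the last word is carried in the state itself.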