Lu Yang, Ma Xiaolei, Lu Yinan, Zhou Yuxin, Pei Zhili
College of Computer Science and Technology, Jilin University, Changchun, Jilin 130000, China; Library, Inner Mongolia University for Nationalities, Tongliao, Inner Mongolia 028000, China.
College of Computer Science and Technology, Jilin University, Changchun, Jilin 130000, China.
Comput Math Methods Med. 2016;2016:7536494. doi: 10.1155/2016/7536494. Epub 2016 Dec 14.
Biomedical event extraction is an important and difficult task in bioinformatics. With the rapid growth of biomedical literature, extracting complex events from unstructured text has attracted increasing attention. However, annotated biomedical corpora are highly imbalanced, which degrades the performance of classification algorithms. In this study, a sample selection algorithm based on sequential patterns is proposed to filter negative samples in the training phase. To exploit the joint information between the trigger and the arguments of multiargument events, we extract triplets of multiargument events directly using a support vector machine classifier. A joint scoring mechanism, based on sentence similarity and the importance of each trigger in the training data, is used to correct the predicted results. Experimental results indicate that the proposed method can extract events efficiently.
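The abstract does not specify how the sequential patterns are mined or matched, so the following is only a minimal sketch of the general idea of pattern-based negative-sample filtering, not the authors' algorithm. It assumes that patterns are ordered token lists (gaps allowed) and that a negative training sample is kept only if it matches at least one pattern, so that negatives unlike any pattern are dropped to reduce class imbalance. The names `is_subsequence` and `filter_negatives`, the label encoding (1 = positive, 0 = negative), and the example patterns are all hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[List[str], int]  # (tokens, label); label 1 = positive, 0 = negative (assumed encoding)


def is_subsequence(pattern: Sequence[str], tokens: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `tokens` in order, with gaps allowed."""
    it = iter(tokens)
    # `p in it` advances the iterator until p is found, which enforces ordering.
    return all(p in it for p in pattern)


def filter_negatives(samples: List[Sample], patterns: List[List[str]]) -> List[Sample]:
    """Keep every positive sample; keep a negative only if it matches some pattern.

    Assumption: negatives that match no pattern are treated as uninformative
    and removed from the training set.
    """
    kept = []
    for tokens, label in samples:
        if label == 1 or any(is_subsequence(p, tokens) for p in patterns):
            kept.append((tokens, label))
    return kept
```

Under these assumptions, a negative sentence such as "TNF did not induce expression of IL-6" would be retained by the pattern `["expression", "of"]`, while "cells were cultured overnight" would be filtered out before the classifier is trained.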