IEEE/ACM Trans Comput Biol Bioinform. 2020 Nov-Dec;17(6):1895-1906. doi: 10.1109/TCBB.2019.2904231. Epub 2020 Dec 8.
We present an analysis of the problem of identifying biological context and associating it with biochemical events described in biomedical texts. This constitutes a non-trivial, inter-sentential relation extraction task. We focus on biological context as descriptions of the species, tissue type, and cell type that are associated with biochemical events. We also introduce a new corpus of open-access biomedical texts that have been annotated by biology subject-matter experts to highlight context-event relations. Using this corpus, we evaluate several classifiers for context-event association and analyze in detail how a variety of linguistic features affect classifier performance. We find that gradient tree boosting performs best by a wide margin, achieving an F1 score of 0.865 in a cross-validation study.
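The abstract frames context-event association as a yes/no decision over candidate (context mention, event mention) pairs, with gradient tree boosting as the strongest classifier and F1 under cross-validation as the metric. The stdlib-only sketch below illustrates that setup on toy data; it is not the authors' implementation. The two features (sentence distance and a closest-mention flag), the toy pairs, the depth-1 trees, the learning rate, the number of rounds, and the modulo fold assignment are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical toy candidate pairs (context mention, event mention).
# The features are illustrative stand-ins, not the paper's feature set:
#   x[0] = number of sentences between the context mention and the event
#   x[1] = 1 if the context mention is the closest one of its type, else 0
X = [[0, 1], [1, 1], [1, 0], [2, 0], [3, 0], [4, 1], [0, 0], [5, 0]]
y = [1, 1, 1, 0, 0, 0, 1, 0]  # 1 = context is associated with the event


@dataclass
class Stump:
    """Depth-1 regression tree: one threshold test on one feature."""
    feature: int
    threshold: float
    left: float   # output when x[feature] <= threshold
    right: float  # output when x[feature] > threshold

    def predict(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right


def fit_stump(X, targets):
    """Pick the split minimizing squared error; fall back to a constant stump."""
    mean_all = sum(targets) / len(targets)
    best_sse, best = math.inf, Stump(0, math.inf, mean_all, mean_all)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [r for x, r in zip(X, targets) if x[f] <= t]
            right = [r for x, r in zip(X, targets) if x[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best_sse, best = sse, Stump(f, t, lm, rm)
    return best


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_boosting(X, y, n_rounds=50, lr=0.3):
    """Gradient tree boosting with log loss; each stump fits the negative gradient."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p / (1 - p))  # initial raw score: log-odds of the base rate
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log loss w.r.t. the raw score is y - sigmoid(score)
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + lr * stump.predict(x) for s, x in zip(scores, X)]
    return f0, lr, stumps


def predict(model, x):
    f0, lr, stumps = model
    score = f0 + lr * sum(st.predict(x) for st in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


def f1_score(y_true, y_pred):
    """Binary F1 for the positive (associated) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validated_f1(X, y, k=4):
    """Pool held-out predictions from k folds (fold = index mod k), then score once."""
    y_true, y_pred = [], []
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        test = [i for i in range(len(X)) if i % k == fold]
        model = fit_boosting([X[i] for i in train], [y[i] for i in train])
        y_true += [y[i] for i in test]
        y_pred += [predict(model, X[i]) for i in test]
    return f1_score(y_true, y_pred)


print(cross_validated_f1(X, y))
```

Pooling the held-out predictions before computing a single F1 is one common choice; averaging per-fold F1 scores is another, and the abstract does not say which the study used. Each stump's leaf values here are plain means of the negative gradient (no Newton-step leaf update), which keeps the sketch short at the cost of slower convergence.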