National Centre for Text Mining, School of Computer Science.
Manchester Institute of Biotechnology, The University of Manchester, Manchester, UK.
Bioinformatics. 2017 Dec 1;33(23):3784-3792. doi: 10.1093/bioinformatics/btx466.
In recent years, there has been great progress in the field of automated curation of biomedical networks and models, aided by text mining methods that provide evidence from the literature. Such methods must not only extract snippets of text that relate to model interactions, but also be able to contextualize the evidence and provide additional confidence scores for the interaction in question. Although various approaches to calculating confidence scores have focused primarily on the quality of the extracted information, there has been little work on exploring the textual uncertainty conveyed by the author. Despite textual uncertainty being acknowledged in biomedical text mining as an attribute of text-mined interactions (events), it is significantly understudied as a means of providing a confidence measure for interactions in pathways or other biomedical models. In this work, we focus on improving identification of textual uncertainty for events and explore how it can be used as an additional measure of confidence for biomedical models.
We present a novel method for extracting uncertainty from the literature using a hybrid approach that combines rule induction and machine learning. Variations of this hybrid approach are then discussed, alongside their advantages and disadvantages. We use subjective logic theory to combine multiple uncertainty values extracted from different sources for the same interaction. Our approach achieves F-scores of 0.76 and 0.88 on the BioNLP-ST and Genia-MK corpora, respectively, making considerable improvements over previously published work. Moreover, we evaluate our proposed system on pathways related to two different areas, namely leukemia and melanoma cancer research.
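As an illustration of how subjective logic can combine uncertainty values from several sources for one interaction, the following is a minimal sketch only. It assumes binomial opinions (belief, disbelief, uncertainty, base rate) and the standard cumulative fusion operator from subjective logic; the abstract does not state which operator the system uses, and the names `Opinion`, `cumulative_fusion` and `fuse_all`, as well as the example values, are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Opinion:
    """Binomial opinion about one interaction (hypothetical container)."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5  # assumed prior; not taken from the abstract

    def projected_probability(self) -> float:
        # Standard projection: belief plus the prior's share of uncertainty.
        return self.belief + self.base_rate * self.uncertainty


def cumulative_fusion(a: Opinion, b: Opinion) -> Opinion:
    """Cumulative fusion of two binomial opinions on the same statement."""
    kappa = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    # Simplification (assumption): average the base rates instead of using
    # the full base-rate fusion formula; exact when both base rates match.
    base_rate = (a.base_rate + b.base_rate) / 2
    if kappa == 0:
        # Both sources are fully certain: fall back to an equal-weight average.
        return Opinion((a.belief + b.belief) / 2,
                       (a.disbelief + b.disbelief) / 2,
                       0.0, base_rate)
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / kappa,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / kappa,
        (a.uncertainty * b.uncertainty) / kappa,
        base_rate,
    )


def fuse_all(opinions: list[Opinion]) -> Opinion:
    """Fold cumulative fusion over all evidence found for one interaction."""
    return reduce(cumulative_fusion, opinions)


# Two hypothetical evidence sentences supporting the same interaction.
sentence_a = Opinion(belief=0.6, disbelief=0.2, uncertainty=0.2)
sentence_b = Opinion(belief=0.5, disbelief=0.1, uncertainty=0.4)
fused = fuse_all([sentence_a, sentence_b])
```

Under cumulative fusion, each additional independent source reduces the fused uncertainty, which is one plausible reason for choosing it when several sentences report the same interaction; other subjective logic operators, such as averaging fusion, behave differently.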
The leukemia pathway model used is available in Pathway Studio, while the Ras model is available via PathwayCommons. An online demonstration of the uncertainty extraction system is available for research purposes at http://argo.nactem.ac.uk/test. The related code is available at https://github.com/c-zrv/uncertainty_components.git. Details on the above are available in the Supplementary Material.
sophia.ananiadou@manchester.ac.uk.
Supplementary data are available at Bioinformatics online.