Ravikumar K E, Rastegar-Mojarad Majid, Liu Hongfang
Department of Health Sciences Research, Mayo Clinic, USA.
Department of Health Informatics and Administration, University of Wisconsin-Milwaukee, Milwaukee, WI, USA.
Database (Oxford). 2017 Jan 1;2017(1). doi: 10.1093/database/baw156.
Extracting semantically meaningful relationships from biomedical literature is often a challenging task. For the first time, the BioCreative V Track 4 challenge organized a comprehensive shared task to test how robustly text-mining algorithms extract semantically meaningful assertions from evidence statements in biomedical text. In this work, as part of the BioCreative V Track 4 challenge, we tested the ability of a rule-based semantic parser to extract Biological Expression Language (BEL) statements from evidence sentences culled from biomedical literature. The system achieved an overall best F-measure of 21.29% in extracting complete BEL statements. For relation extraction, it achieved an F-measure of 65.13% on the test data set. Our system achieved the best performance in five of the six criteria adopted for evaluation by the task organizers. The inability to derive semantic inferences and limitations in the rule sets that map textual extractions to BEL functions were among the reasons for the low performance in extracting complete BEL statements. After the shared task, we also evaluated the impact of different NER components on the ability to extract BEL statements from the test data set, and we made a single change to the rule sets that translate relation extractions into BEL statements. We observed a marked improvement of over 20% in BELMiner's overall performance in extracting BEL statements on the test set. The system is available as a REST API at http://54.146.11.205:8484/BELXtractor/finder/.
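The abstract refers to rule sets that translate relation extractions into BEL statements, but the rules themselves are not given here. The following is therefore only a minimal Python sketch under stated assumptions. The entity-type and trigger-word tables (`BEL_FUNCTION`, `BEL_RELATION`) and the `Entity`/`Relation` classes are hypothetical illustrations, not BELMiner's implementation. The BEL function names (`p`, `g`, `r`, `a`, `bp`) and the relationships `increases`/`decreases` are standard BEL vocabulary. The sketch shows how a (subject, trigger, target) extraction with normalized entities might be rendered as a BEL statement, and why an extraction whose trigger no rule covers produces no statement at all.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule table: entity type (as an NER component might label it)
# mapped to a BEL abundance or process function.
BEL_FUNCTION = {
    "protein": "p",
    "gene": "g",
    "rna": "r",
    "chemical": "a",
    "biological_process": "bp",
}

# Hypothetical rule table: trigger word in the evidence sentence
# mapped to a BEL causal relationship.
BEL_RELATION = {
    "activates": "increases",
    "induces": "increases",
    "inhibits": "decreases",
    "suppresses": "decreases",
}


@dataclass(frozen=True)
class Entity:
    namespace: str  # e.g. "HGNC", "CHEBI", "GOBP"
    name: str       # normalized name within the namespace
    etype: str      # key into BEL_FUNCTION


@dataclass(frozen=True)
class Relation:
    subject: Entity
    trigger: str    # verb found in the evidence sentence
    target: Entity


def to_bel_term(e: Entity) -> str:
    """Render an entity as a BEL term, quoting names with spaces or punctuation."""
    func = BEL_FUNCTION[e.etype]
    name = e.name if e.name.replace("_", "").isalnum() else f'"{e.name}"'
    return f"{func}({e.namespace}:{name})"


def to_bel_statement(rel: Relation) -> Optional[str]:
    """Apply the trigger rules; return None when no rule covers the trigger."""
    relationship = BEL_RELATION.get(rel.trigger.lower())
    if relationship is None:
        return None
    return f"{to_bel_term(rel.subject)} {relationship} {to_bel_term(rel.target)}"


if __name__ == "__main__":
    rel = Relation(
        Entity("HGNC", "TNF", "protein"),
        "activates",
        Entity("HGNC", "NFKB1", "protein"),
    )
    print(to_bel_statement(rel))  # p(HGNC:TNF) increases p(HGNC:NFKB1)
```

In this sketch, a sentence whose verb is missing from `BEL_RELATION` yields `None` even if the relation itself was extracted correctly. A gap of this kind can keep relation-level scores well above complete-statement scores.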