Mihăilă Claudiu, Ananiadou Sophia
Biomed Eng Online. 2014;13 Suppl 2(Suppl 2):S1. doi: 10.1186/1475-925X-13-S2-S1. Epub 2014 Dec 11.
The number of articles published daily in the biomedical domain has grown too large for humans to handle on their own. As a result, bio-text mining technologies have been developed to reduce researchers' workload by automatically analysing text and extracting important knowledge. Specific bio-entities, the bio-events between them, and facts can now be recognised with sufficient accuracy and are widely used by biomedical researchers. However, understanding how the extracted facts are connected in text is an extremely difficult task that cannot be easily tackled by machines.
In this article, we describe our method to recognise causal triggers and their arguments in biomedical scientific discourse. We introduce new features and show that a self-learning approach improves the performance obtained by supervised machine learners to 83.47% for causal triggers. Furthermore, the spans of causal arguments can be recognised to a slightly higher level than by the supervised or rule-based methods that have been employed before.
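As a rough illustration of the general self-learning (self-training) idea mentioned above, the following minimal sketch trains a classifier on labelled examples, labels the unlabelled pool, and retrains on the predictions it is most confident about. It is not the authors' implementation: the abstract does not specify the learner, the features, or the confidence criterion, so the toy nearest-centroid learner, the two-dimensional feature vectors, the label names, the function names, and the 0.8 threshold are all assumptions made for the example.

```python
import math

def train_centroids(X, y):
    """Toy supervised learner (assumed): one mean feature vector per label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_with_confidence(centroids, x):
    """Return (label, confidence); confidence is 0.5 for a tie and approaches 1.0
    as x gets much closer to the nearest centroid than to the runner-up."""
    dists = sorted((math.dist(c, x), label) for label, c in centroids.items())
    best_d, best_label = dists[0]
    if len(dists) == 1:
        return best_label, 1.0
    second_d = dists[1][0]
    total = best_d + second_d
    return best_label, (0.5 if total == 0 else second_d / total)

def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Generic self-training loop: repeatedly move confidently predicted
    unlabelled examples into the training set and retrain."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    model = train_centroids(X, y)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from; stop early
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = remaining
        model = train_centroids(X, y)
    return model, len(X) - len(X_lab)

# Hypothetical data: each vector stands in for the features of one candidate trigger.
labelled = [[0.0, 0.0], [1.0, 1.0]]
labels = ["non-causal", "causal"]
unlabelled = [[0.1, 0.0], [0.9, 1.0], [0.5, 0.5]]
model, n_added = self_train(labelled, labels, unlabelled)
```

In this toy run the ambiguous point `[0.5, 0.5]` never clears the threshold and stays unlabelled, which is the intended behaviour of a confidence cut-off: only predictions the current model is sure about are allowed to influence the next round of training.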
Exploiting the large amount of unlabelled data that is already available can help improve the performance of recognising causal discourse relations in the biomedical domain. This improvement will further benefit the development of multiple tasks, such as hypothesis generation for experimental laboratories, contradiction detection, and the creation of causal networks.