Identification of research hypotheses and new knowledge from scientific literature.

Affiliation

National Centre for Text Mining, University of Manchester, Manchester, UK.

Publication

BMC Med Inform Decis Mak. 2018 Jun 25;18(1):46. doi: 10.1186/s12911-018-0639-1.

Abstract

BACKGROUND

Text mining (TM) methods have been used extensively to extract relations and events from the literature. In addition, TM techniques have been used to extract various types or dimensions of interpretative information, known as Meta-Knowledge (MK), from the context of relations and events, e.g. negation, speculation, certainty and knowledge type. However, most existing methods have focussed on the extraction of individual dimensions of MK, without investigating how they can be combined to obtain even richer contextual information. In this paper, we describe a novel, supervised method to extract new MK dimensions that encode Research Hypotheses (an author's intended knowledge gain) and New Knowledge (an author's findings). The method incorporates various features, including a combination of simple MK dimensions.

METHODS

We identify previously explored dimensions and then use a random forest to combine these with linguistic features into a classification model. To facilitate evaluation of the model, we have enriched two existing corpora annotated with relations and events, i.e., a subset of the GENIA-MK corpus and the EU-ADR corpus, by adding attributes to encode whether each relation or event corresponds to Research Hypothesis or New Knowledge. In the GENIA-MK corpus, these new attributes complement simpler MK dimensions that had previously been annotated.
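
The abstract does not spell out the feature set, so the following is only a minimal sketch, assuming a simple one-hot encoding. It shows how the values of a few simple MK dimensions attached to one event might be combined with binary lexical cues into a single feature vector that a random forest could consume. The dimension names and values in `MK_VALUES` (loosely modelled on GENIA-MK), the cue words in `CUE_WORDS`, and the helpers `feature_names` and `featurize` are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical value inventories for a few simple MK dimensions
# (loosely modelled on GENIA-MK; not the paper's exact scheme).
MK_VALUES: Dict[str, List[str]] = {
    "knowledge_type": ["Investigation", "Observation", "Analysis", "Method", "Fact", "Other"],
    "certainty": ["L1", "L2", "L3"],
    "polarity": ["Positive", "Negative"],
}

# Hypothetical lexical cues that might signal hypotheses or findings.
CUE_WORDS: List[str] = ["we", "hypothesize", "investigated", "show", "found", "suggest"]


def feature_names() -> List[str]:
    """Return the column names of the feature vector, in order."""
    names = [f"{dim}={val}" for dim, vals in MK_VALUES.items() for val in vals]
    names += [f"cue={w}" for w in CUE_WORDS]
    return names


def featurize(mk: Dict[str, str], sentence: str) -> List[int]:
    """One-hot encode the event's MK values, then append binary cue-word flags."""
    vector: List[int] = []
    for dim, vals in MK_VALUES.items():
        # Exactly one 1 per dimension if the value is known, all zeros otherwise.
        vector += [1 if mk.get(dim) == val else 0 for val in vals]
    # Crude tokenisation: lowercase, drop commas and full stops, split on whitespace.
    tokens = set(sentence.lower().replace(",", " ").replace(".", " ").split())
    vector += [1 if w in tokens else 0 for w in CUE_WORDS]
    return vector


if __name__ == "__main__":
    # Invented example event and sentence, for illustration only.
    event_mk = {"knowledge_type": "Investigation", "certainty": "L3", "polarity": "Positive"}
    sent = "We investigated whether IL-2 induces NF-kappaB activation."
    vec = featurize(event_mk, sent)
    active = [n for n, v in zip(feature_names(), vec) if v]
    print(active)
```

In a complete pipeline, one such vector per annotated relation or event, paired with its Research Hypothesis or New Knowledge label, could be passed to an off-the-shelf random forest such as scikit-learn's `RandomForestClassifier`. One-hot indicators are used here simply because tree ensembles handle them directly, without any scaling.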

RESULTS

We show that our approach is able to assign different types of MK dimensions to relations and events with a high degree of accuracy. Firstly, our method improves upon the previously reported state-of-the-art performance for an existing dimension, i.e., Knowledge Type. Secondly, we demonstrate high F1-scores in predicting the new dimensions of Research Hypothesis (GENIA: 0.914, EU-ADR: 0.802) and New Knowledge (GENIA: 0.829, EU-ADR: 0.836).
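
For reference, an F1-score is the harmonic mean of precision and recall, which simplifies to 2TP / (2TP + FP + FN). The small helper below is a generic sketch, not code from the paper; it computes all three measures from true-positive, false-positive and false-negative counts. The counts in the usage line are made up and are unrelated to the reported results.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts for illustration only.
    p, r, f = precision_recall_f1(tp=90, fp=10, fn=20)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.900 R=0.818 F1=0.857
```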

CONCLUSION

We have presented a novel approach for predicting New Knowledge and Research Hypothesis, which combines simple MK dimensions to achieve high F1-scores. The extraction of such information is valuable for a number of practical TM applications.
