Simple tricks for improving pattern-based information extraction from the biomedical literature.

Author information

Nguyen Quang Long, Tikk Domonkos, Leser Ulf

Affiliation

Knowledge Management in Bioinformatics, Department for Computer Science, Humboldt-Universität zu Berlin, Germany.

Publication information

J Biomed Semantics. 2010 Sep 24;1(1):9. doi: 10.1186/2041-1480-1-9.

Abstract

BACKGROUND

Pattern-based approaches to relation extraction have shown very good results in many areas of biomedical text mining. However, defining the right set of patterns is difficult; approaches are either manual, incurring high cost, or automatic, often resulting in large sets of noisy patterns.

RESULTS

We propose several techniques for filtering sets of automatically generated patterns and analyze their effectiveness on the different extraction tasks defined in the recent BioNLP 2009 shared task. We focus on simple methods that take into account only the complexity of the pattern and the complexity of the texts to which the patterns are applied. We show that, despite their simplicity, our techniques yield large improvements in all tasks we analyzed. For instance, they raise the F-score for the task of extracting gene expression events from 24.8% to 51.9%.
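The abstract does not spell out how the two kinds of complexity are measured. The sketch below illustrates the general idea under explicit assumptions and is not the authors' implementation. It assumes that a pattern is a dependency path whose complexity is its number of edges, and that text complexity is approximated by sentence length in tokens. The names `Pattern`, `filter_patterns`, `filter_sentences` and both thresholds are hypothetical. `f_score` is the standard balanced F-score (harmonic mean of precision and recall), assumed here to be the measure reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Hypothetical pattern: a dependency path linking a trigger word to an argument."""
    name: str
    path: tuple[str, ...]  # dependency labels along the path, e.g. ("nsubj", "dobj")

    @property
    def complexity(self) -> int:
        # Assumption: pattern complexity is approximated by the number of edges in the path.
        return len(self.path)


def filter_patterns(patterns: list[Pattern], max_path_len: int) -> list[Pattern]:
    """Drop overly complex patterns, which are assumed to be the noisiest."""
    return [p for p in patterns if p.complexity <= max_path_len]


def filter_sentences(sentences: list[str], max_tokens: int) -> list[str]:
    """Skip sentences longer than max_tokens (a crude proxy for text complexity)."""
    return [s for s in sentences if len(s.split()) <= max_tokens]


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with `max_path_len=2`, a one-edge pattern such as `Pattern("p1", ("nsubj",))` is kept, and a four-edge pattern is discarded. Both filters shrink the number of pattern/sentence pairs that have to be matched, which is consistent with the speed-up the abstract reports.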

CONCLUSIONS

Even very simple filtering techniques can significantly improve the F-score of an information extraction method based on automatically generated patterns. Furthermore, applying such methods yields a considerable speed-up, because fewer matches need to be analysed. Because they are so simple, the proposed filtering techniques should also be applicable to other methods that use linguistic patterns for information extraction.


