Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.

Affiliation

Department of Computer Science, University of Sheffield, Sheffield, UK.

Publication information

PLoS Comput Biol. 2013;9(2):e1002854. doi: 10.1371/journal.pcbi.1002854. Epub 2013 Feb 7.

Abstract

This software article describes the GATE family of open source text analysis tools and processes. GATE is one of the most widely used systems of its type, with yearly download rates of tens of thousands and many active users in both academic and industrial contexts. In this paper we report three examples of GATE-based systems operating in the life sciences and in medicine. The first is in genome-wide association studies, where it has contributed to the discovery of a head and neck cancer mutation association. The second is medical records analysis, which has significantly increased the statistical power of treatment/outcome models in the UK's largest psychiatric patient cohort. The third is richer constructs in drug-related searching. We also explore the ways in which the GATE family supports the various stages of the lifecycle present in our examples. We conclude that the deployment of text mining for document abstraction or rich search and navigation is best thought of as a process, and that with the right computational tools and data collection strategies this process can be made defined and repeatable. The GATE research programme is now 20 years old and has grown from its roots as a specialist development tool for text processing to become a rather comprehensive ecosystem, bringing together software developers, language engineers and research staff from diverse fields. GATE now has a strong claim to cover a uniquely wide range of the lifecycle of text analysis systems. It forms a focal point for the integration and reuse of advances that have been made by many people (the majority outside of the authors' own group) who work in text processing for biomedicine and other areas. GATE is available online <1> under GNU open source licences and runs on all major operating systems. Support is available from an active user and developer community and also on a commercial basis.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bae1/3567135/51fa706f9129/pcbi.1002854.g001.jpg
