Terminology-driven mining of biomedical literature.

Author information

Nenadic Goran, Spasic Irena, Ananiadou Sophia

Affiliation

Computer Science, University of Salford, Salford M5 4WT, UK.

Publication information

Bioinformatics. 2003 May 22;19(8):938-43. doi: 10.1093/bioinformatics/btg105.

Abstract

MOTIVATION

With an overwhelming amount of textual information in molecular biology and biomedicine, there is a need for effective literature mining techniques that can help biologists to gather and make use of the knowledge encoded in text documents. Although the knowledge is organized around sets of domain-specific terms, few literature mining systems incorporate deep and dynamic terminology processing.

RESULTS

In this paper, we present an overview of an integrated framework for terminology-driven mining of biomedical literature. The framework integrates the following components: automatic term recognition, term variation handling, acronym acquisition, automatic discovery of term similarities, and term clustering. Term variant recognition is incorporated into the terminology recognition process by taking into account orthographic, morphological, syntactic, lexico-semantic and pragmatic term variations. In particular, we address acronyms as a common way of introducing term variants in biomedical papers. Term clustering is based on the automatic discovery of term similarities. We use a hybrid similarity measure in which terms are compared using both internal and external evidence; the measure combines lexical, syntactic and contextual similarity. Experiments on terminology recognition and clustering performed on a corpus of MEDLINE abstracts achieved precision of 98% and 71%, respectively.
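
The idea of a hybrid similarity measure that combines lexical, syntactic and contextual evidence can be illustrated with a minimal Python sketch. It is not the authors' implementation. The component formulas are hypothetical choices for illustration: Dice overlap of words for lexical similarity, a binary flag for co-occurrence in coordination patterns for syntactic similarity, and cosine over context-word counts for contextual similarity. The weights, the threshold, all function names and the toy data are also assumptions. For clustering, the sketch uses simple single-link grouping over a similarity threshold in place of the framework's own clustering method.

```python
from collections import Counter
from math import sqrt
from typing import Callable

def lexical_similarity(t1: str, t2: str) -> float:
    """Dice coefficient over word sets (assumed stand-in for internal, lexical evidence)."""
    w1, w2 = set(t1.lower().split()), set(t2.lower().split())
    if not w1 or not w2:
        return 0.0
    return 2 * len(w1 & w2) / (len(w1) + len(w2))

def syntactic_similarity(t1: str, t2: str, coordinated_pairs: set) -> float:
    """Hypothetical binary score: 1.0 if the terms were seen in a coordination
    or enumeration pattern such as "X and Y", else 0.0."""
    return 1.0 if frozenset((t1, t2)) in coordinated_pairs else 0.0

def contextual_similarity(ctx1: Counter, ctx2: Counter) -> float:
    """Cosine similarity between bag-of-words context vectors (external evidence)."""
    dot = sum(ctx1[w] * ctx2[w] for w in ctx1.keys() & ctx2.keys())
    n1 = sqrt(sum(v * v for v in ctx1.values()))
    n2 = sqrt(sum(v * v for v in ctx2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def hybrid_similarity(t1: str, t2: str, contexts: dict, coordinated_pairs: set,
                      weights: tuple = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three components; the weights are assumed, not from the paper."""
    a, b, c = weights
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return (a * lexical_similarity(t1, t2)
            + b * syntactic_similarity(t1, t2, coordinated_pairs)
            + c * contextual_similarity(contexts.get(t1, Counter()),
                                        contexts.get(t2, Counter())))

def cluster_terms(terms: list, sim: Callable[[str, str], float],
                  threshold: float = 0.5) -> list:
    """Single-link grouping via union-find: link any pair with sim >= threshold."""
    parent = {t: t for t in terms}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, t1 in enumerate(terms):
        for t2 in terms[i + 1:]:
            if sim(t1, t2) >= threshold:
                parent[find(t1)] = find(t2)
    groups: dict = {}
    for t in terms:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())

# Toy, invented data for illustration only.
terms = ["nuclear receptor", "orphan nuclear receptor", "estrogen receptor", "protein kinase"]
contexts = {
    "nuclear receptor": Counter({"bind": 2, "ligand": 1}),
    "orphan nuclear receptor": Counter({"bind": 1, "ligand": 1}),
    "estrogen receptor": Counter({"ligand": 2, "bind": 1}),
    "protein kinase": Counter({"phosphorylate": 3}),
}
coordinated_pairs = {frozenset(("nuclear receptor", "estrogen receptor"))}

clusters = cluster_terms(
    terms, lambda x, y: hybrid_similarity(x, y, contexts, coordinated_pairs))
print(clusters)
# → [['nuclear receptor', 'orphan nuclear receptor', 'estrogen receptor'], ['protein kinase']]
```

Because the weights sum to 1 and each component lies in [0, 1], the combined score also stays in [0, 1], so a single threshold can drive clustering. In this toy run, "estrogen receptor" joins the receptor cluster through its similarity to "nuclear receptor". Its direct score with "orphan nuclear receptor" (about 0.44) is below the threshold, which shows the chaining behaviour of single-link grouping.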

AVAILABILITY

Software for terminology management is available upon request.
