Harmonization of gene/protein annotations: towards a gold standard MEDLINE.

Affiliation

University of Aveiro, IEETA/DETI, Campus Universitário de Santiago, Aveiro, Portugal.

Publication information

Bioinformatics. 2012 May 1;28(9):1253-61. doi: 10.1093/bioinformatics/bts125. Epub 2012 Mar 13.

Abstract

MOTIVATION

Named entity recognition (NER) is an elementary task in biomedical text mining. A number of NER solutions have been proposed in recent years, taking advantage of available annotated corpora, terminological resources and machine-learning techniques. Currently, the best-performing solutions combine the outputs of selected annotation solutions and are measured against a single corpus. However, little effort has been spent on systematically analyzing methods that harmonize annotation results and are measured against a combination of Gold Standard Corpora (GSCs).

RESULTS

We present Totum, a machine-learning solution that harmonizes gene/protein annotations provided by heterogeneous NER solutions. It has been optimized and measured against a combination of manually curated GSCs. Our experiments show that the approach improves the F-measure of state-of-the-art solutions by up to 10% (achieving ≈70%) in exact alignment and by up to 22% (achieving ≈82%) in nested alignment. We demonstrate that our solution delivers reliable annotation results across the GSCs and is an important contribution towards a homogeneous annotation of MEDLINE abstracts.
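
To make the two alignment modes concrete, the following is a minimal sketch of how an F-measure could be computed under exact and nested matching. It is not Totum's evaluation code. The names `exact_match`, `nested_match` and `f_measure` are hypothetical, spans are assumed to be (start, end) character offsets, and "nested" is assumed here to mean that one span lies entirely inside the other; the paper's precise definition may differ.

```python
from typing import Callable, List, Tuple

# Assumption: an annotation is a (start, end) character-offset pair, end exclusive.
Span = Tuple[int, int]


def exact_match(pred: Span, gold: Span) -> bool:
    """Exact alignment: both boundaries must coincide."""
    return pred == gold


def nested_match(pred: Span, gold: Span) -> bool:
    """Nested alignment (assumed definition): one span contains the other."""
    pred_in_gold = gold[0] <= pred[0] and pred[1] <= gold[1]
    gold_in_pred = pred[0] <= gold[0] and gold[1] <= pred[1]
    return pred_in_gold or gold_in_pred


def f_measure(preds: List[Span], golds: List[Span],
              match: Callable[[Span, Span], bool]) -> float:
    """Balanced F-measure (F1) of predicted spans against gold spans."""
    # A prediction counts as correct if it matches at least one gold span.
    matched_preds = sum(any(match(p, g) for g in golds) for p in preds)
    # A gold span counts as found if at least one prediction matches it.
    matched_golds = sum(any(match(p, g) for p in preds) for g in golds)
    precision = matched_preds / len(preds) if preds else 0.0
    recall = matched_golds / len(golds) if golds else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative offsets only, not data from the paper.
gold = [(0, 5), (10, 20), (30, 35)]
pred = [(0, 5), (12, 20), (40, 45)]

f_exact = f_measure(pred, gold, exact_match)
f_nested = f_measure(pred, gold, nested_match)
```

In this toy input the prediction (12, 20) falls inside the gold span (10, 20), so it counts under nested matching but not under exact matching. This is why nested scores are generally at least as high as exact scores for the same output.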

AVAILABILITY AND IMPLEMENTATION

Totum is implemented in Java, and its resources are available at http://bioinformatics.ua.pt/totum
