Exploring the boundaries: gene and protein identification in biomedical text.

Author information

Finkel Jenny, Dingare Shipra, Manning Christopher D, Nissim Malvina, Alex Beatrice, Grover Claire

Affiliation

Department of Computer Science, Stanford University, Stanford, CA 94305-9040, USA.

Publication information

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S5. doi: 10.1186/1471-2105-6-S1-S5. Epub 2005 May 24.

Abstract

BACKGROUND

Good automatic information extraction tools offer hope for automatic processing of the exploding biomedical literature, and successful named entity recognition is a key component for such tools.

METHODS

We present a maximum-entropy based system incorporating a diverse set of features for identifying gene and protein names in biomedical abstracts.

RESULTS

This system was entered in the BioCreative comparative evaluation and achieved a precision of 0.83 and recall of 0.84 in the "open" evaluation and a precision of 0.78 and recall of 0.85 in the "closed" evaluation.
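The abstract reports precision and recall separately. As a rough illustration of how the two combine, the sketch below computes the balanced F-measure (the harmonic mean of precision and recall) for each evaluation. The resulting F values are derived here from the rounded figures above and are not stated in the abstract; the function name `f1_score` and the `reported` dictionary are illustrative choices, not part of the authors' system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs as reported in the abstract (rounded to two decimals).
reported = {"open": (0.83, 0.84), "closed": (0.78, 0.85)}

for track, (p, r) in reported.items():
    # Derived value, not a figure taken from the abstract.
    print(f"{track}: F1 = {f1_score(p, r):.3f}")
```

Because the inputs are already rounded, the derived values may differ slightly from any officially reported F-scores.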

CONCLUSION

Central contributions are rich use of features derived from the training data at multiple levels of granularity, a focus on correctly identifying entity boundaries, and the innovative use of several external knowledge sources including full MEDLINE abstracts and web searches.
