Identifying gene and protein mentions in text using conditional random fields.

Author information

McDonald Ryan, Pereira Fernando

Affiliation

Department of Computer and Information Science, University of Pennsylvania, Levine Hall, 3330 Walnut Street, Philadelphia, Pennsylvania 19104, USA.

Publication information

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S6. doi: 10.1186/1471-2105-6-S1-S6. Epub 2005 May 24.

Abstract

BACKGROUND

We present a model for tagging gene and protein mentions in text using the probabilistic sequence tagging framework of conditional random fields (CRFs). Conditional random fields directly model the probability P(t|o) of a tag sequence t given an observation sequence o, and have previously been applied successfully to other tagging tasks. The mechanics of CRFs and their relationship to maximum entropy are discussed in detail.
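To make the idea of modelling the conditional probability of a tag sequence directly more concrete, here is a minimal sketch of a linear-chain CRF in Python. It is not the authors' implementation. The B/I/O tag set, the feature names, and the hand-set weights are assumptions chosen purely for illustration, and a real system would learn the weights from annotated data. The sketch scores a tag sequence as a weighted sum of binary features and normalises over all tag sequences with the forward algorithm.

```python
import itertools
import math

# Assumed tag set: Begin / Inside / Outside a gene mention (illustrative only).
TAGS = ("B", "I", "O")


def features(prev_tag, tag, obs, i):
    """Names of the binary features that fire at position i (hypothetical set)."""
    word = obs[i]
    fired = [f"tag={tag}", f"prev={prev_tag}>tag={tag}"]
    if any(c.isdigit() for c in word):
        fired.append(f"has_digit>tag={tag}")
    if word[:1].isupper():
        fired.append(f"init_cap>tag={tag}")
    return fired


def local_score(prev_tag, tag, obs, i, weights):
    """Weighted sum of the features firing for one (prev_tag, tag) transition."""
    return sum(weights.get(name, 0.0) for name in features(prev_tag, tag, obs, i))


def sequence_score(tags, obs, weights):
    """Unnormalised log-score of a whole tag sequence."""
    total, prev = 0.0, "START"
    for i, tag in enumerate(tags):
        total += local_score(prev, tag, obs, i, weights)
        prev = tag
    return total


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over an iterable of floats."""
    values = list(values)
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(obs, weights):
    """log Z(o) via the forward algorithm; assumes obs is non-empty."""
    alpha = {t: local_score("START", t, obs, 0, weights) for t in TAGS}
    for i in range(1, len(obs)):
        alpha = {
            t: logsumexp(alpha[p] + local_score(p, t, obs, i, weights) for p in TAGS)
            for t in TAGS
        }
    return logsumexp(alpha.values())


def brute_force_log_partition(obs, weights):
    """log Z(o) by enumerating every tag sequence; only feasible for tiny inputs."""
    return logsumexp(
        sequence_score(seq, obs, weights)
        for seq in itertools.product(TAGS, repeat=len(obs))
    )


def probability(tags, obs, weights):
    """Conditional probability of a tag sequence given the observations."""
    return math.exp(sequence_score(tags, obs, weights) - log_partition(obs, weights))
```

With illustrative weights such as `{"has_digit>tag=B": 2.0, "prev=B>tag=I": 1.0, "tag=O": 0.5}` and the observation `["the", "p53", "gene"]`, the probabilities of all 27 possible tag sequences sum to one, and the sequence `("O", "B", "O")` scores higher than `("O", "O", "O")`. Because normalisation happens over whole sequences rather than per position, the model conditions on the full observation without having to model how observations are generated.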

RESULTS

We employ a diverse feature set containing standard orthographic features combined with expert features in the form of gene and biological term lexicons to achieve a precision of 86.4% and recall of 78.7%. An analysis of the contribution of the various features of the model is provided.
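The combination of orthographic and lexicon features described above can be sketched as a per-token feature function. This is an assumed, simplified illustration rather than the feature set used in the paper: the specific feature names, the three-character suffix, and the toy `GENE_LEXICON` are hypothetical stand-ins for the paper's orthographic features and gene and biological term lexicons.

```python
# Hypothetical toy lexicon; a real system would load curated gene and
# biological term lists.
GENE_LEXICON = frozenset({"p53", "brca1", "insulin"})


def token_features(tokens, i, lexicon=GENE_LEXICON):
    """Return a dict of illustrative features for the token at position i."""
    word = tokens[i]
    feats = {
        # Orthographic features: surface shape of the token.
        "word.lower": word.lower(),
        "init_cap": word[:1].isupper(),
        "all_caps": word.isupper(),
        "has_digit": any(c.isdigit() for c in word),
        "has_hyphen": "-" in word,
        "suffix3": word[-3:].lower(),
        # Expert feature: membership in a domain lexicon.
        "in_gene_lexicon": word.lower() in lexicon,
    }
    # Context features: neighbouring tokens, with boundary markers.
    feats["prev.lower"] = tokens[i - 1].lower() if i > 0 else "<START>"
    feats["next.lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<END>"
    return feats
```

For example, `token_features(["The", "BRCA1", "gene"], 1)` marks the token as all-caps, containing a digit, and present in the toy lexicon, with `"the"` and `"gene"` as its neighbours. Feature dictionaries of this shape are a common input format for sequence taggers, which is why the sketch returns a dict per token.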
