Identifying gene and protein mentions in text using conditional random fields.

Authors

McDonald Ryan, Pereira Fernando

Affiliation

Department of Computer and Information Science, University of Pennsylvania, Levine Hall, 3330 Walnut Street, Philadelphia, Pennsylvania 19104, USA.

Publication information

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S6. doi: 10.1186/1471-2105-6-S1-S6. Epub 2005 May 24.

DOI:10.1186/1471-2105-6-S1-S6

PMID:15960840

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC1869020/

Abstract

BACKGROUND

We present a model for tagging gene and protein mentions in text using the probabilistic sequence tagging framework of conditional random fields (CRFs). Conditional random fields directly model the probability P(t|o) of a tag sequence t given an observation sequence o, and have previously been employed successfully for other tagging tasks. The mechanics of CRFs and their relationship to maximum entropy are discussed in detail.
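To make the idea of modelling the conditional probability of a tag sequence concrete, here is a minimal sketch of a toy linear-chain CRF. It is not the authors' implementation: the tag set, the single capitalization feature, and every value in `WEIGHTS` are assumptions chosen for illustration. The score of a tag sequence is a sum of feature weights, and the forward algorithm computes the normalizer so that the probabilities of all tag sequences for one sentence sum to one.

```python
import itertools
import math

# Hypothetical tag set and hand-picked weights, for illustration only.
TAGS = ["O", "B-GENE", "I-GENE"]
WEIGHTS = {
    ("cap", True, "B-GENE"): 2.0,          # capitalized token starting a mention
    ("trans", "B-GENE", "I-GENE"): 1.0,    # continuing a mention
    ("trans", "O", "I-GENE"): -5.0,        # discourage I-GENE without B-GENE
    ("trans", "START", "I-GENE"): -5.0,
}


def local_score(prev_tag, tag, word, weights):
    """Sum of the weights of the features active at one position."""
    transition = weights.get(("trans", prev_tag, tag), 0.0)
    emission = weights.get(("cap", word[:1].isupper(), tag), 0.0)
    return transition + emission


def score(tags, obs, weights):
    """Unnormalized log-score of a whole tag sequence."""
    total, prev = 0.0, "START"
    for tag, word in zip(tags, obs):
        total += local_score(prev, tag, word, weights)
        prev = tag
    return total


def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_partition(obs, weights):
    """log Z(o), computed with the forward algorithm."""
    alpha = {"START": 0.0}
    for word in obs:
        alpha = {
            tag: logsumexp([a + local_score(prev, tag, word, weights)
                            for prev, a in alpha.items()])
            for tag in TAGS
        }
    return logsumexp(list(alpha.values()))


def prob(tags, obs, weights):
    """Conditional probability of a tag sequence given the observations."""
    return math.exp(score(tags, obs, weights) - log_partition(obs, weights))


sentence = ["the", "BRCA1", "gene"]
# Brute-force check: probabilities over all tag sequences should sum to 1.
total = sum(prob(t, sentence, WEIGHTS)
            for t in itertools.product(TAGS, repeat=len(sentence)))
```

In a trained model the weights would be estimated from annotated text rather than set by hand; the brute-force sum is only feasible for very short sentences and serves here as a sanity check on the forward recursion.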

RESULTS

We employ a diverse feature set containing standard orthographic features combined with expert features in the form of gene and biological term lexicons to achieve a precision of 86.4% and recall of 78.7%. An analysis of the contribution of the various features of the model is provided.
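As a rough illustration of what orthographic and lexicon features can look like, the sketch below maps a token to a small feature dictionary. The feature names, the `word_shape` helper, and the three-entry `GENE_LEXICON` are hypothetical and do not reproduce the paper's feature set. The `f1` helper combines precision and recall into their harmonic mean.

```python
import re

# Hypothetical stand-in for a gene lexicon; real lexicons are far larger.
GENE_LEXICON = {"brca1", "p53", "tnf"}


def word_shape(token):
    """Collapse characters into classes: A (upper), a (lower), 0 (digit)."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "0", shape)


def token_features(token):
    """Orthographic features plus one lexicon-membership feature."""
    return {
        "lower": token.lower(),
        "init_cap": token[:1].isupper(),
        "all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        "shape": word_shape(token),
        "in_gene_lexicon": token.lower() in GENE_LEXICON,
    }


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


features = token_features("BRCA1")
combined = f1(86.4, 78.7)  # precision and recall as stated in the abstract
```

With the precision and recall stated above, `f1` gives roughly 82.4; that value is derived here by arithmetic and is not a figure quoted from the abstract.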

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c5a2/1869020/3dde0ce62436/1471-2105-6-S1-S6-1.jpg
