
BCC-NER: bidirectional, contextual clues named entity tagger for gene/protein mention recognition.

Authors

Murugesan Gurusamy, Abdulkadhar Sabenabanu, Bhasuran Balu, Natarajan Jeyakumar

Affiliations

Data Mining and Text Mining Lab, Department of Bioinformatics, Bharathiar University, Coimbatore, Tamilnadu, 641046, India.

Center for Computational Biology, DRDO-BU Center for Life Sciences, Bharathiar University, Coimbatore, Tamilnadu, 641046, India.

Publication information

EURASIP J Bioinform Syst Biol. 2017 Dec;2017(1):7. doi: 10.1186/s13637-017-0060-6. Epub 2017 May 5.

DOI:10.1186/s13637-017-0060-6
PMID:28477208
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5419958/
Abstract

Tagging biomedical entities such as genes, proteins, cells, and cell lines is the first step and an important prerequisite in biomedical literature mining. In this paper, we describe our hybrid named entity tagging approach, BCC-NER (bidirectional, contextual clues named entity tagger for gene/protein mention recognition). BCC-NER consists of three modules. The first module handles text processing, which includes basic NLP pre-processing, feature extraction, and feature selection. The second module handles training and model building: it uses bidirectional conditional random fields (CRF) to parse the text in both directions (forward and backward) and integrates the backward- and forward-trained models using the margin-infused relaxed algorithm (MIRA). The third and final module performs post-processing to improve performance, using surrounding text features, parenthesis mismatching, and a two-tier abbreviation algorithm. On the BioCreative II GM test corpus, BCC-NER achieves a precision of 89.95, a recall of 84.15, and an overall F-score of 86.95, which is higher than the other currently available open source taggers.
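
The three reported scores can be checked against each other. The sketch below assumes the "overall F-score" is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` and its `beta` parameter are illustrative and do not come from the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the harmonic mean (F1).

    Assumption: the abstract's "overall F-score" is F1. The inputs may be
    fractions or percentages, as long as both use the same scale.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        # Both scores are zero, so the F-score is defined as zero here.
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Values reported in the abstract for the BioCreative II GM test corpus.
reported_precision = 89.95
reported_recall = 84.15

f1 = f_score(reported_precision, reported_recall)
print(f"{f1:.2f}")  # prints 86.95
```

Under that assumption, the computed value agrees with the reported F-score of 86.95 to two decimal places.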

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9e7c/5419958/aba8223892c1/13637_2017_60_Fig1_HTML.jpg


