Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations.

Affiliation

Database/Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Cheongju, South Korea.

Publication information

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S9. doi: 10.1186/1758-2946-7-S1-S9. eCollection 2015.

DOI: 10.1186/1758-2946-7-S1-S9
PMID: 25810780
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4331699/
Abstract

BACKGROUND

Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite for effective text mining of biochemical text. With the recent growth of the biomedical literature, exploiting unlabeled text to improve system performance has become an active and challenging research topic in text mining. We present a semi-supervised learning method that efficiently exploits unlabeled data to incorporate domain knowledge into a named entity recognition model and thereby improve its performance. The proposed method comprises Natural Language Processing (NLP) tasks for text preprocessing, word representation features learned from a large amount of text for feature extraction, and conditional random fields for token classification. Apart from free text in the domain, the method relies on neither a lexicon nor a dictionary, which keeps the system applicable to other NER tasks on biomedical text.
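
The feature-extraction step described above can be illustrated with a minimal sketch. The abstract does not say which word representations or which features the system uses, so everything here is an assumption for illustration: the `WORD_CLUSTERS` table, its cluster IDs, and the feature names are hypothetical, and the toy table stands in for representations that would be learned from a large unlabeled corpus. The output is a list of per-token feature dictionaries of the kind many CRF toolkits accept.

```python
from typing import Dict, List

# Hypothetical word-to-cluster table (an assumption for this sketch).
# In a real system it would be learned from a large unlabeled corpus.
WORD_CLUSTERS: Dict[str, str] = {
    "aspirin": "0110",
    "ibuprofen": "0110",
    "inhibits": "1011",
    "cox-2": "0111",
}


def cluster_of(word: str) -> str:
    """Return the cluster ID for a word, or 'UNK' if the word is unseen."""
    return WORD_CLUSTERS.get(word.lower(), "UNK")


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build a feature dict for the token at position i."""
    word = tokens[i]
    lower = word.lower()
    feats = {
        # Simple surface features of the token itself.
        "lower": lower,
        "is_capitalized": str(word[:1].isupper()),
        "has_digit": str(any(c.isdigit() for c in word)),
        "suffix3": lower[-3:],
        # Word-representation feature: injects knowledge from unlabeled text.
        "cluster": cluster_of(word),
    }
    # Context features taken from the neighboring tokens.
    if i > 0:
        feats["prev_cluster"] = cluster_of(tokens[i - 1])
    else:
        feats["BOS"] = "True"  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_cluster"] = cluster_of(tokens[i + 1])
    else:
        feats["EOS"] = "True"  # end of sentence
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Feature dicts for every token in a tokenized sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]
```

Calling `sentence_features(["Aspirin", "inhibits", "COX-2"])` gives one dictionary per token. Because "aspirin" and "ibuprofen" share a cluster ID in the toy table, a classifier trained on one can generalize to the other even if it never appeared in the labeled data, which is the intuition behind using unlabeled text as a source of domain knowledge.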

RESULTS

We extended BANNER, a biomedical NER system, with the proposed method, yielding an integrated system that can be applied to chemical and drug NER or to biomedical NER. We call our branch of BANNER BANNER-CHEMDNER. It scales to millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems through the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved F-measures of 85.68% and 86.47% on the test sets of the CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and 87.04% on the official test set of the BioCreative II gene mention task, showing strong performance in both chemical and biomedical NER. The BANNER-CHEMDNER system is available at https://bitbucket.org/tsendeemts/banner-chemdner.
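
To make the reported numbers easier to interpret, here is a small sketch of the two calculations involved. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall) and that the quoted throughput of about 530 documents per minute stays constant. The function names and the example counts are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-measure from true positive, false positive, and false negative counts.

    With beta = 1.0 this is the balanced F1 score.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def hours_to_process(n_docs: int, docs_per_minute: float = 530.0) -> float:
    """Estimated wall-clock hours to process n_docs at a constant rate."""
    return n_docs / docs_per_minute / 60.0


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f"F1 = {f_measure(tp=80, fp=20, fn=20):.4f}")
    print(f"Hours for 1,000,000 documents: {hours_to_process(1_000_000):.1f}")
```

Under the constant-rate assumption, one million documents would take roughly 31 hours, which gives a sense of what scaling to millions of documents means in practice.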


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1877/4331699/6920c615be41/1758-2946-7-S1-S9-1.jpg
