
NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition.

Authors

Tsai Richard Tzong-Han, Hsiao Yu-Cheng, Lai Po-Ting

Publication information

Database (Oxford). 2016 Oct 25;2016:baw135. doi: 10.1093/database/baw135.

DOI: 10.1093/database/baw135
PMID: 31414701
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5091336/
Abstract

Chemical patents contain detailed information on novel chemical compounds that is valuable to the chemical and pharmaceutical industries. In this paper, we introduce NERChem, a system that recognizes chemical named entity mentions in chemical patents. NERChem is based on the conditional random fields (CRF) model. Our approach incorporates (1) class composition, which combines chemical classes whose naming conventions are similar; (2) BioNE features, which distinguish chemical mentions from other biomedical NE mentions in the patents; and (3) full-token word features, which resolve the tokenization granularity problem. We evaluated our approach on the BioCreative V CHEMDNER-patent corpus and achieved an F-score of 87.17% in the Chemical Entity Mention in Patents (CEMP) task and a sensitivity of 98.58% in the Chemical Passage Detection (CPD) task, ranking alongside the top systems. Database URL: Our NERChem web-based system is publicly available at iisrserv.csie.ncu.edu.tw/nerchem.
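The abstract does not spell out how the full-token word features are computed. The following is a minimal sketch of one plausible reading, assuming that text is first split into fine-grained sub-tokens and that each sub-token then carries the whitespace-delimited token it came from as an extra feature. The function names `fine_tokens` and `token_features`, the splitting rule, and the feature keys are all hypothetical and are not taken from NERChem itself.

```python
import re


def fine_tokens(text):
    """Split text into fine-grained tokens: letter runs, digit runs, or single symbols.

    Assumption: this splitting rule is illustrative only, not NERChem's tokenizer.
    """
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", text)


def token_features(text):
    """Return one feature dict per fine-grained token.

    Each dict holds the sub-token itself ("word") and the whitespace-delimited
    token that contains it ("full_token"), so a sequence labeler sees both
    granularities at once.
    """
    features = []
    for full in text.split():
        for sub in fine_tokens(full):
            features.append({"word": sub, "full_token": full})
    return features


feats = token_features("adding 2-chloroethanol slowly")
for f in feats:
    print(f)
```

Feature dicts of this shape could be passed to a CRF toolkit as per-token attributes. The point of the extra key is that sub-tokens such as `2`, `-`, and `chloroethanol` keep a link to the longer chemical name they were cut from.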


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95e7/5091336/a7e94237669d/baw135f1p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95e7/5091336/2edf22e7e815/baw135f2p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95e7/5091336/d11c773b044b/baw135f3p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95e7/5091336/a2607b78f622/baw135f4p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95e7/5091336/e6a52c6389c6/baw135f5p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/95e7/5091336/d20558d14a0c/baw135f6p.jpg

Similar articles

1
NERChem: adapting NERBio to chemical patents via full-token features and named entity feature with chemical sub-class composition.
Database (Oxford). 2016 Oct 25;2016:baw135. doi: 10.1093/database/baw135.
2
Chemical named entity recognition in patents by domain knowledge and unsupervised feature learning.
Database (Oxford). 2016 Apr 17;2016. doi: 10.1093/database/baw049. Print 2016.
3
Chemical entity recognition in patents by combining dictionary-based and statistical approaches.
Database (Oxford). 2016 May 2;2016. doi: 10.1093/database/baw061. Print 2016.
4
LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools.
J Cheminform. 2019 Jan 10;11(1):3. doi: 10.1186/s13321-018-0327-2.
5
A neural network approach to chemical and gene/protein entity recognition in patents.
J Cheminform. 2018 Dec 18;10(1):65. doi: 10.1186/s13321-018-0318-3.
6
Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations.
J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S9. doi: 10.1186/1758-2946-7-S1-S9. eCollection 2015.
7
Statistical principle-based approach for gene and protein related object recognition.
J Cheminform. 2018 Dec 17;10(1):64. doi: 10.1186/s13321-018-0314-7.
8
The CHEMDNER corpus of chemicals and drugs and its annotation principles.
J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S2. doi: 10.1186/1758-2946-7-S1-S2. eCollection 2015.
9
Curatable Named-Entity Recognition Using Semantic Relations.
IEEE/ACM Trans Comput Biol Bioinform. 2015 Jul-Aug;12(4):785-92. doi: 10.1109/TCBB.2014.2366770.
10
An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition.
Bioinformatics. 2018 Apr 15;34(8):1381-1388. doi: 10.1093/bioinformatics/btx761.

Cited by

1
Knowledge-enhanced biomedical named entity recognition and normalization: application to proteins and genes.
BMC Bioinformatics. 2020 Jan 30;21(1):35. doi: 10.1186/s12859-020-3375-3.
2
Using a Large Margin Context-Aware Convolutional Neural Network to Automatically Extract Disease-Disease Association from Literature: Comparative Analytic Study.
JMIR Med Inform. 2019 Nov 26;7(4):e14502. doi: 10.2196/14502.
3
The extraction of complex relationships and their conversion to biological expression language (BEL) overview of the BioCreative VI (2017) BEL track.
Database (Oxford). 2019 Jan 1;2019. doi: 10.1093/database/baz084.
4
Statistical principle-based approach for gene and protein related object recognition.
J Cheminform. 2018 Dec 17;10(1):64. doi: 10.1186/s13321-018-0314-7.

References

1
SimConcept: A Hybrid Approach for Simplifying Composite Named Entities in Biomedicine.
ACM BCB. 2014;2014:138-146. doi: 10.1145/2649387.2649420.
2
Optimising chemical named entity recognition with pre-processing analytics, knowledge-rich features and heuristics.
J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S6. doi: 10.1186/1758-2946-7-S1-S6. eCollection 2015.
3
tmChem: a high performance approach for chemical named entity recognition and normalization.
J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S3. doi: 10.1186/1758-2946-7-S1-S3. eCollection 2015.
4
CheNER: a tool for the identification of chemical entities and their classes in biomedical literature.
J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S15. doi: 10.1186/1758-2946-7-S1-S15. eCollection 2015.
5
Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization.
J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S14. doi: 10.1186/1758-2946-7-S1-S14. eCollection 2015.
6
CHEMDNER: The drugs and chemical names extraction challenge.
J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S1. doi: 10.1186/1758-2946-7-S1-S1. eCollection 2015.
7
The Comparative Toxicogenomics Database's 10th year anniversary: update 2015.
Nucleic Acids Res. 2015 Jan;43(Database issue):D914-20. doi: 10.1093/nar/gku935. Epub 2014 Oct 17.
8
Annotated chemical patent corpus: a gold standard for text mining.
PLoS One. 2014 Sep 30;9(9):e107477. doi: 10.1371/journal.pone.0107477. eCollection 2014.
9
DrugBank 4.0: shedding new light on drug metabolism.
Nucleic Acids Res. 2014 Jan;42(Database issue):D1091-7. doi: 10.1093/nar/gkt1068. Epub 2013 Nov 6.
10
tmVar: a text mining approach for extracting sequence variants in biomedical literature.
Bioinformatics. 2013 Jun 1;29(11):1433-9. doi: 10.1093/bioinformatics/btt156. Epub 2013 Apr 5.