CHEMDNER system with mixed conditional random fields and multi-scale word clustering.

Affiliations

School of Computer, Wuhan University, Wuhan 430072, China.

School of Public Health, Wuhan University, Wuhan 430072, China.

Publication information

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S4. doi: 10.1186/1758-2946-7-S1-S4. eCollection 2015.

DOI:10.1186/1758-2946-7-S1-S4
PMID:25810775
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4331694/
Abstract

BACKGROUND

Chemical compound and drug name recognition plays an important role in chemical text mining and is the basis for automatic relation extraction and event identification in chemical information processing. A high-performance named entity recognition system for chemical compound and drug names is therefore necessary.

METHODS

We developed a CHEMDNER system based on mixed conditional random fields (CRF) with word clustering for chemical compound and drug name recognition. For word clustering, we applied Brown's hierarchical clustering algorithm and a deep-learning-based Skip-gram model to a large collection of PubMed articles, including titles and abstracts.
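
The idea of giving a CRF word-cluster identities at several granularities can be illustrated with a minimal sketch. It assumes that Brown clustering has already assigned each word a bit-string path in the cluster hierarchy. The toy paths, the prefix lengths (4, 6, 10), and the helper names `cluster_features` and `token_features` are hypothetical and are not taken from the paper. Each prefix of a path names a coarser cluster, so emitting several prefixes describes the same word at multiple scales.

```python
from typing import Dict, List

# Hypothetical Brown cluster paths (bit strings). Real paths would be
# learned from a large unlabeled corpus such as PubMed titles and abstracts.
BROWN_PATHS: Dict[str, str] = {
    "aspirin": "1101001110",
    "ibuprofen": "1101001011",
    "patients": "0010110",
}

# Assumed scales, chosen for illustration only.
PREFIX_LENGTHS = (4, 6, 10)


def cluster_features(word: str) -> Dict[str, str]:
    """Return one feature per prefix length of the word's cluster path."""
    path = BROWN_PATHS.get(word.lower())
    if path is None:
        # Word was not seen when the clusters were built.
        return {"brown:oov": "1"}
    # A shorter prefix corresponds to a coarser cluster in the hierarchy.
    return {f"brown:{n}": path[:n] for n in PREFIX_LENGTHS}


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Combine simple surface features with cluster features for token i."""
    word = tokens[i]
    feats = {
        "lower": word.lower(),
        "suffix3": word[-3:],
        "has_digit": str(any(c.isdigit() for c in word)),
    }
    feats.update(cluster_features(word))
    return feats


sentence = ["Aspirin", "and", "ibuprofen", "were", "given", "to", "patients"]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

In this toy data, "aspirin" and "ibuprofen" share their 4-bit and 6-bit prefixes but differ at 10 bits, so a model could generalize across drug names at the coarse scale while still telling them apart at the fine scale. Per-token feature dictionaries of this shape are a common input format for CRF toolkits.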

RESULTS

In BioCreative IV, this system achieved the highest F-score, 88.20%, for the chemical document indexing (CDI) task and the second highest F-score, 87.11%, for the chemical entity mention recognition (CEM) task. Multi-scale clustering based on deep learning improved performance further, to F-scores of 88.71% for CDI and 88.06% for CEM.
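
For readers unfamiliar with the metric, an F-score in evaluations of this kind is usually the harmonic mean of precision and recall. The sketch below assumes the balanced (F1) form and uses a hypothetical function name, `f_score`; the counts in the usage line are made up and are not the paper's data.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration: 80 correct mentions, 20 spurious, 20 missed.
example = f_score(tp=80, fp=20, fn=20)
```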

CONCLUSIONS

The mixed CRF model represents both the internal complexity and the external contexts of entities, and it is integrated with word clustering to capture domain knowledge from PubMed articles, including titles and abstracts. This domain knowledge helps sustain entity recognition performance even without fine-grained linguistic features or manually designed rules.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a25/4331694/7ce0d3c10c27/1758-2946-7-S1-S4-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5a25/4331694/b59a904ba38d/1758-2946-7-S1-S4-1.jpg
