CHEMDNER: The drugs and chemical names extraction challenge.

Affiliations

Structural Computational Biology Group, Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre, Calle Melchor Fernández Almagro, 3, Madrid, Spain.

Computational Intelligence Group, Department of Artificial Intelligence, Universidad Politécnica de Madrid, Calle Ramiro de Maeztu, 7, Madrid, Spain.

Publication information

J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S1. doi: 10.1186/1758-2946-7-S1-S1. eCollection 2015.

DOI:10.1186/1758-2946-7-S1-S1

PMID:25810766

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4331685/

Abstract

Natural language processing (NLP) and text mining technologies for the chemical domain (ChemNLP or chemical text mining) are key to improving access to, and integration of, information from unstructured data such as patents or the scientific literature. Therefore, the BioCreative organizers posed the CHEMDNER (chemical compound and drug name recognition) community challenge, which promoted the development of novel, competitive and accessible chemical text mining systems. This task allowed a comparative assessment of the performance of various methodologies, using as Gold Standard data a carefully prepared collection of text manually labeled by specially trained chemists. We evaluated two important aspects: one covered the indexing of documents with chemicals (chemical document indexing, the CDI task), and the other was concerned with finding the exact mentions of chemicals in text (chemical entity mention recognition, the CEM task). Twenty-seven teams (23 academic and 4 commercial, a total of 87 researchers) returned results for the CHEMDNER tasks: 26 teams for the CEM task and 23 for the CDI task. Top scoring teams obtained an F-score of 87.39% for the CEM task and 88.20% for the CDI task, a very promising result when compared to the agreement between human annotators (91%). The strategies used to detect chemicals included machine learning methods (e.g. conditional random fields) using a variety of features, chemistry and drug lexica, and domain-specific rules. We expect that the tools and resources resulting from this effort will influence future developments of chemical text mining applications and will form the basis for finding related chemical information for the detected entities, such as toxicological or pharmacogenomic properties.
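To make the difference between the two tasks concrete, the following is a minimal Python sketch of how an F-score could be computed for each. It is an illustration only, not the official CHEMDNER evaluation: the toy annotations, the tuple layout, and the helper names `f_score` and `to_document_index` are hypothetical, and the sketch assumes plain exact-match set comparison, whereas the abstract does not state how scores were averaged or how CDI submissions were ranked.

```python
from typing import Set, Tuple


def f_score(predicted: Set[tuple], gold: Set[tuple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for two sets of annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def to_document_index(mentions: Set[tuple]) -> Set[tuple]:
    """Collapse mentions to unique (document, chemical name) pairs (CDI view)."""
    return {(doc_id, text) for doc_id, _start, _end, text in mentions}


# Hypothetical toy data: (document id, start offset, end offset, mention text).
gold_mentions = {
    ("doc1", 0, 7, "aspirin"),
    ("doc1", 30, 37, "aspirin"),
    ("doc1", 50, 59, "ibuprofen"),
}
predicted_mentions = {
    ("doc1", 0, 7, "aspirin"),      # correct mention
    ("doc1", 50, 59, "ibuprofen"),  # correct mention
    ("doc1", 70, 77, "glucose"),    # false positive
}

# CEM view: every mention must match on document and exact offsets.
cem_scores = f_score(predicted_mentions, gold_mentions)

# CDI view: only the set of chemicals per document matters, so the missed
# second "aspirin" mention no longer counts against recall.
cdi_scores = f_score(
    to_document_index(predicted_mentions),
    to_document_index(gold_mentions),
)

print("CEM precision/recall/F1:", cem_scores)
print("CDI precision/recall/F1:", cdi_scores)
```

In this toy case the same system output scores differently on the two views: a missed repeat mention lowers recall for CEM but not for CDI, which is one reason the two tasks were evaluated separately.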
