
Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics.

Affiliations

Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Aveiro, Portugal.

Department of Information and Communications Technologies, University of A Coruña, A Coruña, Spain.

Publication information

Database (Oxford). 2022 Jul 1;2022. doi: 10.1093/database/baac047.

DOI:10.1093/database/baac047
PMID:35776534
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9248917/
Abstract

The identification of chemicals in articles has attracted great interest in the biomedical scientific community, given its importance in drug development research. Most previous research has focused on PubMed abstracts, and further investigation using full-text documents is required because these contain additional valuable information that must be explored. The manual expert task of indexing these articles with Medical Subject Headings (MeSH) terms later helps researchers find the most relevant publications for their ongoing work. The BioCreative VII NLM-Chem track fostered the development of systems for chemical identification and indexing in PubMed full-text articles. Chemical identification consisted of identifying the chemical mentions and linking these to unique MeSH identifiers. This manuscript describes our participation system and the post-challenge improvements we made. We propose a three-stage pipeline that individually performs chemical mention detection, entity normalization and indexing. Regarding chemical identification, we adopted a deep-learning solution that utilizes the PubMedBERT contextualized embeddings followed by a multilayer perceptron and a conditional random field tagging layer. For the normalization approach, we use sieve-based dictionary filtering followed by a deep-learning similarity search strategy. Finally, for the indexing we developed rules for identifying the more relevant MeSH codes for each article. During the challenge, our system obtained the best official results in the normalization and indexing tasks despite its lower performance in the chemical mention recognition task. In a post-contest phase we boosted our results by improving our named entity recognition model with additional techniques. The final system achieved 0.8731, 0.8275 and 0.4849 in the chemical identification, normalization and indexing tasks, respectively. The code to reproduce our experiments and run the pipeline is publicly available.
Database URL: https://github.com/bioinformatics-ua/biocreativeVII_track2
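
The normalization and indexing stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny dictionary, the function names (`simplify`, `normalize`, `index_codes`), the similarity threshold and the two indexing rules (code appears in the title, or occurs at least `min_count` times) are all assumptions made for illustration. In particular, `difflib.SequenceMatcher` is only a plain string-similarity stand-in for the deep-learning similarity search the abstract mentions, and the neural mention detector (stage one) is skipped, so mentions are passed in directly.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Illustrative dictionary: surface form -> MeSH identifier (tiny hypothetical subset).
MESH_DICT = {
    "aspirin": "MESH:D001241",
    "acetylsalicylic acid": "MESH:D001241",
    "glucose": "MESH:D005947",
    "ethanol": "MESH:D000431",
}


def simplify(text):
    """Lowercase and collapse punctuation/whitespace for the relaxed sieve."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def normalize(mention, threshold=0.85):
    """Map a chemical mention to a MeSH code, or None if nothing is close enough."""
    # Sieve 1: exact dictionary match.
    if mention in MESH_DICT:
        return MESH_DICT[mention]
    # Sieve 2: match after case and punctuation normalization.
    key = simplify(mention)
    for name, code in MESH_DICT.items():
        if simplify(name) == key:
            return code

    # Fallback: nearest dictionary entry by string similarity
    # (assumed stand-in for an embedding-based similarity search).
    def score(name):
        return SequenceMatcher(None, key, simplify(name)).ratio()

    best = max(MESH_DICT, key=score)
    return MESH_DICT[best] if score(best) >= threshold else None


def index_codes(codes_by_section, min_count=2):
    """Select codes to index: any code in the title, or any code seen min_count+ times."""
    counts = Counter(c for codes in codes_by_section.values() for c in codes)
    selected = set(codes_by_section.get("title", []))
    selected |= {c for c, n in counts.items() if n >= min_count}
    return sorted(selected)


if __name__ == "__main__":
    print(normalize("Acetylsalicylic-Acid"))
    print(index_codes({"title": ["MESH:D001241"],
                       "body": ["MESH:D005947", "MESH:D005947", "MESH:D000431"]}))
```

Ordering the sieves from strict to relaxed means the cheap, high-precision matches are resolved first, and only the leftover mentions reach the more expensive similarity step.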

Similar articles

1
Chemical identification and indexing in PubMed full-text articles using deep learning and heuristics.
Database (Oxford). 2022 Jul 1;2022. doi: 10.1093/database/baac047.
2
Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII.
Database (Oxford). 2023 Mar 7;2023. doi: 10.1093/database/baad005.
3
NLM-Chem-BC7: manually annotated full-text resources for chemical entity annotation and indexing in biomedical articles.
Database (Oxford). 2022 Dec 1;2022. doi: 10.1093/database/baac102.
4
Full-text chemical identification with improved generalizability and tagging consistency.
Database (Oxford). 2022 Sep 28;2022. doi: 10.1093/database/baac074.
5
A document processing pipeline for annotating chemical entities in scientific documents.
J Cheminform. 2015 Jan 19;7(Suppl 1 Text mining for chemistry and the CHEMDNER track):S7. doi: 10.1186/1758-2946-7-S1-S7. eCollection 2015.
6
A BERT-based ensemble learning approach for the BioCreative VII challenges: full-text chemical identification and multi-label classification in PubMed articles.
Database (Oxford). 2022 Jul 15;2022. doi: 10.1093/database/baac056.
7
BioCreative V CDR task corpus: a resource for chemical disease relation extraction.
Database (Oxford). 2016 May 9;2016. doi: 10.1093/database/baw068. Print 2016.
8
MeSH Now: automatic MeSH indexing at PubMed scale via learning to rank.
J Biomed Semantics. 2017 Apr 17;8(1):15. doi: 10.1186/s13326-017-0123-3.
9
LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools.
J Cheminform. 2019 Jan 10;11(1):3. doi: 10.1186/s13321-018-0327-2.
10
FullMeSH: improving large-scale MeSH indexing with full text.
Bioinformatics. 2020 Mar 1;36(5):1533-1541. doi: 10.1093/bioinformatics/btz756.

Cited by

1
Artificial Intelligence-assisted Biomedical Literature Knowledge Synthesis to Support Decision-making in Precision Oncology.
AMIA Annu Symp Proc. 2025 May 22;2024:513-522. eCollection 2024.
2
Towards discovery: an end-to-end system for uncovering novel biomedical relations.
Database (Oxford). 2024 Jul 11;2024. doi: 10.1093/database/baae057.
3
BELB: a biomedical entity linking benchmark.
