Named entity recognition of pharmacokinetic parameters in the scientific literature.

Affiliations

Department of Computer Science, University College London, London, UK.

Institute of Health Informatics, University College London, London, UK.

Publication information

Sci Rep. 2024 Oct 8;14(1):23485. doi: 10.1038/s41598-024-73338-3.

DOI:10.1038/s41598-024-73338-3
PMID:39379460
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11461509/
Abstract

The development of accurate predictions for a new drug's absorption, distribution, metabolism, and excretion profiles in the early stages of drug development is crucial due to high candidate failure rates. The absence of comprehensive, standardised, and updated pharmacokinetic (PK) repositories limits pre-clinical predictions and often requires searching through the scientific literature for PK parameter estimates from similar compounds. While text mining offers promising advancements in automatic PK parameter extraction, accurate Named Entity Recognition (NER) of PK terms remains a bottleneck due to limited resources. This work addresses this gap by introducing novel corpora and language models specifically designed for effective NER of PK parameters. Leveraging active learning approaches, we developed an annotated corpus containing over 4000 entity mentions found across the PK literature on PubMed. To identify the most effective model for PK NER, we fine-tuned and evaluated different NER architectures on our corpus. Fine-tuning BioBERT exhibited the best results, achieving a strict F1 score of 90.37% in recognising PK parameter mentions, significantly outperforming heuristic approaches and models trained on existing corpora. To accelerate the development of end-to-end PK information extraction pipelines and improve pre-clinical PK predictions, the PK NER models and the labelled corpus were released open source at https://github.com/PKPDAI/PKNER.
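
The abstract reports a strict F1 score without defining it. The minimal sketch below illustrates one common reading of "strict" entity-level scoring, in which a predicted mention counts as correct only if its boundaries and label exactly match a gold mention. The span representation, the function name `strict_prf`, the label "PK", and the toy offsets are assumptions made for illustration; the authors' actual evaluation code is in their repository and may differ.

```python
from typing import Iterable, Tuple

# Hypothetical span representation: (start_char, end_char, label).
Span = Tuple[int, int, str]


def strict_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match span scoring.

    A prediction is a true positive only if start, end, and label
    all match a gold span exactly; partial overlaps earn no credit.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example with made-up offsets (not data from the paper).
gold = [(0, 9, "PK"), (25, 34, "PK"), (50, 54, "PK")]
pred = [(0, 9, "PK"), (25, 30, "PK"), (50, 54, "PK"), (60, 63, "PK")]
p, r, f = strict_prf(gold, pred)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this toy case the prediction (25, 30) overlaps the gold mention (25, 34) but has a different end offset, so it counts as an error under strict matching. A more lenient partial-match scheme would give it credit.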

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0e7/11461509/a8baca166288/41598_2024_73338_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f0e7/11461509/c3f954fd2475/41598_2024_73338_Fig1_HTML.jpg
