An automated classification pipeline for tables in pharmacokinetic literature.

Authors

Smith Victoria C, Gonzalez Hernandez Ferran, Wattanakul Thanaporn, Chotsiri Palang, Cordero José Antonio, Ballester Maria Rosa, Duran Màrius, Fanlo Escudero Olga, Lilaonitkul Watjana, Standing Joseph F, Kloprogge Frank

Affiliations

Institute of Health Informatics, University College London, London, UK.

Great Ormond Street Institute for Child Health, University College London, London, UK.

Publication information

Sci Rep. 2025 Mar 24;15(1):10071. doi: 10.1038/s41598-025-94778-5.

DOI: 10.1038/s41598-025-94778-5
PMID: 40128567
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11933424/
Abstract

Pharmacokinetic (PK) models are essential for optimising drug candidate selection and dosing regimens in drug development. Preclinical and population PK models benefit from integrating prior knowledge from existing compounds. While tables in scientific literature contain comprehensive prior PK data and critical contextual information, the lack of automated extraction tools forces researchers to manually curate datasets, limiting efficiency and scalability. This study addresses this gap by focusing on the crucial first step of PK table mining: automatically identifying tables containing in vivo PK parameters and study population characteristics. To this end, an expert-annotated corpus of 2640 tables from PK literature was developed and used to train a supervised classification pipeline. The pipeline integrates diverse table features and representations, with GPT-4 refining predictions in uncertain cases. The resulting model achieved F1 scores exceeding 96% across all classes. The pipeline was applied to PK papers from PubMed Central Open-Access, with results integrated into the PK paper search tool at www.pkpdai.com. This work establishes a foundational step towards automating PK table data extraction and streamlining dataset curation. The corpus and code are openly available.
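
The abstract describes a supervised classifier whose uncertain predictions are refined by GPT-4, evaluated with per-class F1. The following is a minimal sketch of that general pattern only, not the authors' implementation. The label names, the 0.8 confidence threshold, and the `fallback` callable (a stand-in for any second-stage model such as an LLM call) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical label names: the abstract only says the classes cover tables
# with in vivo PK parameters and study population characteristics.
LABELS = ["pk_parameters", "population_characteristics", "other"]


def route_prediction(
    probs: Dict[str, float],
    table_text: str,
    fallback: Callable[[str], str],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> Tuple[str, str]:
    """Return (label, source), where source is "classifier" or "fallback".

    The classifier's top label is accepted when its probability reaches the
    threshold; otherwise the table text is handed to the second-stage model.
    """
    best_label = max(probs, key=lambda label: probs[label])
    if probs[best_label] >= threshold:
        return best_label, "classifier"
    return fallback(table_text), "fallback"


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Compute F1 = 2*TP / (2*TP + FP + FN) separately for each label."""
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = (2 * tp / denom) if denom else 0.0
    return scores


if __name__ == "__main__":
    # A dummy second stage that always answers "other"; a real system
    # would query an LLM or another model here.
    def dummy_fallback(text: str) -> str:
        return "other"

    confident = {"pk_parameters": 0.95, "population_characteristics": 0.03, "other": 0.02}
    unsure = {"pk_parameters": 0.50, "population_characteristics": 0.45, "other": 0.05}
    print(route_prediction(confident, "CL (L/h) | Vd (L)", dummy_fallback))
    print(route_prediction(unsure, "Group | n", dummy_fallback))
```

Routing only low-confidence cases to the second stage keeps expensive model calls to a minority of tables. The threshold would normally be tuned on held-out data rather than fixed in advance.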

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/72cf/11933424/4623acbe1d53/41598_2025_94778_Fig1_HTML.jpg
