An automated classification pipeline for tables in pharmacokinetic literature.

Author information

Smith Victoria C, Gonzalez Hernandez Ferran, Wattanakul Thanaporn, Chotsiri Palang, Cordero José Antonio, Ballester Maria Rosa, Duran Màrius, Fanlo Escudero Olga, Lilaonitkul Watjana, Standing Joseph F, Kloprogge Frank

Affiliations

Institute of Health Informatics, University College London, London, UK.

Great Ormond Street Institute for Child Health, University College London, London, UK.

Publication information

Sci Rep. 2025 Mar 24;15(1):10071. doi: 10.1038/s41598-025-94778-5.

DOI: 10.1038/s41598-025-94778-5

PMID: 40128567

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11933424/

Abstract

Pharmacokinetic (PK) models are essential for optimising drug candidate selection and dosing regimens in drug development. Preclinical and population PK models benefit from integrating prior knowledge from existing compounds. While tables in scientific literature contain comprehensive prior PK data and critical contextual information, the lack of automated extraction tools forces researchers to manually curate datasets, limiting efficiency and scalability. This study addresses this gap by focusing on the crucial first step of PK table mining: automatically identifying tables containing in vivo PK parameters and study population characteristics. To this end, an expert-annotated corpus of 2640 tables from PK literature was developed and used to train a supervised classification pipeline. The pipeline integrates diverse table features and representations, with GPT-4 refining predictions in uncertain cases. The resulting model achieved F1 scores exceeding 96% across all classes. The pipeline was applied to PK papers from PubMed Central Open-Access, with results integrated into the PK paper search tool at www.pkpdai.com. This work establishes a foundational step towards automating PK table data extraction and streamlining dataset curation. The corpus and code are openly available.
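The abstract describes a supervised classifier whose uncertain predictions are refined by GPT-4, but it does not give the features, uncertainty criterion or prompts. The following is only a minimal sketch of confidence-gated deferral under stated assumptions, not the authors' implementation: the label names (including the catch-all "other" class), the 0.8 threshold and the `llm_classify` callback are all hypothetical, and no real OpenAI API call is made.

```python
from typing import Callable, Sequence

# Hypothetical label set. The abstract names in vivo PK parameters and study
# population characteristics; "other" is an assumed catch-all class.
LABELS = ("pk_parameters", "population_characteristics", "other")


def route_prediction(
    probs: Sequence[float],
    table_text: str,
    llm_classify: Callable[[str], str],
    threshold: float = 0.8,  # hypothetical confidence cut-off
) -> tuple[str, str]:
    """Return (label, source).

    If the classifier's top probability reaches `threshold`, its label is kept.
    Otherwise the table text is passed to `llm_classify`, a stand-in for an
    LLM request such as GPT-4.
    """
    if len(probs) != len(LABELS):
        raise ValueError("expected one probability per label")
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] >= threshold:
        return LABELS[best], "classifier"
    label = llm_classify(table_text)
    if label not in LABELS:
        # Unusable LLM answer: keep the classifier's best guess.
        return LABELS[best], "classifier_fallback"
    return label, "llm"


def fake_llm(text: str) -> str:
    # Stub replacing a real GPT-4 request; always answers "pk_parameters".
    return "pk_parameters"


print(route_prediction([0.95, 0.03, 0.02], "CL (L/h) ...", fake_llm))
# ('pk_parameters', 'classifier')
print(route_prediction([0.45, 0.40, 0.15], "Age (years) ...", fake_llm))
# ('pk_parameters', 'llm')
```

Deferring only the low-confidence cases keeps the number of LLM calls small, because most tables are resolved by the cheaper classifier. The threshold trades LLM cost against how many borderline tables get a second opinion.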
