An automated approach to identify scientific publications reporting pharmacokinetic parameters.

Authors

Gonzalez Hernandez Ferran, Carter Simon J, Iso-Sipilä Juha, Goldsmith Paul, Almousa Ahmed A, Gastine Silke, Lilaonitkul Watjana, Kloprogge Frank, Standing Joseph F

Affiliations

CoMPLEX, University College London, London, UK.

The Alan Turing Institute, London, UK.

Publication details

Wellcome Open Res. 2021 Apr 21;6:88. doi: 10.12688/wellcomeopenres.16718.1. eCollection 2021.

DOI:10.12688/wellcomeopenres.16718.1

PMID:34381873

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8343403/

Abstract

Pharmacokinetic (PK) predictions of new chemical entities are aided by prior knowledge from other compounds. The development of robust algorithms that improve preclinical and clinical phases of drug development remains constrained by the need to search, curate and standardise PK information across the constantly-growing scientific literature. The lack of centralised, up-to-date and comprehensive repositories of PK data represents a significant limitation in the drug development pipeline. In this work, we propose a machine learning approach to automatically identify and characterise scientific publications reporting PK parameters from in vivo data, providing a centralised repository of PK literature. A dataset of 4,792 PubMed publications was labelled by field experts depending on whether in vivo PK parameters were estimated in the study. Different classification pipelines were compared using a bootstrap approach and the best-performing architecture was used to develop a comprehensive and automatically-updated repository of PK publications. The best-performing architecture encoded documents using unigram features and mean pooling of BioBERT embeddings, obtaining an F1 score of 83.8% on the test set. The pipeline retrieved over 121K PubMed publications in which in vivo PK parameters were estimated and it was scheduled to perform weekly updates on newly published articles. All the relevant documents were released through a publicly available web interface (https://app.pkpdai.com) and characterised by the drugs, species and conditions mentioned in the abstract, to facilitate the subsequent search of relevant PK data. This automated, open-access repository can be used to accelerate the search and comparison of PK results, curate ADME datasets, and facilitate subsequent text mining tasks in the PK domain.
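The document encoding and evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy vocabulary and the two-dimensional token vectors (standing in for real BioBERT token embeddings) are all hypothetical, and the bootstrap shown here simply resamples an evaluation set with replacement, which is one common reading of "a bootstrap approach" rather than the procedure confirmed by the paper.

```python
import random
from collections import Counter


def unigram_features(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size document vector."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def encode_document(text, vocabulary, token_vectors):
    """Concatenate unigram counts with the mean-pooled embedding."""
    return unigram_features(text, vocabulary) + mean_pool(token_vectors)


def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = reports in vivo PK parameters)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bootstrap_f1(y_true, y_pred, n_resamples=1000, seed=0):
    """Recompute F1 on evaluation sets resampled with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx],
                               [y_pred[i] for i in idx]))
    return scores


# Toy inputs: hypothetical vocabulary and stand-in token embeddings.
vocabulary = ["clearance", "half-life", "dose"]
text = "Clearance and half-life were estimated after a single dose"
token_vectors = [[0.0, 2.0], [1.0, 0.0], [2.0, 4.0]]
features = encode_document(text, vocabulary, token_vectors)

# Toy labels and predictions for five documents.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
scores = bootstrap_f1(y_true, y_pred, n_resamples=200)
```

In this sketch the resulting feature vector would be passed to any standard classifier; comparing the distributions of bootstrapped F1 scores across pipelines gives a rough sense of whether one pipeline is reliably better than another, rather than better on a single split by chance.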
