Prediction of human pharmacokinetic parameters incorporating SMILES information.

Authors

Kwon Jae-Hee, Han Ja-Young, Kim Minjung, Kim Seong Kyung, Lee Dong-Kyu, Kim Myeong Gyu

Affiliations

Graduate School of Pharmaceutical Sciences, Ewha Womans University, Seoul, 03760, Republic of Korea.

College of Pharmacy, Chung-Ang University, Seoul, 06974, Republic of Korea.

Publication information

Arch Pharm Res. 2024 Dec;47(12):914-923. doi: 10.1007/s12272-024-01520-2. Epub 2024 Nov 26.

Abstract

This study aimed to develop a model that applies natural language processing to simplified molecular-input line-entry system (SMILES) strings in order to predict clearance (CL) and volume of distribution at steady state (V) in humans. The CL and V prediction models were built on data from 435 and 439 compounds, respectively. Features included animal pharmacokinetic data, in vitro experimental data, molecular descriptors, and SMILES, with XGBoost as the machine learning algorithm. The ChemBERTa model was used to analyze compound SMILES, and its last hidden layer embedding was examined as a feature. The models were evaluated using geometric mean fold error (GMFE), r, root mean squared error (RMSE), and accuracy within 2- and 3-fold error. CL prediction performed best with animal pharmacokinetic data, in vitro experimental data, and SMILES as features, yielding a GMFE of 1.768, an r of 0.528, and an RMSE of 0.788, with accuracies within 2-fold and 3-fold error of 75.8% and 81.8%, respectively. V prediction performed best with animal pharmacokinetic data and in vitro experimental data as features, yielding a GMFE of 1.401, an r of 0.902, and an RMSE of 0.413, with accuracies within 2-fold and 3-fold error of 93.8% and 100%, respectively. This study has developed a highly predictive model for CL and V. In particular, incorporating SMILES information into the model contributed predictive power for CL.
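The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in pharmacokinetic prediction work: fold error as the larger of predicted/observed and observed/predicted, and GMFE as 10 raised to the mean absolute log10 ratio. The abstract does not state the exact formulas, and computing RMSE on log10-transformed values is likewise an assumption. The function names are hypothetical and are not taken from the paper.

```python
import math

def fold_errors(predicted, observed):
    """Fold error per compound: max(pred/obs, obs/pred), always >= 1."""
    return [max(p / o, o / p) for p, o in zip(predicted, observed)]

def gmfe(predicted, observed):
    """Geometric mean fold error (assumed definition):
    10 ** mean(|log10(pred / obs)|)."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))

def accuracy_within_fold(predicted, observed, fold):
    """Fraction of compounds whose fold error is at most `fold`."""
    errors = fold_errors(predicted, observed)
    return sum(e <= fold for e in errors) / len(errors)

def rmse_log10(predicted, observed):
    """RMSE on log10-transformed values (assumption: the abstract
    does not say on which scale RMSE was computed)."""
    squared = [(math.log10(p) - math.log10(o)) ** 2
               for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

With made-up values such as `observed = [1.0, 2.0, 4.0, 10.0]` and `predicted = [2.0, 2.0, 1.0, 10.0]`, three of the four compounds fall within 2-fold error, so `accuracy_within_fold(predicted, observed, 2)` gives 0.75. These numbers are illustrative only and are unrelated to the study's data.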
