Informing the Human Plasma Protein Binding of Environmental Chemicals by Machine Learning in the Pharmaceutical Space: Applicability Domain and Limits of Predictability.

Authors

Ingle Brandall L, Veber Brandon C, Nichols John W, Tornero-Velez Rogelio

Affiliations

U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, Research Triangle Park, North Carolina 27709, United States.

U.S. Environmental Protection Agency, Office of Research and Development, National Health Exposure Effects Research Laboratory, Duluth, Minnesota 55804, United States.

Publication information

J Chem Inf Model. 2016 Nov 28;56(11):2243-2252. doi: 10.1021/acs.jcim.6b00291. Epub 2016 Nov 3.

Abstract

The free fraction of a xenobiotic in plasma (F) is an important determinant of chemical absorption, distribution, metabolism, elimination, and toxicity, yet experimental plasma protein binding data are scarce for environmentally relevant chemicals. This work explores the merit of using available pharmaceutical data to predict F for environmentally relevant chemicals via machine learning. Quantitative structure-activity relationship (QSAR) models were constructed with k-nearest neighbors (kNN), support vector machine (SVM), and random forest (RF) algorithms from a training set of 1045 pharmaceuticals. The models were then evaluated with independent test sets of pharmaceuticals (200 compounds) and environmentally relevant ToxCast chemicals (406 total, in two groups of 238 and 168 compounds). Selecting a minimal feature set of 10-15 2D molecular descriptors allowed for both informative feature interpretation and practical applicability domain (AD) assessment via a bounded box of descriptor ranges and principal component analysis. The diverse pharmaceutical and environmental chemical sets are similar in chemical space (82-99% overlap) and show comparable bias and variance in the constructed learning curves. All the models were predictive, with mean absolute errors (MAE) in the range of 0.10-0.18 (in units of F). The models performed best for highly bound chemicals (MAE 0.07-0.12), neutrals (MAE 0.11-0.14), and acids (MAE 0.14-0.17). A consensus model had the highest accuracy across both pharmaceuticals (MAE 0.151-0.155) and environmentally relevant chemicals (MAE 0.110-0.131). The inclusion of the majority of the ToxCast test sets within the AD of the consensus model, coupled with high prediction accuracy for these chemicals, indicates that the model provides a QSAR for F that is broadly applicable to both pharmaceuticals and environmentally relevant chemicals.
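The abstract names three pieces that are easy to picture in code: a bounded-box applicability domain built from descriptor ranges, a consensus prediction, and the MAE metric. The Python sketch below is illustrative only and is not the authors' implementation. The function names, the toy descriptor values, and the per-model predictions are all hypothetical. The abstract does not say how the consensus is formed, so an unweighted mean of the kNN, SVM, and RF outputs is assumed here. The principal component analysis part of the AD assessment is not shown.

```python
from statistics import mean


def descriptor_ranges(train_rows):
    """Per-descriptor (min, max) over the training set: the bounding box."""
    columns = list(zip(*train_rows))
    return [(min(col), max(col)) for col in columns]


def in_domain(row, ranges):
    """True if every descriptor value lies inside its training range."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(row, ranges))


def consensus(predictions):
    """ASSUMPTION: unweighted mean of model outputs, clipped to [0, 1]."""
    return min(1.0, max(0.0, mean(predictions)))


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted F."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


# Hypothetical toy data: two descriptors per training chemical.
train = [(1.2, 150.0), (3.4, 320.0), (2.1, 275.0)]
ranges = descriptor_ranges(train)

# Two hypothetical query chemicals; the second exceeds the first range.
queries = [(2.0, 200.0), (5.0, 210.0)]
flags = [in_domain(q, ranges) for q in queries]

# Hypothetical per-model predictions of F (kNN, SVM, RF) for each query.
per_model = [(0.10, 0.14, 0.12), (0.50, 0.70, 0.60)]
predicted = [consensus(p) for p in per_model]

# Hypothetical observed free fractions for the same two chemicals.
observed = [0.10, 0.70]
mae = mean_absolute_error(observed, predicted)
```

In this toy case the first query falls inside the bounding box and the second falls outside it, so only the first prediction would be treated as within the applicability domain.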
