

Assessment of prediction confidence and domain extrapolation of two structure-activity relationship models for predicting estrogen receptor binding activity.

Author information

Tong Weida, Xie Qian, Hong Huixiao, Shi Leming, Fang Hong, Perkins Roger

Affiliation

Center for Toxicoinformatics, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas 72079, USA.

Publication information

Environ Health Perspect. 2004 Aug;112(12):1249-54. doi: 10.1289/txg.7125.

Abstract

Quantitative structure-activity relationship (QSAR) methods have been widely applied in drug discovery, lead optimization, toxicity prediction, and regulatory decisions. Despite major advances in algorithms and software, QSAR models have inherent limitations associated with the size and chemical-structure diversity of the training set, experimental error, and many characteristics of structure representation and correlation algorithms. Whereas an excellent fit to the training data may be readily attainable, models often fail to accurately predict chemicals that lie outside their domain of applicability. A QSAR's utility and, in the case of regulatory decisions, the justification for its use increasingly depend on the ability to quantify a model's potential for predicting unknown chemicals with some known degree of certainty. It is never possible to predict an unknown chemical with absolute certainty. Here we report on two QSAR models based on different data sets for classifying chemicals according to their ability to bind to the estrogen receptor. The models were developed using a novel QSAR method, Decision Forest, which combines the results of multiple heterogeneous but comparable Decision Tree models to produce a consensus prediction. We used an extensive cross-validation process to define an applicability domain for model predictions based on two quantitative measures: prediction confidence and domain extrapolation. Together, these measures quantify the accuracy of each prediction within and outside of the training domain. Despite being based on large and diverse training sets, both QSAR models had poor accuracy for chemicals within the domain of low confidence, whereas good accuracy was obtained for those within the domain of high confidence. For predictions in the high-confidence domain, accuracy was inversely proportional to the degree of domain extrapolation. The model with the larger training set (1,092 chemicals, compared with 232 for the other model) was more accurate in predicting chemicals at larger domain extrapolation and could be particularly useful for rapidly prioritizing potential endocrine disruptors from a large chemical universe.
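The abstract names the two per-prediction measures but gives no formulas. The sketch below is a minimal illustration under stated assumptions: each tree in a Decision Forest reports a probability that a chemical is an estrogen receptor binder; the consensus is the mean of those probabilities; prediction confidence is the distance of that mean from 0.5, rescaled to [0, 1]; and domain extrapolation is the summed fractional distance by which a chemical's descriptor values fall outside the training-set ranges. These definitions and all function names (`consensus_probability`, `classify`, `prediction_confidence`, `domain_extrapolation`) are hypothetical, and the paper's exact formulas may differ.

```python
"""Hypothetical sketch: Decision Forest-style consensus, prediction confidence,
and domain extrapolation. Definitions are assumptions, not the paper's code."""
from statistics import fmean
from typing import Sequence


def consensus_probability(tree_probs: Sequence[float]) -> float:
    """Mean probability of 'ER binder' across all trees in the forest."""
    if not tree_probs:
        raise ValueError("need at least one tree prediction")
    return fmean(tree_probs)


def classify(p: float, threshold: float = 0.5) -> str:
    """Call a chemical a binder when the consensus probability exceeds the threshold."""
    return "binder" if p > threshold else "non-binder"


def prediction_confidence(p: float) -> float:
    """Assumed form |p - 0.5| / 0.5: 0 for an evenly split vote, 1 for a unanimous one."""
    return abs(p - 0.5) / 0.5


def domain_extrapolation(
    x: Sequence[float],
    train_min: Sequence[float],
    train_max: Sequence[float],
) -> float:
    """Assumed measure: for each descriptor outside its training range, add how far
    outside it lies as a fraction of that descriptor's training range."""
    total = 0.0
    for value, lo, hi in zip(x, train_min, train_max):
        span = hi - lo
        if span <= 0:
            continue  # descriptor was constant in training: no range to normalize by
        if value < lo:
            total += (lo - value) / span
        elif value > hi:
            total += (value - hi) / span
    return total


if __name__ == "__main__":
    # Four hypothetical trees voting on one chemical described by two descriptors.
    p = consensus_probability([0.9, 0.8, 1.0, 0.7])
    print(classify(p), round(prediction_confidence(p), 2))
    print(domain_extrapolation([12.0, 1.0], train_min=[0.0, 0.0], train_max=[10.0, 2.0]))
```

Under these assumptions, a chemical on which the trees split their votes evenly gets a confidence near 0 (the low-confidence domain, where the abstract reports poor accuracy), while a chemical whose descriptors all lie inside the training ranges gets an extrapolation of 0; larger values indicate predictions made further from the training domain.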
