Lapins Maris, Arvidsson Staffan, Lampa Samuel, Berg Arvid, Schaal Wesley, Alvarsson Jonathan, Spjuth Ola
Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24, Uppsala, Sweden.
J Cheminform. 2018 Apr 3;10(1):17. doi: 10.1186/s13321-018-0271-1.
Lipophilicity is a major determinant of ADMET properties and of the overall suitability of drug candidates. We have developed large-scale models to predict the water-octanol distribution coefficient (logD) of chemical compounds to support drug discovery projects. Using ACD/logD data for 1.6 million compounds from the ChEMBL database, we built and evaluated models with a linear-kernel support-vector machine within the conformal prediction framework, which outputs prediction intervals at a specified confidence level. The resulting model shows a predictive ability of [Formula: see text]; with the best-performing nonconformity measure, the median prediction interval is [Formula: see text] log units at 80% confidence and [Formula: see text] log units at 90% confidence. The model is available as an online service through an OpenAPI interface and a web page with a molecular editor. We also publish predicted values at the 90% confidence level for 91 million PubChem structures, in RDF format for download and through a URI resolver service.
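To illustrate how conformal prediction turns point predictions into intervals at a chosen confidence level, the following is a minimal sketch of inductive conformal regression. It is not the authors' implementation. It rests on several assumptions made only for illustration: a one-dimensional least-squares line stands in for the linear-kernel support-vector machine, the nonconformity measure is the plain absolute residual (the abstract does not say which measure performed best), the data are synthetic, and all function names (`fit_line`, `calibrate`, `half_width`, `predict_interval`, `demo`) are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (a stand-in for the linear SVM)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def calibrate(model, xs_cal, ys_cal):
    """Sorted nonconformity scores |y - y_hat| on a held-out calibration set."""
    a, b = model
    return sorted(abs(y - (a * x + b)) for x, y in zip(xs_cal, ys_cal))


def half_width(scores, confidence):
    """Interval half-width: the ceil((n+1)*confidence)-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        # Too few calibration examples to support this confidence level.
        return math.inf
    return scores[k - 1]


def predict_interval(model, scores, x, confidence):
    """Prediction interval y_hat +/- half-width at the given confidence."""
    a, b = model
    y_hat = a * x + b
    q = half_width(scores, confidence)
    return y_hat - q, y_hat + q


def demo(seed=0):
    """Empirical coverage on synthetic data (assumed: y = 2x + 1 + noise)."""
    rng = random.Random(seed)

    def sample(n):
        xs = [rng.uniform(-3, 3) for _ in range(n)]
        ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
        return xs, ys

    x_tr, y_tr = sample(200)    # proper training set
    x_cal, y_cal = sample(500)  # calibration set
    x_te, y_te = sample(1000)   # test set

    model = fit_line(x_tr, y_tr)
    scores = calibrate(model, x_cal, y_cal)

    coverage = {}
    for conf in (0.8, 0.9):
        hits = 0
        for x, y in zip(x_te, y_te):
            lo, hi = predict_interval(model, scores, x, conf)
            hits += lo <= y <= hi
        coverage[conf] = hits / len(x_te)
    return coverage
```

Calling `demo()` returns the fraction of test points whose true value falls inside the interval at each confidence level; under the exchangeability assumption this fraction should lie close to the requested level. With the absolute-residual measure every compound receives the same interval width, whereas a normalized nonconformity measure would allow the width to vary per compound.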