Assessing synthetic accessibility of chemical compounds using machine learning methods.

Affiliation

Department of Computer Science and Computer Engineering, University of Minnesota, Minneapolis, Minnesota 55455, USA.

Publication information

J Chem Inf Model. 2010 Jun 28;50(6):979-91. doi: 10.1021/ci900301v.

Abstract

De novo rational drug design lets scientists rapidly generate a very large number of potentially biologically active probes. However, many of them may be synthetically infeasible and, therefore, of limited value to drug developers. Moreover, most tools for evaluating synthetic accessibility are very slow and can process only a few molecules per minute. In this study, we present two approaches that quickly predict the synthetic accessibility of chemical compounds by using support vector machines (SVMs) operating on molecular descriptors. The first approach, RSsvm, is designed to identify the compounds that can be synthesized using a specific set of reactions and starting materials; it builds its model by training on compounds that retrosynthetic analysis has identified as synthetically accessible or not. The second approach, DRsvm, is designed to provide a more general assessment of synthetic accessibility that is not tied to any set of reactions or starting materials. Its training set compounds are selected from a diverse library based on the number of other similar compounds within the same library. Both approaches perform very well in their respective areas of applicability: RSsvm achieves a receiver operating characteristic (ROC) score of 0.952 in cross-validation experiments, and DRsvm achieves a score of 0.888 on an independent set of compounds. Our implementations can process thousands of compounds per minute.
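The DRsvm training-set selection described in the abstract (choosing compounds by how many similar compounds share their library) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the abstract: fingerprints are represented as sets of "on" bit indices, similarity is Tanimoto, the 0.5 threshold and the neighbor cutoffs are arbitrary, and compounds with many neighbors are treated as positive examples while isolated ones are treated as negative. The names `tanimoto`, `neighbor_counts`, and `select_training_set` are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def neighbor_counts(library, threshold=0.5):
    """For each compound, count the other library compounds similar to it.

    `threshold` is an assumed similarity cutoff, not a value from the paper.
    """
    counts = {name: 0 for name in library}
    for (n1, f1), (n2, f2) in combinations(library.items(), 2):
        if tanimoto(f1, f2) >= threshold:
            counts[n1] += 1
            counts[n2] += 1
    return counts


def select_training_set(library, threshold=0.5, min_pos=2, max_neg=0):
    """Split a library into assumed positive and negative training examples.

    Assumption: compounds with at least `min_pos` similar neighbors are
    labeled positive, and those with at most `max_neg` are labeled negative.
    """
    counts = neighbor_counts(library, threshold)
    positives = sorted(n for n, c in counts.items() if c >= min_pos)
    negatives = sorted(n for n, c in counts.items() if c <= max_neg)
    return positives, negatives


# Toy library: hypothetical compound names mapped to fingerprint bit sets.
library = {
    "cpd_a": frozenset({1, 2, 3, 4}),
    "cpd_b": frozenset({1, 2, 3, 5}),
    "cpd_c": frozenset({1, 2, 3, 4, 5}),
    "cpd_d": frozenset({10, 11, 12}),
    "cpd_e": frozenset({20, 21}),
}

positives, negatives = select_training_set(library)
print("positives:", positives)
print("negatives:", negatives)
```

In a fuller pipeline, the two groups would supply the labeled examples for an SVM trained on molecular descriptors. The all-pairs loop is quadratic in library size, so a large library would likely need an indexed or approximate neighbor search instead.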
