Manchester John, Czermiński Ryszard
AstraZeneca Pharmaceuticals R&D Boston, Waltham, Massachusetts 02451, USA.
J Chem Inf Model. 2008 Jun;48(6):1167-73. doi: 10.1021/ci800009u. Epub 2008 May 27.
In this paper we consider the following question: how much can we simplify molecular description without sacrificing too much of the quality of 3D-QSAR models? We compare the performance of the newly developed Simple Atom Mapping Following Alignment (SAMFA) descriptors with that of CoMFA descriptors on nine different data sets from the literature, using three regression approaches (PLS, SVM, RandomForest), as implemented in R, and Monte Carlo cross-validation (MCCV) numerical experiments. The results indicate that SAMFA descriptors, despite their simplicity, perform surprisingly well when compared to the much more refined CoMFA descriptors. Moreover, their simplicity makes them readily interpretable and applicable to the difficult problem of inverse QSAR.
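For readers unfamiliar with Monte Carlo cross-validation, the following is a minimal sketch of the general idea: the data are repeatedly split at random into training and test sets, a model is fitted on each training set, and the predictive r² on the held-out set is recorded. It is not the authors' protocol. The function name `mccv_q2`, the number of splits, the test fraction, and the use of ridge regression as a stand-in for PLS, SVM, or RandomForest are all assumptions made for illustration, and the abstract states that the original work was done in R rather than Python.

```python
import numpy as np

def mccv_q2(X, y, n_splits=100, test_fraction=0.3, ridge=1e-3, seed=0):
    """Monte Carlo cross-validation with repeated random train/test splits.

    Returns an array holding one predictive r^2 value per split.
    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_test = max(1, int(round(test_fraction * n)))
    scores = np.empty(n_splits)
    for i in range(n_splits):
        # Draw a fresh random partition for every repetition.
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        # Stand-in model (assumption): ridge regression on centered data.
        x_mean = X[train].mean(axis=0)
        y_mean = y[train].mean()
        Xc = X[train] - x_mean
        A = Xc.T @ Xc + ridge * np.eye(X.shape[1])
        w = np.linalg.solve(A, Xc.T @ (y[train] - y_mean))
        pred = (X[test] - x_mean) @ w + y_mean
        # Predictive r^2, referenced to the training-set mean of y.
        ss_res = np.sum((y[test] - pred) ** 2)
        ss_tot = np.sum((y[test] - y_mean) ** 2)
        scores[i] = 1.0 - ss_res / ss_tot
    return scores
```

Calling `mccv_q2(X, y)` on a descriptor matrix `X` (molecules by descriptors) and an activity vector `y` yields a distribution of scores. Comparing the distributions obtained with two descriptor sets indicates whether an observed difference exceeds the split-to-split variability, which a single train/test split cannot show.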