Lipiński Piotr F J, Szurmak Przemysław
Department of Neuropeptides, Mossakowski Medical Research Centre Polish Academy of Sciences, 02-106 Warsaw, Poland.
ChemPharmSoft, 01-926 Warsaw, Poland.
Chem Zvesti. 2017;71(11):2217-2232. doi: 10.1007/s11696-017-0215-7. Epub 2017 Jun 5.
A common practice in modern QSAR modelling is to derive models by applying variable selection methods to large descriptor pools. As pointed out previously, this approach carries an intrinsic risk of finding random correlations. It is therefore desirable to run tests that show how models built on random data perform. In this contribution, we introduce a simple and freely available software tool, SCRAMBLE'N'GAMBLE, that is aimed at facilitating data preparation for y-randomization and pseudo-descriptor tests. Four close-to-real-world modelling situations are then analysed. The tests indicate how the quality of the obtained QSAR models compares with that of chance models derived from random data. Non-randomness is not the only requirement for a good QSAR model; however, it is good practice to consider it together with internal statistical parameters and possible physical interpretations of a model.
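The two kinds of random data named in the abstract can be illustrated with a minimal sketch. This is not the SCRAMBLE'N'GAMBLE implementation; the function names (`scramble_y`, `pseudo_descriptors`, `best_single_r2`), the uniform distribution of the pseudo-descriptors, and the use of the best single-descriptor squared correlation as a stand-in for a variable selection procedure are all assumptions made for illustration.

```python
import numpy as np


def scramble_y(y, n_copies, seed=0):
    """Return n_copies randomly permuted versions of the response vector y
    (the data needed for a y-randomization test)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    return np.array([rng.permutation(y) for _ in range(n_copies)])


def pseudo_descriptors(n_compounds, n_descriptors, seed=0):
    """Return a matrix of random numbers to use in place of real descriptors.
    The uniform [0, 1) distribution is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(n_compounds, n_descriptors))


def best_single_r2(X, y):
    """Highest squared Pearson correlation between any column of X and y.
    A deliberately crude stand-in for a variable selection method."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = (Xc.T @ yc) / denom
    return float(np.max(r ** 2))


# Hypothetical data set: 20 compounds with made-up activities.
activities = np.linspace(4.0, 9.0, 20)

# Pseudo-descriptor test: how well can the best of 500 random columns "explain" y?
random_pool = pseudo_descriptors(n_compounds=20, n_descriptors=500, seed=1)
print("best chance r2 from pseudo-descriptors:", best_single_r2(random_pool, activities))

# y-randomization test: the same pool against 10 scrambled response vectors.
chance_r2 = [best_single_r2(random_pool, ys) for ys in scramble_y(activities, 10, seed=2)]
print("mean chance r2 after y-scrambling:", float(np.mean(chance_r2)))
```

A real model's statistics would then be compared against the distribution of such chance values; the larger the descriptor pool relative to the number of compounds, the higher the chance values tend to be.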