Izrailev Sergei, Agrafiotis Dimitris K
3-Dimensional Pharmaceuticals, Inc., 8 Clarke Drive, Cranbury, NJ 08512, USA.
J Mol Graph Model. 2004 Mar;22(4):275-84. doi: 10.1016/j.jmgm.2003.10.001.
Feature selection is one of the most commonly used and reliable methods for deriving predictive quantitative structure-activity relationships (QSAR). Many feature selection algorithms are stochastic in nature and often produce different solutions depending on the initialization conditions. Because some features may be highly correlated, models based on different sets of descriptors may capture essentially the same information, yet such equivalent models are difficult to recognize. Here, we introduce a measure of similarity between QSAR models that captures the correlation between the underlying features. This measure can be used in conjunction with stochastic proximity embedding (SPE) or multi-dimensional scaling (MDS) to create a meaningful visual representation of structure-activity model space and to aid in the post-processing and analysis of the results of feature selection calculations.
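The abstract does not state the similarity measure itself, so the following is only a minimal sketch of the general idea, under loudly stated assumptions. Each model is represented by the indices of the descriptors it selected, and the hypothetical `model_similarity` function matches every descriptor in one model to its most correlated descriptor in the other, then symmetrizes the two averages. This formula is an illustrative stand-in, not the measure defined in the paper. Classical (Torgerson) MDS is used here as one simple form of MDS; SPE is not implemented. The descriptor matrix `X` and the list `models` are synthetic.

```python
import numpy as np

def model_similarity(corr, a, b):
    """Illustrative similarity between two descriptor subsets a and b.

    Assumption (not from the paper): each descriptor is matched to its most
    correlated partner in the other model, and the two directed averages
    are symmetrized. Returns a value close to 1 for equivalent models.
    """
    block = np.abs(corr[np.ix_(a, b)])
    return 0.5 * (block.max(axis=1).mean() + block.max(axis=0).mean())

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS from a symmetric dissimilarity matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip negatives that arise when the
    # dissimilarities are not exactly Euclidean.
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Synthetic descriptors: columns 0-2 are independent, columns 3-5 are
# noisy near-copies of columns 0-2 (i.e. highly correlated features).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.hstack([base, base + 0.05 * rng.normal(size=(200, 3))])
corr = np.corrcoef(X, rowvar=False)

# Hypothetical feature-selection results: each model is a descriptor subset.
models = [[0, 1], [3, 4], [0, 2], [2, 5]]

sim = np.array([[model_similarity(corr, a, b) for b in models] for a in models])
dist = 1.0 - sim                 # turn similarity into a dissimilarity
coords = classical_mds(dist)     # 2-D map of the model space

print(np.round(sim, 3))
print(np.round(coords, 3))
```

In this toy setup the models `[0, 1]` and `[3, 4]` share no descriptor, yet their similarity is close to 1 because each descriptor in one has a near-duplicate in the other. That is the situation the abstract describes: models built on different descriptor sets that capture essentially the same information.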