State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing 210093, China.
Molecules. 2012 May 21;17(5):6126-45. doi: 10.3390/molecules17056126.
A large number of descriptors were employed to characterize the molecular structures of 53 natural, synthetic, and environmental chemicals that are suspected of disrupting endocrine functions by mimicking or antagonizing natural hormones and may thus pose a serious threat to the health of humans and wildlife. In this work, a robust quantitative structure-activity relationship (QSAR) model built with a novel variable selection method is proposed for these estrogenic compounds. The method, variable selection based on variable interaction (VSMVI), uses leave-multiple-out cross-validation (LMOCV) to select the best descriptor subset. During variable selection, model construction, and assessment, the Organization for Economic Co-operation and Development (OECD) principles for the regulatory acceptability of QSAR models were fully considered: an unambiguous multiple linear regression (MLR) algorithm was used to build the model, several validation methods were used to assess its performance, the applicability domain was defined, and outliers were analyzed using molecular docking results. The performance of the QSAR model indicates that VSMVI is an effective, feasible, and practical tool for rapidly screening the best subset from a large pool of molecular descriptors.
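To make the role of LMOCV in subset selection concrete, the sketch below scores candidate descriptor subsets of an MLR model by a leave-multiple-out cross-validated Q². It is not the authors' VSMVI procedure, whose details the abstract does not give. The data are synthetic (53 samples to mirror the data set size), and the function names `fit_mlr`, `predict_mlr`, and `lmo_q2`, the group count, and the repeat count are all illustrative assumptions.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit ordinary least squares with an intercept; return [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict responses from the intercept and slope coefficients."""
    return coef[0] + X @ coef[1:]


def lmo_q2(X, y, n_groups=5, n_repeats=20, seed=0):
    """Leave-multiple-out Q^2, averaged over repeated random partitions.

    Each repeat splits the samples into n_groups groups, holds every group
    out once, and pools the squared prediction errors (PRESS). Q^2 is taken
    here as 1 - PRESS / SS_total, one common definition (an assumption).
    """
    rng = np.random.default_rng(seed)
    ss_total = np.sum((y - y.mean()) ** 2)
    q2_values = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(y))
        press = 0.0
        for held_out in np.array_split(perm, n_groups):
            train = np.setdiff1d(perm, held_out)
            coef = fit_mlr(X[train], y[train])
            residuals = y[held_out] - predict_mlr(coef, X[held_out])
            press += np.sum(residuals ** 2)
        q2_values.append(1.0 - press / ss_total)
    return float(np.mean(q2_values))


# Synthetic stand-in data: 53 "compounds", 10 "descriptors".
# Only descriptors 0 and 3 actually drive the response (by construction).
rng = np.random.default_rng(42)
X = rng.normal(size=(53, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.3, size=53)

q2_informative = lmo_q2(X[:, [0, 3]], y)    # subset with the true drivers
q2_uninformative = lmo_q2(X[:, [1, 2]], y)  # subset of irrelevant descriptors
print(f"Q2 (informative subset):   {q2_informative:.3f}")
print(f"Q2 (uninformative subset): {q2_uninformative:.3f}")
```

A subset search would call `lmo_q2` on each candidate subset and keep the one with the highest score. How candidates are generated, for example through variable interactions as in VSMVI, is left open here.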