State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau 999078, China.
Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an 710049, China.
Int J Mol Sci. 2017 Dec 22;19(1):30. doi: 10.3390/ijms19010030.
Quantitative structure-activity relationship (QSAR) modeling seeks a reliable relationship between chemical structure and biological activity in drug design and discovery. (1) Background: In QSAR studies, the chemical structures of compounds are encoded by a large number of descriptors. Redundant, noisy and irrelevant descriptors degrade the QSAR model, and too many descriptors can lead to overfitting or to a low correlation between chemical structure and biological activity. (2) Methods: We use a novel log-sum regularization to select a small number of descriptors that are relevant to biological activity. In addition, we develop a coordinate descent algorithm for the QSAR model that updates the estimated coefficients with a novel univariate log-sum thresholding operator. (3) Results: Experimental results on artificial data and four QSAR datasets demonstrate that the proposed log-sum method performs well in comparison with state-of-the-art methods. (4) Conclusions: The proposed multiple linear regression with a log-sum penalty is an effective technique for both descriptor selection and prediction of biological activity.
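The method described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the penalized objective (1/(2n))*||y - X*beta||^2 + lam * sum_j log(|beta_j| + eps) with standardized descriptor columns, and it derives the univariate thresholding rule directly from that assumed objective. The function names, the values of lam and eps, and the synthetic data are all hypothetical.

```python
import numpy as np


def log_sum_threshold(z, lam, eps):
    """Minimize 0.5*(b - z)**2 + lam*log(|b| + eps) over b (assumed penalty form)."""
    a = abs(z)
    # For b > 0 the stationarity condition b - a + lam/(b + eps) = 0 is a
    # quadratic in b with discriminant (a + eps)**2 - 4*lam.
    disc = (a + eps) ** 2 - 4.0 * lam
    if disc < 0:
        return 0.0  # no stationary point: the objective is increasing in |b|
    b = 0.5 * ((a - eps) + np.sqrt(disc))  # larger root = local minimum
    if b <= 0:
        return 0.0
    # The problem is non-convex, so compare the local minimum against b = 0.
    f_b = 0.5 * (b - a) ** 2 + lam * np.log(b + eps)
    f_0 = 0.5 * a ** 2 + lam * np.log(eps)
    return float(np.sign(z) * b) if f_b < f_0 else 0.0


def log_sum_cd(X, y, lam, eps=0.01, n_iter=100, tol=1e-6):
    """Cyclic coordinate descent; assumes each column of X has mean 0 and mean square 1."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()  # residual y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Univariate least-squares solution with the other coefficients fixed.
            z = X[:, j] @ resid / n + beta[j]
            new = log_sum_threshold(z, lam, eps)
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta


# Synthetic example: 20 descriptors, only the first three are relevant.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize columns
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(200)
y = y - y.mean()

beta_hat = log_sum_cd(X, y, lam=0.1, eps=0.01)
selected = np.flatnonzero(beta_hat)
print("selected descriptors:", selected)
```

Because the log-sum penalty is non-convex, the thresholding step compares the candidate stationary point with zero explicitly instead of relying on a closed-form cutoff, and the result of coordinate descent can depend on the starting point (here, all zeros).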