Toward an optimal procedure for PC-ANN model building: prediction of the carcinogenic activity of a large set of drugs.

Author information

Hemmateenejad Bahram, Safarpour Mohammad A, Miri Ramin, Nesari Nasim

Affiliation

Medicinal & Natural Products Chemistry Research Center, Shiraz University of Medical Sciences, Shiraz, Iran.

Publication information

J Chem Inf Model. 2005 Jan-Feb;45(1):190-9. doi: 10.1021/ci049766z.

Abstract

The performances of three novel QSAR algorithms are compared by applying them to the prediction of the carcinogenic activity of a large set of drugs (735 drugs) belonging to diverse compound classes. Each algorithm combines the principal component-artificial neural network (PC-ANN) modeling method with one of three factor selection procedures: eigenvalue ranking (ER-PC-ANN), correlation ranking (CR-PC-ANN), and a genetic algorithm (PC-GA-ANN). A total of 1350 theoretical descriptors are calculated for each molecule, and the resulting descriptor matrix (735 × 1350) is subjected to principal component analysis (PCA). The first 137 principal components (PCs) explain 95% of the variance in the matrix. From this pool of 137 PCs, the factor selection methods (ER, CR, and GA) are employed to select the best set of PCs for PC-ANN modeling. In ER-PC-ANN, the PCs are entered into the ANN one after another in order of decreasing eigenvalue. In CR-PC-ANN, an ANN is first used to model the nonlinear relationship between each individual PC and the carcinogenic activity; the PCs are then ranked by decreasing correlating ability and entered into the input layer of the network one after another. In PC-GA-ANN, a genetic algorithm is used as the search algorithm to find the best set of PCs. Both external validation and cross-validation are used to assess the performance of the resulting models. The results obtained by the PC-GA-ANN and CR-PC-ANN procedures are superior to those obtained by ER-PC-ANN. Comparison of the results shows that PC-GA-ANN produces better results than CR-PC-ANN, although the difference is not significant.
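The ER and CR orderings described in the abstract can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: it assumes the PCA eigenvalues and the per-molecule PC scores have already been computed (for example with `numpy.linalg.eigh` on the descriptor covariance matrix), and every function name here is hypothetical. For CR, the squared Pearson correlation is a linear stand-in for the per-PC ANN fit used in the paper; a nonlinear scorer can be passed through the `score` parameter.

```python
from typing import Callable, List, Sequence

# Hypothetical helpers illustrating the ER and CR orderings; not the authors' code.


def n_components_for_variance(eigenvalues: Sequence[float], threshold: float = 0.95) -> int:
    """Smallest k such that the k largest eigenvalues explain >= threshold of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def er_order(eigenvalues: Sequence[float]) -> List[int]:
    """Eigenvalue ranking (ER): PC indices sorted by decreasing eigenvalue."""
    return sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)


def pearson_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation; a linear stand-in (assumption) for the per-PC ANN fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def cr_order(
    pc_scores: Sequence[Sequence[float]],
    activity: Sequence[float],
    score: Callable[[Sequence[float], Sequence[float]], float] = pearson_r2,
) -> List[int]:
    """Correlation ranking (CR): PC indices sorted by decreasing single-PC score.

    pc_scores[j] holds the scores of PC j for every molecule.
    """
    ranks = [score(column, activity) for column in pc_scores]
    return sorted(range(len(pc_scores)), key=lambda i: ranks[i], reverse=True)


def stepwise_inputs(order: Sequence[int]) -> List[List[int]]:
    """Candidate input sets: the first 1, 2, ..., n PCs of a ranking."""
    return [list(order[:k]) for k in range(1, len(order) + 1)]
```

For example, `er_order([6.0, 0.2, 2.5, 1.3])` returns `[0, 2, 3, 1]`. Each prefix produced by `stepwise_inputs` corresponds to one candidate input layer; one would train and validate a PC-ANN on each prefix and keep the best-performing size (the ANN training itself is not shown).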

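The GA-based selection (PC-GA-ANN) can be sketched as a search over bit masks, one bit per PC. This is a hedged illustration, not the paper's configuration: binary tournament selection, one-point crossover, bit-flip mutation, elitism, and all parameter defaults are assumptions, and `fitness` is a hypothetical user-supplied callable that, in the setting described in the abstract, would train and validate a PC-ANN on the selected PCs and return a score to maximize.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical GA for PC subset selection; operators and defaults are assumptions.
Mask = Tuple[bool, ...]


def ga_select(
    n_pcs: int,
    fitness: Callable[[List[int]], float],
    population_size: int = 30,
    generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Return the best PC subset found and its fitness (higher is better)."""
    rng = random.Random(seed)

    def decode(mask: Mask) -> List[int]:
        # Indices of the PCs switched on in this chromosome.
        return [i for i, bit in enumerate(mask) if bit]

    def pick(population: List[Mask], scores: List[float]) -> Mask:
        # Binary tournament: the fitter of two randomly drawn chromosomes wins.
        a, b = rng.randrange(len(population)), rng.randrange(len(population))
        return population[a] if scores[a] >= scores[b] else population[b]

    population: List[Mask] = [
        tuple(rng.random() < 0.5 for _ in range(n_pcs)) for _ in range(population_size)
    ]
    scores = [fitness(decode(m)) for m in population]

    for _ in range(generations):
        elite = max(range(population_size), key=scores.__getitem__)
        children: List[Mask] = [population[elite]]  # elitism: best mask survives unchanged
        while len(children) < population_size:
            mother, father = pick(population, scores), pick(population, scores)
            cut = rng.randrange(1, n_pcs) if n_pcs > 1 else 0
            child = mother[:cut] + father[cut:]  # one-point crossover
            # Bit-flip mutation: XOR each gene with a Bernoulli(mutation_rate) draw.
            children.append(tuple(bit != (rng.random() < mutation_rate) for bit in child))
        population = children
        scores = [fitness(decode(m)) for m in population]

    best = max(range(population_size), key=scores.__getitem__)
    return decode(population[best]), scores[best]
```

For example, `ga_select(137, fitness)` would search subsets of the 137 retained PCs. Because each fitness call would involve training a network, the population size and the number of generations dominate the cost; elitism guarantees that the best score never decreases from one generation to the next.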