Xue Y, Li H, Ung C Y, Yap C W, Chen Y Z
Bioinformatics and Drug Design Group, Departments of Pharmacy and Computational Science, National University of Singapore, Blk S16, Level 8, 3 Science Drive 2, Singapore 117543.
Chem Res Toxicol. 2006 Aug;19(8):1030-9. doi: 10.1021/tx0600550.
The toxicity of many compounds has been assessed by measuring their toxic effects against Tetrahymena pyriformis. Computational quantitative structure-activity relationship (QSAR) models and statistical learning methods (SLMs) have also been used to predict Tetrahymena pyriformis toxicity (TPT) with impressive accuracy. Because compounds and toxicity mechanisms are diverse, it is desirable to explore additional methods and to examine whether they apply to more diverse sets of compounds. We tested several SLMs (logistic regression, C4.5 decision tree, k-nearest neighbor, probabilistic neural network, and support vector machines) for their ability to predict TPT using 1129 compounds (841 TPT and 288 non-TPT agents), a set more diverse than those in other studies. A feature selection method was used to improve prediction performance and to select the molecular descriptors that distinguish TPT from non-TPT agents. Based on 5-fold cross-validation, the prediction accuracies are 86.9% to 94.2% for TPT agents and 71.2% to 87.5% for non-TPT agents, comparable to those of some earlier studies despite the more diverse compound set. The selected molecular descriptors are consistent with those used in other studies and with experimental findings. These results suggest that SLMs are useful for predicting the TPT potential of diverse sets of compounds and for characterizing the molecular descriptors associated with TPT.
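The abstract reports separate accuracies for TPT and non-TPT agents under 5-fold cross-validation. The sketch below illustrates how such per-class accuracies might be computed. It is not the authors' implementation: a plain k-nearest-neighbor classifier stands in for any of the listed methods, the two-dimensional data are synthetic rather than molecular descriptors from the study, and every function name (`stratified_folds`, `knn_predict`, `cross_validate`, `make_toy_data`) is hypothetical.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds, keeping the class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), y)
        for row, y in zip(train_X, train_y)
    )
    votes = [y for _, y in dists[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0


def cross_validate(X, y, n_folds=5, k=3):
    """Return (accuracy on class 1, accuracy on class 0), pooled over all folds."""
    correct = {0: 0, 1: 0}
    total = {0: 0, 1: 0}
    for held_out in stratified_folds(y, n_folds):
        held = set(held_out)
        # Train only on samples outside the held-out fold.
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        for i in held_out:
            pred = knn_predict(train_X, train_y, X[i], k)
            total[y[i]] += 1
            correct[y[i]] += int(pred == y[i])
    return correct[1] / total[1], correct[0] / total[0]


def make_toy_data(n_pos=60, n_neg=20, seed=1):
    """Synthetic stand-in data: label 1 = 'TPT', label 0 = 'non-TPT' (assumed encoding)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n, center in ((1, n_pos, 3.0), (0, n_neg, 0.0)):
        for _ in range(n):
            X.append([rng.gauss(center, 0.7), rng.gauss(center, 0.7)])
            y.append(label)
    return X, y


X, y = make_toy_data()
tpt_acc, non_tpt_acc = cross_validate(X, y)
print(f"TPT accuracy: {tpt_acc:.3f}, non-TPT accuracy: {non_tpt_acc:.3f}")
```

Reporting the two classes separately matters when the classes are imbalanced, as with 841 TPT versus 288 non-TPT agents: a single overall accuracy can hide weak performance on the smaller class.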