Okada Takashi
Center for Information & Media Studies, Kwansei Gakuin University, 1-1-155 Uegahara, Nishinomiya, 662-8501, Japan.
Bioinformatics. 2003 Jul 1;19(10):1208-15. doi: 10.1093/bioinformatics/btg129.
Chemical carcinogenicity is an important subject in health and environmental sciences, and a reliable method for identifying the factors characteristic of carcinogenicity is needed. The Predictive Toxicology Challenge (PTC) 2000-2001 gave various data mining methods an opportunity to have their performance evaluated. The cascade model, a data mining method developed by the author, can mine local correlations in data sets with a large number of attributes. This paper explores the effectiveness of the method on the problem of chemical carcinogenicity.
The rodent carcinogenicity of 417 compounds examined by the National Toxicology Program (NTP) was used as the training set. Analysis by the cascade model yielded rules such as 'Highly flexible molecules are carcinogenic, if they have no hydrogen bond acceptors in halogenated alkanes and alkenes'. The resulting rules were applied to predict the activity of 185 compounds examined by the FDA. The ROC analysis performed by the PTC organizers showed that the method has excellent predictive power for the female rat data.
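To make the ROC evaluation mentioned above concrete, the following is a minimal sketch of how an ROC curve and its area can be computed from predicted scores and known activity labels. It is not the PTC organizers' procedure or part of DISCAS; the function names `roc_curve` and `auc` and the sample labels and scores are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def roc_curve(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return ROC points (false positive rate, true positive rate).

    labels: 1 for an active (carcinogenic) compound, 0 for an inactive one.
    scores: higher values mean the model considers the compound more likely active.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Rank compounds from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Compounds with tied scores are counted together, giving one point per threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Hypothetical data: six compounds, three active and three inactive.
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    print(round(auc(roc_curve(labels, scores)), 3))  # 0.889
```

An area near 1.0 means active compounds are ranked consistently above inactive ones, while 0.5 corresponds to a ranking no better than chance.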
The binary program of DISCAS 2.1 and sample input data sets for Windows PCs are available from http://www.clab.kwansei.ac.jp/mining/discas/discas.html upon request to the author.
A summary of the prediction results and cross-validations is available at http://www.clab.kwansei.ac.jp/~okada/BIJ/BIJsupple.htm. The rules used and the prediction results for each molecule are also provided.