Toivonen Hannu, Srinivasan Ashwin, King Ross D, Kramer Stefan, Helma Christoph
Department of Computer Science, PO Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland.
Bioinformatics. 2003 Jul 1;19(10):1183-93. doi: 10.1093/bioinformatics/btg130.
The development of in silico models to predict chemical carcinogenesis from molecular structure would greatly help to prevent environmentally caused cancers. The Predictive Toxicology Challenge (PTC) competition was organized to test the state of the art in applying machine learning to build such predictive models.
Fourteen machine learning groups generated 111 models. The use of Receiver Operating Characteristic (ROC) space allowed the models to be compared uniformly, regardless of the error cost function. We developed a statistical method to test whether a model performs significantly better than random in ROC space. Using this test as the criterion, five models performed better than random guessing at a significance level of p = 0.05 (not corrected for multiple testing). Statistically, the best predictor was the Viniti model for female mice, with a p-value below 0.002. The toxicologically most interesting models were Leuven2 for male mice and Kwansei for female rats. These models performed well in the statistical analysis and lie in the middle of ROC space, i.e. distant from extreme cost assumptions. These predictive models were also independently judged by domain experts to be among the three most interesting, and are believed to include a small but significant amount of empirically learned toxicological knowledge.
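The abstract does not spell out the authors' significance test, so the sketch below is only an illustration under stated assumptions. It places a model in ROC space as the point (FPR, TPR), and substitutes a standard test for "better than random": the one-sided hypergeometric tail probability (equivalent to a one-sided Fisher exact test) that a random guesser labelling the same number of compounds as positive would obtain at least as many true positives. A random guesser that labels a fraction q of compounds positive lands, in expectation, at (q, q) on the ROC diagonal. The `expected_cost` helper shows why comparing models in ROC space sidesteps a fixed cost function: which point is cheapest depends on the assumed costs. All function names and counts here are hypothetical, not PTC data or the authors' code.

```python
from math import comb


def roc_point(tp, fp, n_pos, n_neg):
    """Return the ROC coordinates (FPR, TPR) from confusion-matrix counts."""
    return fp / n_neg, tp / n_pos


def p_better_than_random(tp, fp, n_pos, n_neg):
    """One-sided hypergeometric p-value for 'better than random'.

    Assumption: random guessing is modelled as choosing k = tp + fp of the
    n_pos + n_neg compounds uniformly at random and calling them positive.
    Returns P(random true positives >= tp). This is a stand-in test, not
    necessarily the procedure used in the PTC evaluation.
    """
    k = tp + fp
    total = n_pos + n_neg
    tail = sum(
        comb(n_pos, i) * comb(n_neg, k - i)  # comb returns 0 if k - i > n_neg
        for i in range(tp, min(k, n_pos) + 1)
    )
    return tail / comb(total, k)


def expected_cost(fpr, tpr, n_pos, n_neg, cost_fp, cost_fn):
    """Expected misclassification cost per compound at an ROC point."""
    false_pos = fpr * n_neg          # expected number of false alarms
    false_neg = (1 - tpr) * n_pos    # expected number of missed carcinogens
    return (cost_fp * false_pos + cost_fn * false_neg) / (n_pos + n_neg)


if __name__ == "__main__":
    # Hypothetical test set: 20 carcinogens, 20 non-carcinogens.
    fpr, tpr = roc_point(tp=15, fp=5, n_pos=20, n_neg=20)
    print("ROC point:", (fpr, tpr))
    print("p-value vs random:", p_better_than_random(15, 5, 20, 20))

    # Two hypothetical ROC points: a conservative one and a liberal one.
    conservative, liberal = (0.1, 0.4), (0.6, 0.9)
    for c_fp, c_fn in [(5, 1), (1, 5)]:
        costs = {
            name: expected_cost(*pt, 20, 20, c_fp, c_fn)
            for name, pt in [("conservative", conservative), ("liberal", liberal)]
        }
        print(f"cost_fp={c_fp}, cost_fn={c_fn}:", costs)
```

With false alarms expensive the conservative point is cheaper, and with missed carcinogens expensive the liberal one wins; a point near the middle of ROC space is not tied to either extreme cost assumption.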
PTC details and data can be found at: http://www.predictive-toxicology.org/ptc/.