F. Hoffmann-La Roche Ltd., Non-Clinical Safety, Basel CH-4070, Switzerland.
Chem Res Toxicol. 2011 Jun 20;24(6):843-54. doi: 10.1021/tx2000398. Epub 2011 May 2.
The predictive power of four commonly used in silico tools for mutagenicity prediction (DEREK, Toxtree, MC4PC, and Leadscope MA) was evaluated comparatively using a large, high-quality data set of 9,681 compounds tested in the Ames assay, comprising both public and proprietary (F. Hoffmann-La Roche) data. Satisfactory performance statistics were observed on the public data (accuracy, 66.4-75.4%; sensitivity, 65.2-85.2%; specificity, 53.1-82.9%), whereas sensitivity deteriorated markedly on the Roche data (accuracy, 73.1-85.5%; sensitivity, 17.4-43.4%; specificity, 77.5-93.9%). As a general tendency, expert systems showed higher sensitivity and lower specificity than QSAR-based tools, which showed the opposite pattern. Possible reasons for the performance differences between the public and Roche data, relating to the ratio of experimentally inactive to active compounds and to the different coverage of chemical space, are discussed in detail. Examples of particular chemical classes enriched in false negative or false positive predictions are given, and the results of using the prediction systems in combination are described.
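The abstract links part of the public/Roche gap to the inactive-to-active ratio. Since accuracy = (sensitivity × N_active + specificity × N_inactive) / (N_active + N_inactive), a data set dominated by inactives can report high accuracy even when sensitivity is low. The minimal Python sketch below illustrates this with hypothetical counts (not data from the study); the `Confusion` class and `accuracy_from_rates` helper are illustrative names and are not part of DEREK, Toxtree, MC4PC, or Leadscope MA.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing predicted vs. experimental Ames calls (hypothetical)."""
    tp: int  # experimental mutagens predicted positive
    fn: int  # experimental mutagens predicted negative
    tn: int  # experimental non-mutagens predicted negative
    fp: int  # experimental non-mutagens predicted positive

    @property
    def sensitivity(self) -> float:
        # Fraction of true mutagens that the tool flags.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of true non-mutagens that the tool clears.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Fraction of all compounds classified correctly.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


def accuracy_from_rates(sens: float, spec: float, n_active: int, n_inactive: int) -> float:
    """Accuracy as the class-size-weighted mean of sensitivity and specificity."""
    return (sens * n_active + spec * n_inactive) / (n_active + n_inactive)


if __name__ == "__main__":
    # Hypothetical tool with the same sensitivity (0.30) and specificity (0.90) on both sets.
    balanced = Confusion(tp=150, fn=350, tn=450, fp=50)  # 500 actives, 500 inactives
    skewed = Confusion(tp=30, fn=70, tn=810, fp=90)      # 100 actives, 900 inactives
    for name, c in (("balanced", balanced), ("skewed", skewed)):
        print(f"{name}: sens={c.sensitivity:.2f} spec={c.specificity:.2f} acc={c.accuracy:.2f}")
    # The closed-form expression agrees with the count-based accuracy.
    assert math.isclose(skewed.accuracy, accuracy_from_rates(0.30, 0.90, 100, 900))
```

Running it prints `acc=0.60` for the balanced set and `acc=0.84` for the skewed set, although sensitivity and specificity are unchanged. Accuracy therefore rises with the share of inactives, so sensitivity and specificity are more informative than accuracy alone when comparing data sets with different class ratios.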