Adams Matthew, Hidle Hannah, Chang Daniel, Richard Ann M, Williams Antony J, Shah Imran, Patlewicz Grace
ORAU, Oak Ridge, TN, 37830, USA.
Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA.
Comput Toxicol. 2023 Feb 1;25. doi: 10.1016/j.comtox.2022.100256.
The Analog Identification Methodology (AIM) was developed over 20 years ago to identify analogues to support read-across at the US Environmental Protection Agency (EPA). However, the current public version of the standalone tool, released in 2012, no longer runs on the Windows operating systems that Microsoft currently supports. Additionally, the structural logic for analogue selection is based on older, customised Simplified Molecular-Input Line-Entry System (SMILES)-type features that are incompatible with modern cheminformatics tools. Given these limitations, a case study was undertaken to explore a more transparent, extensible way of implementing the AIM fragments using the Chemical Subgraphs and Reactions Mark-up Language (CSRML). A CSRML file was developed to codify the original AIM fragments, and the extent to which they were faithfully replicated was assessed using the AIM Database. Across all fragments, the overall mean sensitivity, specificity, and Jaccard similarity of CSRML-AIM were 89.5%, 99.9%, and 82.2%, respectively. A comparison of the AIM fragments with the public ToxPrints, using a large set of ~25,000 substances of regulatory interest to EPA, found the two fragment sets to be dissimilar, with an average maximum Jaccard score of 0.24 for AIM and 0.29 for ToxPrint fingerprints. Both fragment sets were then used as inputs to the automated read-across approach Generalised Read-Across (GenRA) to evaluate the quality of fit in predicting rat acute oral toxicity LD50 values, measured by the coefficient of determination (R²) and root mean squared error (RMSE). AIM fragments gave R² = 0.434 and RMSE = 0.663, whereas ToxPrints gave R² = 0.477 and RMSE = 0.638. Bootstrap resampling with 100 iterations gave a mean R² (95% confidence interval) of 0.349 [0.319, 0.379] for AIM fragments and 0.377 [0.338, 0.412] for ToxPrints.
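The replication and comparison metrics above can be illustrated with a short Python sketch. It assumes each fragment is represented by the set of substance identifiers it matches. `fragment_metrics` scores a re-implementation (such as CSRML-AIM) against a reference assignment (such as the original AIM output), and `mean_max_jaccard` reads the "average maximum Jaccard score" as: for each fragment in one set, take the Jaccard similarity to its best-matching fragment in the other set, then average. That reading, the function names, and the toy data are illustrative assumptions, not the authors' code.

```python
def jaccard(a: set, b: set) -> float:
    """Intersection over union; two empty sets score 0.0 by convention in this sketch."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fragment_metrics(reference: set, candidate: set, universe: set):
    """Sensitivity, specificity and Jaccard of one fragment's matches vs. a reference."""
    tp = len(reference & candidate)               # matched by both
    fn = len(reference - candidate)               # missed by the re-implementation
    fp = len(candidate - reference)               # extra matches in the re-implementation
    tn = len(universe - (reference | candidate))  # matched by neither
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity, jaccard(reference, candidate)


def mean_max_jaccard(source: dict, target: dict) -> float:
    """Average, over fragments in `source`, of the best Jaccard match found in `target`."""
    best = [max(jaccard(s, t) for t in target.values()) for s in source.values()]
    return sum(best) / len(best)


# Toy example with substance IDs 0..9 (hypothetical data).
universe = set(range(10))
sens, spec, jac = fragment_metrics({1, 2, 3, 4}, {1, 2, 3, 5}, universe)

aim = {"A1": {1, 2, 3}, "A2": {7, 8}}
toxprint = {"T1": {1, 2}, "T2": {8, 9}, "T3": {4}}
aim_to_tp = mean_max_jaccard(aim, toxprint)
tp_to_aim = mean_max_jaccard(toxprint, aim)
print(sens, spec, jac, aim_to_tp, tp_to_aim)
```

On the toy data the two directions give different averages (0.5 from AIM to ToxPrint, about 0.33 in reverse), which is consistent with reporting one average maximum score per fragment set.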
Although AIM and ToxPrints performed similarly in predicting LD50, they differed in their performance at a local level, revealing that their features can offer complementary insights.
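A similarity-weighted nearest-neighbour prediction of the kind GenRA performs can be sketched as follows. This is a simplified stand-in, not the GenRA implementation. Fingerprints are sets of "on" fragment indices, and each substance's value is predicted leave-one-out from its `k` most Jaccard-similar neighbours. R², RMSE and a percentile bootstrap interval for R² are then computed from the predictions. The neighbour count, the choice to resample prediction pairs, and all names and data are assumptions; the study's own bootstrap protocol may differ.

```python
import random
from statistics import mean


def jaccard(a: set, b: set) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_loo(fingerprints, y, k=3):
    """Leave-one-out, similarity-weighted k-nearest-neighbour predictions."""
    preds = []
    for i, fp_i in enumerate(fingerprints):
        # Rank every other substance by Jaccard similarity to substance i.
        neighbours = sorted(
            ((jaccard(fp_i, fp_j), y[j]) for j, fp_j in enumerate(fingerprints) if j != i),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        total = sum(sim for sim, _ in neighbours)
        if total > 0:
            preds.append(sum(sim * val for sim, val in neighbours) / total)
        else:
            # No structural overlap with any neighbour: fall back to an unweighted mean.
            preds.append(mean(val for _, val in neighbours))
    return preds


def r2_rmse(y_true, y_pred):
    """Coefficient of determination (R²) and root mean squared error."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot, (ss_res / len(y_true)) ** 0.5


def bootstrap_r2(y_true, y_pred, n_iter=100, seed=0):
    """Mean R² and a percentile 95% interval over resampled prediction pairs."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]
        t = [y_true[i] for i in idx]
        if len(set(t)) < 2:  # R² is undefined when all sampled values are equal
            continue
        scores.append(r2_rmse(t, [y_pred[i] for i in idx])[0])
    scores.sort()
    lo = scores[int(0.025 * (len(scores) - 1))]
    hi = scores[int(0.975 * (len(scores) - 1))]
    return mean(scores), (lo, hi)


# Hypothetical fingerprints (sets of "on" fragment indices) and log-scale LD50 values.
fps = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {5, 6}, {5, 6, 7}, {6, 7}]
y = [2.0, 2.2, 2.1, 3.5, 3.7, 3.6]
preds = predict_loo(fps, y, k=3)
r2, rmse = r2_rmse(y, preds)
boot_mean, (ci_lo, ci_hi) = bootstrap_r2(y, preds)
print(f"R2={r2:.3f} RMSE={rmse:.3f} bootstrap R2={boot_mean:.3f} [{ci_lo:.3f}, {ci_hi:.3f}]")
```

Feeding AIM-derived and ToxPrint-derived fingerprints for the same substances into `predict_loo`, with the endpoint values held fixed, is one way to compare the two feature sets on equal terms; per-substance differences between the two prediction lists would correspond to the "local level" differences mentioned above.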