Johns Hopkins University, Bloomberg School of Public Health, Center for Alternatives to Animal Testing (CAAT), Baltimore, Maryland 21205.
ToxTrack, Baltimore, Maryland 21209.
Toxicol Sci. 2018 Sep 1;165(1):198-212. doi: 10.1093/toxsci/kfy152.
Earlier we created a chemical hazard database via natural language processing of dossiers submitted to the European Chemicals Agency, covering approximately 10 000 chemicals. We identified repeat OECD guideline tests to establish the reproducibility of acute oral and dermal toxicity, eye and skin irritation, mutagenicity, and skin sensitization. Based on 350-700+ chemicals each, the probability that an OECD guideline animal test would yield the same result in a repeat test was 78%-96% (sensitivity 50%-87%). An expanded database with more than 866 000 chemical properties/hazards was used as training data to model health hazards and chemical properties. The constructed models automate and extend the read-across method of chemical classification. The novel models, called RASARs (read-across structure activity relationships), use binary fingerprints and Jaccard distance to define chemical similarity. A large chemical similarity adjacency matrix is constructed from this similarity metric and is used to derive feature vectors for supervised learning. We show results on 9 health hazards from 2 kinds of RASARs: "Simple" and "Data Fusion". The "Simple" RASAR seeks to duplicate the traditional read-across method, predicting hazard from chemical analogs with known hazard data. The "Data Fusion" RASAR extends this concept by creating large feature vectors from all available property data rather than only the modeled hazard. Simple RASAR models tested in cross-validation achieve balanced accuracies of 70%-80% with constraints on tested compounds. Cross-validation of Data Fusion RASARs shows balanced accuracies in the 80%-95% range across 9 health hazards with no constraints on tested compounds.
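The similarity step described in the abstract can be illustrated with a minimal sketch. It computes pairwise Jaccard distances between binary fingerprints, derives a two-column "Simple"-style feature vector (distance to the nearest known positive and nearest known negative analog), and scores a prediction with balanced accuracy. The toy fingerprints, the labels, the function names, and the nearest-analog decision rule are all assumptions made for illustration; the abstract does not specify the fingerprint type, the exact features, or the supervised learner, and a threshold rule stands in here for a trained model.

```python
import numpy as np


def jaccard_distance_matrix(fps: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard distance between rows of a binary fingerprint matrix."""
    fps = fps.astype(bool)
    inter = (fps[:, None, :] & fps[None, :, :]).sum(axis=2)
    union = (fps[:, None, :] | fps[None, :, :]).sum(axis=2)
    # Two all-zero fingerprints have an empty union; treat them as identical.
    sim = np.where(union > 0, inter / np.maximum(union, 1), 1.0)
    return 1.0 - sim


def simple_features(dist: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Hypothetical 'Simple'-style features: for each chemical, the distance to
    its nearest positive and nearest negative analog, excluding itself."""
    n = dist.shape[0]
    feats = np.ones((n, 2))  # 1.0 = maximally distant when no analog exists
    for i in range(n):
        others = np.arange(n) != i
        pos = others & (labels == 1)
        neg = others & (labels == 0)
        if pos.any():
            feats[i, 0] = dist[i, pos].min()
        if neg.any():
            feats[i, 1] = dist[i, neg].min()
    return feats


def balanced_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean of sensitivity and specificity for binary labels."""
    sensitivity = np.mean(y_pred[y_true == 1] == 1)
    specificity = np.mean(y_pred[y_true == 0] == 0)
    return float((sensitivity + specificity) / 2)


# Toy data (assumed): 4 chemicals, 4-bit fingerprints, binary hazard labels.
fps = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
])
labels = np.array([1, 1, 0, 0])

dist = jaccard_distance_matrix(fps)
feats = simple_features(dist, labels)

# Stand-in decision rule (not a trained model): predict "hazardous" when the
# nearest positive analog is closer than the nearest negative analog.
pred = (feats[:, 0] < feats[:, 1]).astype(int)
score = balanced_accuracy(labels, pred)
```

Because each chemical is excluded from its own neighbor search, the features behave like a leave-one-out estimate. In a fuller pipeline the feature matrix would be passed to a supervised learner and evaluated in cross-validation, as the abstract describes.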