Sciome LLC, Research Triangle Park, North Carolina 27709, United States.
National Institute of Environmental Health Sciences (NIEHS), National Toxicology Program (NTP), Research Triangle Park, North Carolina 27709, United States.
Chem Res Toxicol. 2021 Feb 15;34(2):634-640. doi: 10.1021/acs.chemrestox.0c00464. Epub 2020 Dec 25.
Molecular structure-based predictive models provide a proven alternative to costly and inefficient animal testing. However, predictive models built with abstract molecular descriptors lack interpretability and have earned a reputation as black boxes. Interpretable models require interpretable descriptors to provide chemistry-backed predictive reasoning and facilitate intelligent molecular design. We developed a novel, extensible set of chemistry-aware substructures to support interpretable predictive models and read-across protocols. The performance of these substructures in chemical characterization and in the search for structurally similar actives for read-across applications was compared with that of four publicly available fingerprint sets (MACCS (166), PubChem (881), ECFP4 (1024), ToxPrint (729)) on three benchmark sets (MUV, ULS, and Tox21) spanning ∼145 000 compounds and 78 molecular targets at 1%, 2%, 5%, and 10% false discovery rates. In 18 of the 20 comparisons, the interpretable features performed better than the publicly available, but less interpretable and fixed-bit-length, fingerprints. Examples are provided to show the enhanced capability of the new features in extracting compounds with higher scaffold similarity. The features are interpretable and efficiently characterize diverse chemical collections, making them a better choice for building interpretable predictive models and read-across protocols.