TCS Research (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad 500081, India.
J Chem Inf Model. 2023 Aug 28;63(16):5066-5076. doi: 10.1021/acs.jcim.3c00689. Epub 2023 Aug 16.
Generative artificial intelligence algorithms have been shown to be successful in exploring large chemical spaces and designing novel and diverse molecules. There has been considerable interest in developing AI-based predictive models for drug-like properties, which can potentially reduce the late-stage attrition of drug candidates or predict the properties of novel AI-designed molecules. At the same time, it is important to understand how functional groups contribute to these properties, so that they can be modified to obtain property-optimized lead compounds. As a result, there is increasing interest in the development of explainable property prediction models. However, current explainable approaches are mostly atom-based, and they often mark only a fraction of a fragment as significant. To address these challenges, we have developed a novel domain-aware molecular fragmentation approach termed post-processing of BRICS (pBRICS), which can fragment small molecules into their functional groups. Multitask models were developed to predict various properties, including the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. Fragment importance was explained using the gradient-weighted class activation mapping (Grad-CAM) approach. The method was validated on data sets of experimentally available matched molecular pairs (MMPs). The explanations from the model can help medicinal chemists identify the fragments responsible for poor drug-like properties and optimize the molecule. The explainability approach was also used to identify the reasons behind false-positive and false-negative MMP predictions. Based on evidence from the existing literature and our analysis, some of these mispredictions were justified. We propose that increasing the quantity, quality, and diversity of the training data will improve the accuracy of property prediction algorithms for novel molecules.
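The fragment-level Grad-CAM step described in the abstract can be illustrated with a minimal sketch. It assumes that a model has already produced one feature vector per fragment and that the gradients of the predicted property with respect to those features are available. The abstract does not specify the model architecture, so the function name `grad_cam_fragment_scores`, the input layout, and the toy numbers are hypothetical and are not the authors' implementation.

```python
from typing import List


def grad_cam_fragment_scores(
    activations: List[List[float]],
    gradients: List[List[float]],
) -> List[float]:
    """Generic Grad-CAM scoring over fragments (illustrative assumption).

    activations[i][k]: feature k of fragment i from some model layer.
    gradients[i][k]:   d(prediction) / d(activations[i][k]).
    Returns one non-negative importance score per fragment.
    """
    n_frag = len(activations)
    n_feat = len(activations[0])

    # Channel weights: average each feature's gradient over all fragments.
    alphas = [
        sum(gradients[i][k] for i in range(n_frag)) / n_frag
        for k in range(n_feat)
    ]

    # Fragment score: ReLU of the weighted sum of that fragment's features.
    return [
        max(0.0, sum(alphas[k] * activations[i][k] for k in range(n_feat)))
        for i in range(n_frag)
    ]


# Toy input: three hypothetical fragments with two features each.
toy_activations = [[2.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
toy_gradients = [[1.0, -1.0], [0.0, 0.0], [0.5, -0.5]]
print(grad_cam_fragment_scores(toy_activations, toy_gradients))
# → [1.0, 0.0, 1.0]
```

In this toy case the second fragment scores zero because its only active feature carries a negative channel weight, which the ReLU clips. A whole fragment receives a single score, which is the property the abstract contrasts with atom-based explanations.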