Litsa Eleni E, Das Payel, Kavraki Lydia E
Department of Computer Science, Rice University, Houston, TX, USA
IBM Research AI, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, USA
Chem Sci. 2020 Sep 24;11(47):12777-12788. doi: 10.1039/d0sc02639e.
Metabolic processes in the human body can alter the structure of a drug, affecting its efficacy and safety. As a result, investigating the metabolic fate of a candidate drug is an essential part of drug design studies. Computational approaches have been developed to predict possible drug metabolites and thereby assist the traditional, resource-demanding experimental route. Current methodologies are based on metabolic transformation rules, which are tied to specific enzyme families and therefore generalize poorly; they may also require manual work from experts, which limits scalability. We present a rule-free, end-to-end learning-based method for predicting possible human metabolites of small molecules, including drugs. The metabolite prediction task is approached as a sequence translation problem, with chemical compounds represented in SMILES notation. We perform transfer learning on a deep learning transformer model for sequence translation, originally trained on chemical reaction data, to predict the outcome of human metabolic reactions. We further build an ensemble model to account for multiple and diverse metabolites. Extensive evaluation reveals that the proposed method generalizes well to different enzyme families, as it correctly predicts metabolites produced through phase I and phase II drug metabolism as well as by other enzymes. Compared to existing rule-based approaches, our method performs equivalently on the major enzyme families while additionally finding metabolites formed by less common enzymes. Our results indicate that the proposed approach can provide a comprehensive study of drug metabolism that is not restricted to the major enzyme families and does not require the extraction of transformation rules.
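To make the sequence-translation framing concrete, the following is a minimal sketch of two steps such a pipeline typically needs: splitting a SMILES string into tokens for a translation model, and pooling the candidate metabolites returned by several models into one list. The regular expression follows a pattern commonly used for SMILES tokenization in reaction-prediction work; it is an assumption here and is not confirmed as the authors' preprocessing. The function names `tokenize` and `pool_predictions` are hypothetical, and pooling compares raw strings only, whereas a real pipeline would likely canonicalize SMILES first.

```python
import re

# Assumed tokenization pattern (not taken from the paper): bracket atoms,
# two-letter halogens, organic-subset atoms, bonds, branches, ring closures.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+"
    r"|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens for a sequence-to-sequence model."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject inputs containing characters the pattern does not cover,
    # so that no part of the molecule is silently dropped.
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize SMILES: {smiles!r}")
    return tokens


def pool_predictions(per_model: list[list[str]], parent: str) -> list[str]:
    """Merge ranked candidate lists from several models (hypothetical helper).

    Keeps first-seen order, drops exact duplicate strings, and drops
    candidates identical to the parent compound.
    """
    seen: set[str] = set()
    pooled: list[str] = []
    for predictions in per_model:
        for candidate in predictions:
            if candidate != parent and candidate not in seen:
                seen.add(candidate)
                pooled.append(candidate)
    return pooled


if __name__ == "__main__":
    parent = "CC(=O)Nc1ccc(O)cc1"  # paracetamol, used only as an example input
    # Space-separated tokens are the usual source-side input format
    # for text-based translation toolkits.
    print(" ".join(tokenize(parent)))
```

Pooling by union is only one simple way to obtain multiple and diverse outputs from an ensemble; the abstract does not specify how the authors combine their models.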