Hughes Tyler B, Dang Na Le, Kumar Ayush, Flynn Noah R, Swamidass S Joshua
Department of Pathology and Immunology, Washington University School of Medicine, Campus Box 8118, 660 South Euclid Avenue, St. Louis, Missouri 63110, United States.
J Chem Inf Model. 2020 Oct 26;60(10):4702-4716. doi: 10.1021/acs.jcim.0c00360. Epub 2020 Sep 16.
Adverse drug metabolism often severely impacts patient morbidity and mortality. Unfortunately, experimental drug metabolism assays are costly, inefficient, and slow. Instead, computational modeling could rapidly flag potentially toxic molecules across thousands of candidates in the early stages of drug development. Most metabolism models focus on predicting sites of metabolism (SOMs): the specific substrate atoms targeted by metabolic enzymes. However, SOMs are merely a proxy for metabolite structures: knowing an SOM does not explicitly provide the actual metabolite structure. Without an explicit metabolite structure, computational systems cannot evaluate the new molecule's properties. For example, the metabolite's reactivity cannot be predicted automatically. This is a crucial limitation because reactive drug metabolites are a key driver of adverse drug reactions (ADRs). Additionally, further metabolic events cannot be forecast, even though the metabolic path of most substrates includes two or more sequential steps. To overcome the myopia of the SOM paradigm, this study constructs a well-defined system, termed the metabolic forest, for generating exact metabolite structures. We validate the metabolic forest with the substrate and product structures from a large, chemically diverse, literature-derived dataset of 20,736 records. The metabolic forest finds a pathway linking substrate and product for 79.42% of these records. By performing a breadth-first search of depth two or three, we improve performance to 88.43% and 88.77%, respectively. The metabolic forest includes a specialized algorithm for producing accurate quinone structures, the most common type of reactive metabolite. To our knowledge, this quinone structure algorithm is the first of its kind, as the diverse mechanisms of quinone formation are difficult to reproduce systematically.
We validate the metabolic forest on a previously published dataset of 576 quinone reactions, predicting their structures with a depth-three performance of 91.84%. The metabolic forest accurately enumerates metabolite structures, enabling promising new directions such as joint modeling of metabolism and reactivity.
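The abstract does not describe how the depth-limited breadth-first search is implemented. The following is a minimal sketch under several assumptions. Each metabolite can be expanded by a hypothetical `expand` function into its possible one-step products. "Depth" counts sequential transformation steps. Molecules are compared by simple string identity, whereas a real system would compare canonical structures. The names `find_pathway`, `toy_expand`, and `TOY_REACTIONS` and the toy reaction table are illustrative only. They are not the metabolic forest's actual rules or code.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional


def find_pathway(
    substrate: str,
    product: str,
    expand: Callable[[str], Iterable[str]],
    max_depth: int = 3,
) -> Optional[List[str]]:
    """Breadth-first search for a chain of one-step transformations linking
    substrate to product, using at most max_depth sequential steps.
    Returns the shortest such chain, or None if none exists within the limit."""
    if substrate == product:
        return [substrate]
    queue = deque([(substrate, [substrate])])
    seen = {substrate}
    while queue:
        mol, path = queue.popleft()
        if len(path) - 1 >= max_depth:
            continue  # this molecule is already max_depth steps away; do not expand
        for child in expand(mol):
            if child in seen:
                continue  # skip metabolites already reached by a shorter path
            new_path = path + [child]
            if child == product:
                return new_path
            seen.add(child)
            queue.append((child, new_path))
    return None


# Hypothetical one-step transformations (assumption): a real system would
# apply enzyme-specific rules to molecular graphs, not a lookup table.
TOY_REACTIONS: Dict[str, List[str]] = {
    "parent": ["hydroxylated", "N-dealkylated"],
    "hydroxylated": ["catechol"],
    "catechol": ["ortho-quinone"],
}


def toy_expand(mol: str) -> List[str]:
    return TOY_REACTIONS.get(mol, [])


print(find_pathway("parent", "catechol", toy_expand, max_depth=1))  # → None
print(find_pathway("parent", "catechol", toy_expand, max_depth=2))
# → ['parent', 'hydroxylated', 'catechol']
print(find_pathway("parent", "ortho-quinone", toy_expand, max_depth=3))
# → ['parent', 'hydroxylated', 'catechol', 'ortho-quinone']
```

Raising `max_depth` from one to three lets the search recover multi-step routes, such as a toy quinone reached only after two intermediates. This mirrors, in spirit only, why deeper searches link more substrate and product pairs.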