Jacob Roxane Axel, Mazzolari Angelica, Kirchmair Johannes
Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria.
Christian Doppler Laboratory for Molecular Informatics in the Biosciences, Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria.
J Chem Inf Model. 2025 Jul 14;65(13):7065-7080. doi: 10.1021/acs.jcim.5c00819. Epub 2025 Jun 17.
Computational models that predict the sites of metabolism (SOMs) of small organic molecules have become invaluable tools for studying and optimizing the metabolic properties of xenobiotics. However, the performance of SOM predictors has shown signs of plateauing in recent years, primarily because training data are scarce. While vast amounts of biotransformation data exist in the form of substrate-metabolite pairs, their potential for SOM prediction remains largely untapped because they lack SOM annotations. Annotating SOMs requires expert knowledge and is highly time-consuming. To address this challenge, we introduce AutoSOM, the first open-source tool that automatically extracts SOMs by mapping structural differences between substrate and metabolite using transformation rules. AutoSOM is both fast and highly accurate, achieving over 90% labeling accuracy on a diverse validation set of more than 5,000 reactions within minutes. Moreover, its annotation process is fully transparent and interpretable, which we hope will facilitate its adoption in high-stakes downstream applications such as drug discovery campaigns and regulatory assessments. Beyond accelerating annotation, AutoSOM enables standardized and consistent SOM labeling across institutions without requiring direct data sharing.
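To make the idea of extracting SOMs from a substrate-metabolite pair concrete, the following is a minimal toy sketch and not AutoSOM's actual algorithm. It assumes that an atom mapping between substrate and metabolite is already known (shared atom ids), and it simply flags substrate atoms whose bonded environment changes. The `Mol` representation and the `changed_atoms` function are hypothetical names introduced for illustration only. The tool described above relies on transformation rules, which this sketch does not implement.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy representation (an assumption, not AutoSOM's data model):
# atom id -> (element, {(neighbour id, bond order), ...}), heavy atoms only.
Mol = Dict[int, Tuple[str, Set[Tuple[int, int]]]]


def changed_atoms(substrate: Mol, metabolite: Mol) -> Set[int]:
    """Return substrate atom ids whose element or bonds differ in the metabolite.

    Assumes both molecules share atom ids, i.e. the atom mapping is given.
    """
    soms: Set[int] = set()
    for atom_id, (element, bonds) in substrate.items():
        if atom_id not in metabolite:
            # Atom lost in the reaction; its former neighbours are flagged
            # instead, because their bond sets no longer match.
            continue
        m_element, m_bonds = metabolite[atom_id]
        if element != m_element or bonds != m_bonds:
            soms.add(atom_id)
    return soms


# Toy hydroxylation of ethane to ethanol: C1-C2 -> C1-C2-O3
ethane: Mol = {1: ("C", {(2, 1)}), 2: ("C", {(1, 1)})}
ethanol: Mol = {
    1: ("C", {(2, 1)}),
    2: ("C", {(1, 1), (3, 1)}),
    3: ("O", {(2, 1)}),
}

print(changed_atoms(ethane, ethanol))  # {2}: the carbon that gains the oxygen
```

In practice the hard part is the step this sketch takes for granted: establishing a reliable correspondence between substrate and metabolite atoms and deciding, per reaction type, which of the changed atoms counts as the SOM.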