Computer-Aided Drug Design/Therapeutic Modalities, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstrasse 124, 4070 Basel, Switzerland.
Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093 Zürich, Switzerland.
J Chem Inf Model. 2020 Jun 22;60(6):2903-2914. doi: 10.1021/acs.jcim.0c00269. Epub 2020 May 5.
Generation and prioritization of new molecules are the most central parts of the drug design process. Matched molecular series analysis (MMSA) has recently been proposed as a formal approach that captures both of these key elements of design. To better understand the power of MMSA and its specific limitations, we here evaluate its performance as an ADME property prediction tool. We use four large and diverse in-house data sets: log D, microsomal clearance, CYP2C9 inhibition, and CYP3A4 inhibition. MMSA follows the concept of parallel structure-activity relationship (SAR): if two identical substituent series on different scaffolds show similar property profiles, SAR from one series can be transferred to the other. We test four different similarity metrics to identify pairs of molecular series between which information can be transferred. We find that the best prediction performance is achieved by a combination of centered root-mean-square deviation (cRMSD) and a network score approach previously published by Keefer et al. However, cRMSD alone strikes the best balance between accuracy and the number of predictions that can be made. We identify statistical metrics that allow estimating when MMSA predictions will work, similar to the well-known applicability domain concept in machine learning. MMSA achieves a prediction accuracy comparable to that of a standard machine-learning model and of matched molecular pair analysis. In contrast to machine learning, however, it is very easy to understand where MMSA predictions come from. Finally, to prospectively test the power of MMSA, we retested compounds that were strong outliers in the initial predictions and show how the MMSA model can help to identify erroneous data points.
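To make the parallel-SAR idea concrete, the following is a minimal sketch of how a centered RMSD between two series and a simple SAR transfer could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names `crmsd` and `transfer_prediction`, the dictionary layout (substituent label mapped to property value), the mean-offset transfer rule, and the example values are all hypothetical.

```python
import math


def _shared_means(series_a, series_b):
    """Return the shared substituents and each series' mean over them."""
    shared = sorted(set(series_a) & set(series_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared substituents")
    mean_a = sum(series_a[s] for s in shared) / len(shared)
    mean_b = sum(series_b[s] for s in shared) / len(shared)
    return shared, mean_a, mean_b


def crmsd(series_a, series_b):
    """Centered RMSD: each series is shifted to zero mean over the shared
    substituents before the root-mean-square deviation is taken, so a
    constant offset between two scaffolds does not count as dissimilarity."""
    shared, mean_a, mean_b = _shared_means(series_a, series_b)
    squared = sum(
        ((series_a[s] - mean_a) - (series_b[s] - mean_b)) ** 2 for s in shared
    )
    return math.sqrt(squared / len(shared))


def transfer_prediction(query, reference, substituent):
    """Assumed transfer rule: take the reference value for the new
    substituent and shift it by the mean offset between the two series."""
    if substituent not in reference:
        raise KeyError(f"{substituent!r} not measured in reference series")
    _, mean_q, mean_r = _shared_means(query, reference)
    return reference[substituent] - mean_r + mean_q


# Hypothetical property values (e.g. log D) for two scaffolds.
query_series = {"H": 1.0, "F": 1.2, "Cl": 1.7}
reference_series = {"H": 2.0, "F": 2.2, "Cl": 2.7, "OMe": 1.9}

similarity = crmsd(query_series, reference_series)
predicted_ome = transfer_prediction(query_series, reference_series, "OMe")
print(f"cRMSD = {similarity:.3f}, predicted OMe value = {predicted_ome:.3f}")
```

In this toy case the two series differ only by a constant offset, so the cRMSD is essentially zero and the reference value for the unmeasured substituent is simply shifted by that offset. In practice a cRMSD threshold would decide whether a series pair is similar enough for transfer; the choice of threshold is not specified here.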