BioMed X Innovation Center, Im Neuenheimer Feld 515, 69120 Heidelberg, Germany.
Global Computational Chemistry, Merck KGaA, Frankfurter Strasse 250, 64293 Darmstadt, Germany.
J Chem Inf Model. 2017 Dec 26;57(12):3079-3085. doi: 10.1021/acs.jcim.7b00298. Epub 2017 Nov 27.
Matched molecular pair (MMP) analyses are widely used in compound optimization projects to gain insights into structure-activity relationships (SAR). The analysis is traditionally done with statistical methods but can also be combined with machine learning (ML) approaches to extrapolate to novel compounds. The MMP/ML method introduced here combines a fragment-based MMP implementation with different machine learning methods to obtain automated SAR decomposition and prediction. To test prediction capability and model transferability, two compound optimization scenarios were designed: (1) "new fragments", which occurs when new fragments are explored for a defined compound series, and (2) "new static core and transformations", which resembles, for instance, the identification of a new compound series. All employed machine learning methods achieved very good results, especially in the new fragments case, but overall deep neural network models performed best, allowing reliable predictions even in the new static core and transformations scenario, where comprehensive SAR knowledge of the compound series is missing. Furthermore, we show that models trained on all available data generalize better than models trained on focused series and can extend beyond the chemical space covered in the training data. Thus, coupling MMP with deep neural networks provides a promising approach for making high-quality predictions on various data sets and in different compound optimization scenarios.
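The two evaluation scenarios can be illustrated with a minimal sketch of how MMP records might be split into training and test sets. This is an interpretation based only on the abstract, not the authors' protocol: the `MMP` record layout, both split functions, and the toy data are hypothetical, and the exact hold-out rules used in the paper may differ.

```python
from typing import NamedTuple


class MMP(NamedTuple):
    """One matched molecular pair (hypothetical record layout)."""
    core: str      # static core shared by both compounds
    frag_a: str    # variable fragment in the first compound
    frag_b: str    # variable fragment in the second compound
    delta: float   # activity difference, second minus first


def split_new_fragments(pairs, held_out_frags):
    """Assumed 'new fragments' split: test pairs use a held-out fragment."""
    test = [p for p in pairs
            if p.frag_a in held_out_frags or p.frag_b in held_out_frags]
    train = [p for p in pairs
             if p.frag_a not in held_out_frags
             and p.frag_b not in held_out_frags]
    return train, test


def split_new_core_and_transformations(pairs, held_out_cores):
    """Assumed 'new static core and transformations' split.

    Test pairs come from held-out cores. Training drops those cores and
    also every pair whose transformation appears in the test set, so
    neither the core nor the transformation has been seen in training.
    """
    test = [p for p in pairs if p.core in held_out_cores]
    test_transforms = {(p.frag_a, p.frag_b) for p in test}
    train = [p for p in pairs
             if p.core not in held_out_cores
             and (p.frag_a, p.frag_b) not in test_transforms]
    return train, test


# Toy data; all labels and activity differences are made up.
pairs = [
    MMP("coreA", "H", "F", 0.3),
    MMP("coreA", "H", "Cl", 0.5),
    MMP("coreA", "H", "OMe", -0.2),
    MMP("coreB", "H", "F", 0.4),
    MMP("coreB", "H", "CN", 0.1),
    MMP("coreC", "Me", "Et", 0.2),
]

frag_train, frag_test = split_new_fragments(pairs, {"OMe"})
core_train, core_test = split_new_core_and_transformations(pairs, {"coreB"})
```

In the second split, the `coreA` pair with the H to F transformation is removed from training because the same transformation occurs in the held-out `coreB` series. This reflects the stricter reading of the scenario, in which the model has seen neither the core nor the transformation.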