Mervin Lewis H, Afzal Avid M, Brive Lars, Engkvist Ola, Bender Andreas
Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Cambridge, United Kingdom.
Cygnal Bioscience, Pixbo, Sweden.
Front Pharmacol. 2018 Jun 11;9:613. doi: 10.3389/fphar.2018.00613. eCollection 2018.
Protein target deconvolution is frequently used for mechanism-of-action investigations; however, existing protocols usually do not predict the functional effect of a compound, such as activation or inhibition, upon binding to its protein target. This study is hence concerned with including functional effects in target prediction. To this end, we assembled a bioactivity training set for 332 targets, comprising 817,239 active data points with unknown functional effect (binding data) and 20,761,260 inactive compounds, along with 226,045 activating and 1,032,439 inhibiting data points from functional screens. Chemical space analysis of the data first showed some separation between compound sets (binding and inhibiting compounds were more similar to each other than either binding and activating or activating and inhibiting compounds), providing a rationale for implementing functional prediction models. We employed three different architectures to predict functional response, ranging from simple random forest models ('Arch1') to cascaded models that use separate binding and functional effect classification steps ('Arch2' and 'Arch3'), which differ in the way their training sets were generated. Fivefold stratified cross-validation on an internal test set showed that cascaded prediction provides superior precision and recall. We next prospectively validated the architectures using a temporal set of 153,467 in-house data points (collected after a 4-month interim from the initial data extraction). Arch3 achieved the highest target-class-averaged precision and recall, 71% and 53% respectively, which we attribute to the use of inactive background sets. Distance-based applicability domain (AD) analysis showed that Arch3 provides superior extrapolation into novel areas of chemical space; based on the results presented here, we therefore propose it as the most suitable architecture for the functional effect prediction of small molecules. We finally conclude that including functional effects could provide vital insight in future studies, for example to annotate cases of unanticipated functional changeover, as outlined by our CHRM1 case study.
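
The cascaded idea described above (a binding classification step followed by a separate functional effect step) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cascade_predict`, the 0.5 cutoffs, the bit-vector input, and the two toy scoring functions are all assumptions made for illustration, standing in for trained per-target classifiers such as the random forests mentioned in the abstract.

```python
from typing import Callable, Sequence

# Hypothetical compound representation: a bit-vector fingerprint.
Fingerprint = Sequence[int]


def cascade_predict(
    fp: Fingerprint,
    binding_prob: Callable[[Fingerprint], float],
    activation_prob: Callable[[Fingerprint], float],
    binding_cutoff: float = 0.5,   # assumed cutoff, not from the source
    effect_cutoff: float = 0.5,    # assumed cutoff, not from the source
) -> str:
    """Return 'inactive', 'activating' or 'inhibiting' for one compound-target pair."""
    # Step 1: binding classification (active vs. inactive).
    if binding_prob(fp) < binding_cutoff:
        return "inactive"
    # Step 2: functional effect classification, reached only by predicted binders.
    if activation_prob(fp) >= effect_cutoff:
        return "activating"
    return "inhibiting"


# Toy stand-ins for trained classifiers (illustrative only, not real models).
def toy_binding_prob(fp: Fingerprint) -> float:
    return sum(fp) / len(fp)  # fraction of bits set


def toy_activation_prob(fp: Fingerprint) -> float:
    return float(fp[0])  # first bit alone decides the functional effect


if __name__ == "__main__":
    for fp in ([0, 0, 0, 1], [1, 1, 0, 1], [0, 1, 1, 1]):
        print(fp, cascade_predict(fp, toy_binding_prob, toy_activation_prob))
```

In this sketch a compound predicted as a non-binder never reaches the functional step, which is the structural difference from a single multi-class model; any classifier exposing a probability could be passed in place of the toy functions.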