Sheridan Robert P, Nam Kiyean, Maiorov Vladimir N, McMasters Daniel R, Cornell Wendy D
Chemistry Modeling and Informatics Department, Merck Research Laboratories, RY50SW-100, Rahway, NJ 07065, USA.
J Chem Inf Model. 2009 Aug;49(8):1974-85. doi: 10.1021/ci900176y.
We propose a direct QSAR methodology for predicting how similar the inhibitor-binding profiles of two protein kinases are likely to be, based on the properties of the residues surrounding the ATP-binding site. We build a random forest model for each of five data sets (one in-house, four from the literature) in which multiple compounds are tested on many kinases. Each model is self-consistent by cross-validation, and all models indicate that only a few residues in the active site control the binding profiles. While all models include the "gatekeeper" among the important residues, consistent with previous literature, some models suggest that other residues are more important. We apply each model to predict the similarity in binding profile for all pairs in a set of 411 kinases from the human genome and obtain very different predictions from each model. This turns out to be an issue not with model building but with the experimental data sets themselves, which disagree about which kinases are similar to which others. It is possible to build a model combining all the data from the five data sets that is reasonably self-consistent but, not surprisingly given the disagreement between data sets, less self-consistent than the individual models.
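The pairwise setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract specifies neither the similarity measure nor the residue properties, so this sketch assumes Pearson correlation between activity profiles as the target and the absolute difference in Kyte-Doolittle hydropathy at each aligned binding-site position as the descriptor. The kinase names, activity values, and residue strings are hypothetical toy data, and the function names are illustrative only.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: activities (e.g. pIC50) of 4 compounds on 3 kinases.
activity = {
    "KIN_A": [7.0, 5.0, 6.0, 8.0],
    "KIN_B": [7.2, 5.1, 6.3, 7.9],
    "KIN_C": [5.0, 8.0, 6.0, 5.5],
}

# Hypothetical aligned binding-site residues (one-letter codes),
# with the same alignment positions in every kinase.
site_residues = {
    "KIN_A": "TLKE",
    "KIN_B": "TIKE",
    "KIN_C": "MVKE",
}

# Kyte-Doolittle hydropathy values for the residues used above.
# Assumption: one property per residue; the paper may use others.
HYDROPATHY = {"T": -0.7, "L": 3.8, "I": 4.5, "K": -3.9,
              "E": -3.5, "M": 1.9, "V": 4.2}


def pearson(x, y):
    """Pearson correlation of two equal-length activity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def pair_descriptors(k1, k2):
    """Absolute property difference at each aligned binding-site position."""
    return [abs(HYDROPATHY[a] - HYDROPATHY[b])
            for a, b in zip(site_residues[k1], site_residues[k2])]


def build_rows():
    """One row per kinase pair: (kinase 1, kinase 2, descriptors, similarity)."""
    return [(k1, k2, pair_descriptors(k1, k2),
             pearson(activity[k1], activity[k2]))
            for k1, k2 in combinations(sorted(activity), 2)]


rows = build_rows()
for k1, k2, descriptors, similarity in rows:
    print(k1, k2, [round(d, 2) for d in descriptors], round(similarity, 3))
```

Each row pairs a descriptor vector with a profile similarity, which is the form a regression model such as a random forest would be trained on. Applying a fitted model to all pairs in a larger kinase set then requires only the aligned binding-site residues, not experimental profiles.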