Lead Discovery Informatics, Lead Finding Platform, Novartis Institutes for BioMedical Research, Inc., 250 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA.
Protein Sci. 2010 Nov;19(11):2096-109. doi: 10.1002/pro.490.
We present a comprehensive analysis of proteases in peptide substrate space and demonstrate its applicability to lead discovery. Aligned octapeptide substrates of 498 proteases taken from the MEROPS peptidase database were used for the in silico analysis. A multiple-category naïve Bayes model, trained on the two-dimensional chemical features of the substrates, was able to classify the substrates of 365 (73%) proteases and elucidate statistically significant chemical features for each of their specific substrate positions. The positional awareness of the method allows us to identify the most similar substrate positions between proteases. Our analysis reveals that proteases from different families, based on the traditional classification (aspartic, cysteine, serine, and metallo), could have substrates that differ at the cleavage site (P1-P1') but are similar away from it. Caspase-3 (cysteine protease) and granzyme B (serine protease) are previously known examples of cross-family neighbors identified by this method. To assess whether peptide substrate similarity between unrelated proteases could reliably translate into the discovery of low molecular weight synthetic inhibitors, a lead discovery strategy was tested on two other pairs of cross-family neighbors: cathepsin L2 and matrix metalloproteinase 9, and calpain 1 and pepsin A. For both pairs, a naïve Bayes classifier model trained on inhibitors of one protease could successfully enrich those of its neighbor from a different family and vice versa, indicating that this approach could be prospectively applied to lead discovery for a novel protease target with no known synthetic inhibitors.
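The position-aware, multiple-category naïve Bayes idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes residue identity at each of the eight aligned positions (P4 to P4') as the only feature, whereas the study used two-dimensional chemical features of the substrates. The peptides, the labels `protease_A` and `protease_B`, and the function names `train`, `log_scores`, and `classify` are all hypothetical and chosen only for illustration; none of the data comes from MEROPS.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

# Made-up toy octapeptides (assumption: not real MEROPS substrates).
TOY_SUBSTRATES = [
    ("DEVDGAAA", "protease_A"),
    ("DEVDSAAL", "protease_A"),
    ("DQVDGKAL", "protease_A"),
    ("IEPDSGAA", "protease_B"),
    ("IEADSLAA", "protease_B"),
    ("VEPDGLAK", "protease_B"),
]


def train(substrates):
    """Count residues per position for each protease label."""
    class_counts = Counter(label for _, label in substrates)
    # counts[label][position_index][residue] -> number of occurrences
    counts = defaultdict(lambda: [Counter() for _ in POSITIONS])
    for peptide, label in substrates:
        if len(peptide) != len(POSITIONS):
            raise ValueError("expected an aligned octapeptide")
        for i, residue in enumerate(peptide):
            counts[label][i][residue] += 1
    return class_counts, counts


def log_scores(peptide, model):
    """Return the unnormalized log posterior of each label for one peptide."""
    class_counts, counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for i, residue in enumerate(peptide):
            # Laplace (add-one) smoothing over the 20 standard residues.
            p = (counts[label][i][residue] + 1) / (n + len(AMINO_ACIDS))
            score += math.log(p)
        scores[label] = score
    return scores


def classify(peptide, model):
    """Pick the label with the highest log posterior."""
    scores = log_scores(peptide, model)
    return max(scores, key=scores.get)


model = train(TOY_SUBSTRATES)
print(classify("DEVDGLAA", model))
print(classify("IEPDSLAA", model))
```

Because each position contributes its own term to the score, the per-position counts can be inspected separately, which is the property that makes a position-by-position comparison between proteases possible in a model of this kind.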