Institute of General, Inorganic and Theoretical Chemistry, and Center for Molecular Biosciences Innsbruck (CMBI), University of Innsbruck, Innsbruck, Austria.
PLoS Comput Biol. 2013;9(11):e1003353. doi: 10.1371/journal.pcbi.1003353. Epub 2013 Nov 14.
Sequence logos are frequently used to illustrate the substrate preferences and specificity of proteases. Here, we use the substrates compiled in the MEROPS database to introduce a novel metric for comparing protease substrate preferences. The resulting similarity matrix of 62 proteases can be used to visualize similarities in protease substrate readout intuitively, via principal component analysis and the construction of protease specificity trees. Because our new metric is based solely on substrate data, the protease tree can include proteolytic enzymes of different evolutionary origin. Our analyses confirm pronounced overlaps in substrate recognition not only between proteases that are closely related by sequence but also between proteolytic enzymes of different evolutionary origin and catalytic type. To illustrate the applicability of our approach, we analyze the distribution of targets of small molecules from the ChEMBL database in our substrate-based protease specificity trees. We observe a striking clustering of annotated targets in tree branches, even though these grouped targets do not necessarily share similarity at the protein sequence level. This highlights the value and applicability of knowledge acquired from peptide substrates in the design of small-molecule drugs, e.g., for the prediction of off-target effects or for drug repurposing. Consequently, our similarity metric makes it possible to map the degradome and its associated drug target network by comparing known substrate peptides. The substrate-driven view of protein-protein interfaces is not limited to the field of proteases but can be applied to any target class for which a sufficient amount of substrate data is available.
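The general workflow of comparing proteases through their substrates can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the metric defined in the paper: the abstract does not specify the similarity measure, so cosine similarity between flattened per-position amino acid frequency profiles is used as a stand-in, and simple average-linkage clustering stands in for the tree construction. The protease names, the eight-residue peptides, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_profile(peptides):
    """Flatten per-position amino acid frequencies into one vector."""
    length = len(peptides[0])  # assumes aligned peptides of equal length
    profile = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        profile.extend(counts.get(aa, 0) / len(peptides) for aa in AMINO_ACIDS)
    return profile

def cosine_similarity(u, v):
    """Stand-in similarity measure (an assumption, not the paper's metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(substrates):
    """Return sorted protease names and their pairwise similarity matrix."""
    names = sorted(substrates)
    profiles = {n: position_profile(substrates[n]) for n in names}
    matrix = [[cosine_similarity(profiles[a], profiles[b]) for b in names]
              for a in names]
    return names, matrix

def specificity_tree(names, sim):
    """Average-linkage agglomerative clustering; returns nested tuples."""
    # Map each cluster (tuple of member indices) to its subtree.
    clusters = {(i,): name for i, name in enumerate(names)}

    def linkage(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two clusters with the highest average similarity.
        a, b = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters[a + b] = (clusters.pop(a), clusters.pop(b))
    return next(iter(clusters.values()))

# Hypothetical toy data: A and B share an Arg at the fourth position,
# C prefers Asp there.
substrates = {
    "protease_A": ["AGPRSLVA", "LGPRGVSA", "AVPRSLGA"],
    "protease_B": ["IEGRTAVS", "AGPRSVLA", "LEGRSAGA"],
    "protease_C": ["DEVDGSAL", "DEVDSGLA", "LEVDGAAS"],
}

names, sim = similarity_matrix(substrates)
tree = specificity_tree(names, sim)
```

Because the comparison uses only substrate peptides, nothing in this sketch depends on the sequence or fold of the enzymes themselves, which is what allows unrelated proteases to appear in the same tree. The same similarity matrix could also be passed to a principal component analysis for a two-dimensional view.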