Central Research Institute of Epidemiology, Moscow 111123, Russia; A.A.Kharkevich Institute of Information Transmission Problems, Moscow 127051, Russia; Moscow Institute of Physics and Technology, Dolgoprudny 141700, Russia.
Sanford-Burnham-Prebys Medical Discovery Institute, La Jolla, CA 92037, USA.
Biochim Biophys Acta Proteins Proteom. 2019 Nov;1867(11):140253. doi: 10.1016/j.bbapap.2019.07.006. Epub 2019 Jul 19.
Bioinformatics-based prediction of protease substrates can help to elucidate the regulatory proteolytic pathways that control a broad range of biological processes, such as apoptosis and blood coagulation. Most published predictive models are position weight matrices (PWMs) that reflect the specificity of a protease toward its target sequences. These models are typically derived from experimental data on the positions of hydrolyzed peptide bonds and show reasonable predictive power. Emerging techniques that not only register the cleavage position but also measure the catalytic efficiency of proteolysis are expected to improve the quality of predictions, or at least to substantially reduce the number of tested substrates required for confident predictions. The main goal of this study was to develop new prediction models based on such data and to estimate their performance. We used data on the catalytic efficiency of proteolysis measured for eight major human matrix metalloproteinases to construct predictive models of protease specificity with a variety of regression analysis techniques. The results suggest that efficiency-based (quantitative) models perform comparably to conventional PWM-based algorithms while requiring less training data. The derived list of candidate cleavage sites in human secreted proteins may serve as a starting point for experimental analysis.
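To make the contrast between the two model families concrete, the following is a minimal sketch, not the authors' implementation. It builds a log-odds PWM from cleaved peptides (which uses only the fact that cleavage occurred) and fits a ridge regression on one-hot encoded peptides (which uses a measured efficiency per peptide). The peptide sequences, the efficiency values, the uniform background, the pseudocount, and the choice of ridge regression are all assumptions made for illustration; the abstract says only that "a variety of regression analysis techniques" were used.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def build_pwm(sites, pseudocount=1.0):
    """Log-odds PWM from equal-length cleaved peptides.

    Assumes a uniform amino acid background (an illustrative simplification).
    """
    length = len(sites[0])
    counts = np.full((length, len(AMINO_ACIDS)), pseudocount)
    for site in sites:
        for pos, aa in enumerate(site):
            counts[pos, AA_INDEX[aa]] += 1
    freqs = counts / counts.sum(axis=1, keepdims=True)
    return np.log2(freqs * len(AMINO_ACIDS))


def pwm_score(pwm, peptide):
    """Sum of per-position log-odds for one peptide."""
    return float(sum(pwm[pos, AA_INDEX[aa]] for pos, aa in enumerate(peptide)))


def one_hot(peptide):
    """Flatten a peptide into a (length * 20) indicator vector."""
    x = np.zeros(len(peptide) * len(AMINO_ACIDS))
    for pos, aa in enumerate(peptide):
        x[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return x


def fit_ridge(peptides, efficiencies, alpha=1.0):
    """Closed-form ridge regression; returns (weights, intercept)."""
    X = np.array([one_hot(p) for p in peptides])
    y = np.asarray(efficiencies, dtype=float)
    intercept = y.mean()
    A = X.T @ X + alpha * np.eye(X.shape[1])
    weights = np.linalg.solve(A, X.T @ (y - intercept))
    return weights, intercept


def predict(weights, intercept, peptide):
    """Predicted efficiency for one peptide."""
    return float(one_hot(peptide) @ weights + intercept)


# Hypothetical training set: invented 6-residue peptides and invented
# log10 efficiency values, for illustration only (not data from the study).
peptides = ["PLGLAG", "PAGLVG", "PQGIAG", "PLGMRG"]
log_efficiency = [5.2, 4.1, 3.8, 4.6]

pwm = build_pwm(peptides)                              # qualitative model
weights, intercept = fit_ridge(peptides, log_efficiency)  # quantitative model

print(pwm_score(pwm, "PLGLAG"), predict(weights, intercept, "PLGLAG"))
```

The PWM treats every cleaved peptide as an equally informative positive example, whereas the regression weights each residue by how strongly it shifts the measured efficiency. That difference is one plausible reason a quantitative model could need fewer tested substrates, as the abstract suggests.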