Muthukrishnan Selvaraj, Puri Munish, Lefevre Christophe
Fermentation and Protein Biotechnology Laboratory, Department of Biotechnology, Punjabi University, Patiala, India; CSIR-IMTECH, Chandigarh, India.
BMC Res Notes. 2014 Jan 27;7:63. doi: 10.1186/1756-0500-7-63.
Plasminogen (Pg), the precursor of the proteolytic and fibrinolytic enzyme of blood, is converted to the active enzyme plasmin (Pm) by different plasminogen activators. These include tissue plasminogen activators and urokinase, as well as the bacterial activators streptokinase and staphylokinase, which activate Pg to Pm and are therefore used clinically for thrombolysis. The identification of Pg-activators is thus an important step in understanding their functional mechanism and in deriving new therapies.
In this study, different computational methods for predicting plasminogen activator peptide sequences with high accuracy were investigated. These were support vector machines (SVM) based on amino acid composition (AC), dipeptide composition (DC), PSSM profile and hybrid methods, used to predict different Pg-activators of both prokaryotic and eukaryotic origin.
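As an illustration of the two composition-based feature types named above, the following is a minimal sketch and not the authors' implementation. It assumes that AC is the percentage of each of the 20 standard residues and that DC is the percentage of each of the 400 ordered residue pairs; the function names and the handling of non-standard residues (dropped before counting) are assumptions made here.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _clean(seq):
    """Upper-case the sequence and drop non-standard residues (a simplifying assumption)."""
    return "".join(c for c in seq.upper() if c in AMINO_ACIDS)


def amino_acid_composition(seq):
    """Return a 20-element vector: percentage of each residue in the sequence."""
    clean = _clean(seq)
    if not clean:
        raise ValueError("sequence contains no standard residues")
    return [100.0 * clean.count(a) / len(clean) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-element vector: percentage of each ordered pair of adjacent residues."""
    clean = _clean(seq)
    total = len(clean) - 1  # number of overlapping dipeptides
    if total < 1:
        raise ValueError("sequence needs at least two standard residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(total):
        counts[clean[i:i + 2]] += 1
    return [100.0 * counts[pair] / total for pair in counts]
```

Vectors of this kind (20 values for AC, 400 for DC) could then be passed to any SVM implementation as fixed-length inputs; a hybrid feature could be formed by concatenating them, although the exact hybrid scheme is not specified here.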
Overall maximum accuracy, evaluated using five-fold cross-validation, was 88.37%, 84.32%, 87.61% and 85.63%, with Matthews correlation coefficient (MCC) values of 0.87, 0.83, 0.86 and 0.85, for the amino acid composition (AC), dipeptide composition (DC), PSSM profile and hybrid methods, respectively. Through this study, we found that the different subfamilies of Pg-activators are quite closely correlated in terms of amino acid, dipeptide, PSSM and hybrid compositions. Our prediction results therefore show that plasminogen activators are predictable with high accuracy from their primary sequence. Prediction performance was also cross-checked by confusion matrix and receiver operating characteristic (ROC) analysis. A web server to facilitate the prediction of Pg-activators from primary sequence data was implemented.
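For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and MCC follow from the four cells of a binary confusion matrix. It is an illustrative assumption that the metrics are computed per class in this binary form; the function names are hypothetical, and the example counts are made up and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts for illustration only: 45 true positives, 40 true negatives,
# 10 false positives, 5 false negatives.
example_accuracy = accuracy(45, 40, 10, 5)
example_mcc = mcc(45, 40, 10, 5)
```

Unlike accuracy, MCC uses all four cells of the matrix, so it stays informative when the positive and negative classes differ in size.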
The results show that dipeptide, PSSM profile and hybrid based methods perform better than single amino acid composition (AC). Furthermore, we have developed a web server that predicts Pg-activators and their classification (available online at http://mamsap.it.deakin.edu.au/plas_pred/home.html). Our experimental results show that our approaches are faster and generally achieve good prediction performance.