Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand.
Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand.
J Proteome Res. 2020 Oct 2;19(10):4125-4136. doi: 10.1021/acs.jproteome.0c00590. Epub 2020 Sep 19.
The inhibition of dipeptidyl peptidase IV (DPP-IV, E.C.3.4.14.5) is well recognized as a new avenue for the treatment of Type 2 diabetes (T2D). Peptide-like DPP-IV inhibitors have been shown to normalize the blood glucose concentration in T2D subjects. To the best of our knowledge, there is as yet no computational model for predicting and analyzing DPP-IV inhibitory peptides from sequence information. In this study, we present for the first time a simple and easily interpretable sequence-based predictor, iDPPIV-SCM, which uses the scoring card method (SCM) to model the bioactivity of DPP-IV inhibitory peptides. Specifically, iDPPIV-SCM was developed by combining the SCM with the propensity scores of amino acids. Rigorous independent tests demonstrated that iDPPIV-SCM outperformed well-known machine learning (ML) classifiers (e.g., k-nearest neighbor, logistic regression, and decision tree), with improvements of 2-11%, 4-22%, and 7-10% in accuracy, Matthews correlation coefficient (MCC), and area under the ROC curve (AUC), respectively, while achieving results comparable to those of the support vector machine. Furthermore, the propensity scores of amino acids estimated by iDPPIV-SCM were analyzed to provide a more in-depth understanding of the molecular basis for enhancing DPP-IV inhibitory potency. Taken together, these results indicate that iDPPIV-SCM is superior to other well-known ML classifiers owing to its simplicity, interpretability, and validity. For the convenience of biologists, the predictive model is deployed as a publicly accessible web server at http://camt.pythonanywhere.com/iDPPIV-SCM. It is anticipated that iDPPIV-SCM can serve as an important tool for the rapid screening of promising DPP-IV inhibitory peptides prior to their synthesis.
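To make the idea of a scoring card concrete, the following is a minimal Python sketch of propensity-based scoring, not the authors' implementation. It assumes that a peptide is scored as the composition-weighted sum of per-dipeptide propensity scores and is predicted to be an inhibitor when the score exceeds a threshold. The function names, the threshold of 500, the 0-1000 scaling, and the toy sequences are all hypothetical choices made for illustration; any score-optimization step used by the published method is omitted.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(peptide):
    """Fraction of each overlapping dipeptide in the peptide."""
    pairs = [peptide[i:i + 2] for i in range(len(peptide) - 1)]
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {dp: n / len(pairs) for dp, n in counts.items()}


def mean_composition(peptides):
    """Average dipeptide composition over a set of peptides."""
    acc = Counter()
    for peptide in peptides:
        for dp, frac in dipeptide_composition(peptide).items():
            acc[dp] += frac
    return {dp: acc[dp] / len(peptides) for dp in DIPEPTIDES}


def initial_propensity(positives, negatives):
    """Assumed initial scores: positive-minus-negative mean composition,
    min-max scaled to the range 0..1000 (hypothetical scaling)."""
    pos = mean_composition(positives)
    neg = mean_composition(negatives)
    diff = {dp: pos[dp] - neg[dp] for dp in DIPEPTIDES}
    lo, hi = min(diff.values()), max(diff.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all differences match
    return {dp: 1000 * (d - lo) / span for dp, d in diff.items()}


def scm_score(peptide, scores):
    """Composition-weighted sum of dipeptide propensity scores."""
    comp = dipeptide_composition(peptide)
    return sum(frac * scores.get(dp, 0.0) for dp, frac in comp.items())


def amino_acid_propensity(scores):
    """Per-residue propensity: mean score of all dipeptides containing it."""
    result = {}
    for aa in AMINO_ACIDS:
        vals = [s for dp, s in scores.items() if aa in dp]
        result[aa] = sum(vals) / len(vals)
    return result


# Toy sequences for illustration only; they are not the paper's dataset.
toy_positives = ["IPI", "VPL", "IPA"]
toy_negatives = ["GGS", "SGG", "AGS"]
THRESHOLD = 500  # hypothetical cutoff

scores = initial_propensity(toy_positives, toy_negatives)
for peptide in ("IPI", "GGS"):
    score = scm_score(peptide, scores)
    label = "inhibitor" if score > THRESHOLD else "non-inhibitor"
    print(peptide, round(score, 1), label)
```

Because every prediction reduces to a sum over a lookup table, the table itself can be inspected: `amino_acid_propensity(scores)` collapses the 400 dipeptide scores into 20 per-residue values, which is the kind of interpretability the abstract attributes to the propensity-score analysis.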