StackDPPIV: A novel computational approach for accurate prediction of dipeptidyl peptidase IV (DPP-IV) inhibitory peptides.

Author information

Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand.

Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.

Publication information

Methods. 2022 Aug;204:189-198. doi: 10.1016/j.ymeth.2021.12.001. Epub 2021 Dec 6.

Abstract

Efficient and effective bioinformatics tools and pipelines for identifying peptides with dipeptidyl peptidase IV (DPP-IV) inhibitory activity from large-scale protein datasets are of great importance for the discovery and development of promising antidiabetic drugs. In this study, we present a novel stacking-based ensemble learning predictor (termed StackDPPIV) designed for the identification of DPP-IV inhibitory peptides. Unlike the existing method, which relies on a single feature encoding, we combined five popular machine learning algorithms with ten different feature encodings from multiple perspectives to generate a pool of baseline models. Subsequently, the probabilistic features derived from these baseline models were systematically integrated and treated as new feature representations. Finally, to improve predictive performance, a genetic algorithm based on the self-assessment-report was used to select a set of informative probabilistic features, and the optimal set was used to develop the final meta-predictor (StackDPPIV). Experimental results demonstrated that StackDPPIV outperformed its constituent baseline models on both the training and independent datasets. Furthermore, StackDPPIV achieved an accuracy of 0.891, an MCC of 0.784 and an AUC of 0.961, which were 9.4%, 19.0% and 11.4% higher, respectively, than those of the existing method on the independent test set. Feature analysis demonstrated that our feature representations had greater discriminative ability than conventional feature descriptors, which highlights that combining different features was essential for the performance improvement. To make the proposed predictor available, we have built a user-friendly online web server at http://pmlabstack.pythonanywhere.com/StackDPPIV.
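The stacking scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the two toy encodings (`aac`, `hydro`), the two toy baseline learners (`centroid_proba`, `knn_proba`), the logistic-regression meta-learner, and the peptide labels, which are invented and not experimental data. The genetic-algorithm feature selection step is omitted. The sketch only shows how out-of-fold probabilities from each (encoding, learner) pair become the feature vector of a meta-predictor.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWP")  # assumed grouping, for illustration only

def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def hydro(seq):
    """Toy 2-dim encoding: hydrophobic fraction and a scaled length."""
    return [sum(c in HYDROPHOBIC for c in seq) / len(seq), len(seq) / 10.0]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid_proba(X, y, x):
    """P(positive) from the relative distance to the two class centroids."""
    cents = []
    for label in (0, 1):
        rows = [r for r, t in zip(X, y) if t == label]
        cents.append([sum(col) / len(rows) for col in zip(*rows)])
    d0, d1 = dist(x, cents[0]), dist(x, cents[1])
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

def knn_proba(X, y, x, k=3):
    """P(positive) as the share of positives among the k nearest neighbours."""
    nearest = sorted(zip(X, y), key=lambda p: dist(p[0], x))[:k]
    return sum(t for _, t in nearest) / k

ENCODERS = [aac, hydro]                    # the paper uses ten encodings
LEARNERS = [centroid_proba, knn_proba]     # the paper uses five algorithms

def oof_meta_features(seqs, labels):
    """Leave-one-out probability for every (encoding, learner) pair."""
    meta = [[] for _ in seqs]
    for enc in ENCODERS:
        X = [enc(s) for s in seqs]
        for learn in LEARNERS:
            for i in range(len(seqs)):
                # Train on all peptides except i, then score peptide i.
                meta[i].append(learn(X[:i] + X[i + 1:],
                                     labels[:i] + labels[i + 1:], X[i]))
    return meta

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta(M, y, lr=0.5, epochs=500):
    """Logistic regression on the probabilistic features, fitted by SGD."""
    w, b = [0.0] * len(M[0]), 0.0
    for _ in range(epochs):
        for row, t in zip(M, y):
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - t
            w = [wj - lr * g * xj for wj, xj in zip(w, row)]
            b -= lr * g
    return w, b

def stack_predict(seq, seqs, labels, w, b):
    """Score a new peptide: baseline probabilities first, then the meta-model."""
    feats = [learn([enc(s) for s in seqs], labels, enc(seq))
             for enc in ENCODERS for learn in LEARNERS]
    return sigmoid(sum(wj * xj for wj, xj in zip(w, feats)) + b)

# Invented toy data: 1 = "inhibitory", 0 = "non-inhibitory".
seqs = ["IPI", "VPL", "LPQ", "IPA", "VPV", "LPL",
        "GGS", "DEK", "SGT", "KDE", "GSS", "TEK"]
labels = [1] * 6 + [0] * 6

meta = oof_meta_features(seqs, labels)     # 12 peptides x 4 probabilistic features
w, b = fit_meta(meta, labels)
train_probs = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) for row in meta]
new_score = stack_predict("APG", seqs, labels, w, b)
```

Using out-of-fold probabilities (leave-one-out here) rather than in-sample ones keeps the meta-learner from being trained on scores that each baseline model produced for peptides it had already seen.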
