School of Computer Science and Technology, Tianjin University, Tianjin, China.
School of Computer Software, College of Intelligence and Computing, Tianjin University, Tianjin, China.
Bioinformatics. 2019 Nov 1;35(21):4272-4280. doi: 10.1093/bioinformatics/btz246.
Prediction of therapeutic peptides is critical for the discovery of novel and efficient peptide-based therapeutics. Computational methods, especially machine learning-based methods, have been developed to address this need. However, most existing methods are peptide-specific; currently, there is no generic predictor for multiple peptide types. Moreover, it remains challenging to extract informative feature representations from primary sequences alone.
In this study, we developed PEPred-Suite, a bioinformatics tool for the generic prediction of therapeutic peptides. In PEPred-Suite, we introduce an adaptive feature representation strategy that learns the most representative features for each peptide type. Specifically, we train diverse sequence-based feature descriptors, integrate the learnt class information into our features, and apply a two-step feature optimization strategy based on the area under the receiver operating characteristic curve (AUC) to extract the most discriminative features. Using the learnt representative features, we trained eight random forest models, one for each of eight types of functional peptides. Benchmarking results showed that, compared with existing predictors, PEPred-Suite achieves better and more robust performance across different peptide types. To the best of our knowledge, PEPred-Suite is currently the first tool capable of predicting this many peptide types simultaneously. In addition, our work demonstrates that the learnt features can reliably predict different peptides.
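The abstract names a "two-step feature optimization strategy based on AUC" but gives no details. The sketch below illustrates one common way to build such a strategy. It is an assumption, not the authors' procedure. Step 1 ranks each feature by its univariate AUC, computed with the Mann-Whitney statistic and corrected for direction. Step 2 walks that ranking and keeps a feature only if it raises the AUC of a combined score. The combined score used here (the mean of standardized, sign-aligned features) is a hypothetical stand-in for the random forest classifier the authors actually trained. The function names `auc`, `standardize` and `two_step_selection` are illustrative only.

```python
from statistics import mean, pstdev


def auc(scores, labels):
    """AUC via the Mann-Whitney statistic: the probability that a random
    positive (label 1) scores higher than a random negative (label 0),
    counting ties as 0.5. Assumes both classes are present."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def standardize(col):
    """Z-score a feature column; constant columns map to all zeros."""
    mu, sd = mean(col), pstdev(col)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in col]


def two_step_selection(X, y, min_gain=0.0):
    """Hypothetical two-step, AUC-driven feature selection.

    Step 1: rank features by univariate AUC, flipping features whose
            AUC is below 0.5 so that higher always means 'positive'.
    Step 2: greedily add features in ranked order, keeping one only if
            it raises the AUC of a simple combined score (the mean of
            the standardized, sign-aligned selected features).
    Returns (selected feature indices, AUC of the final combined score).
    """
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]

    # Step 1: direction-corrected univariate AUC for every feature.
    ranked = []
    for j, col in enumerate(columns):
        a = auc(col, y)
        sign = 1.0 if a >= 0.5 else -1.0
        ranked.append((max(a, 1.0 - a), j, sign))
    ranked.sort(key=lambda t: (-t[0], t[1]))  # best AUC first, stable by index

    # Step 2: incremental selection along the ranking.
    z = {j: standardize(columns[j]) for j in range(n_features)}
    selected, best = [], 0.5  # 0.5 = AUC of an uninformative score
    for _, j, sign in ranked:
        trial = selected + [(j, sign)]
        combined = [mean(s * z[k][i] for k, s in trial) for i in range(len(y))]
        score = auc(combined, y)
        if score > best + min_gain:
            selected, best = trial, score
    return [j for j, _ in selected], best


# Toy data: feature 0 separates the classes, feature 2 separates them
# in the opposite direction (redundant with 0), and feature 1 is noise.
X = [[0.1, 5, 3.0], [0.2, 1, 2.9], [0.3, 4, 2.8],
     [0.7, 2, 1.0], [0.8, 6, 1.1], [0.9, 3, 1.2]]
y = [0, 0, 0, 1, 1, 1]
print(two_step_selection(X, y))  # ([0], 1.0)
```

On the toy data, features 0 and 2 both reach a direction-corrected AUC of 1.0 in step 1. Step 2 keeps only feature 0, because adding the redundant feature 2 or the noisy feature 1 cannot raise the combined AUC. This is the motivation for pairing a ranking step with a subset-search step rather than simply keeping the top-ranked features.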
The user-friendly webserver implementing the proposed PEPred-Suite is freely accessible at http://server.malab.cn/PEPred-Suite.
Supplementary data are available at Bioinformatics online.