Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea.
Institute of Molecular Science and Technology, Ajou University, Suwon, Republic of Korea.
Bioinformatics. 2019 Aug 15;35(16):2757-2765. doi: 10.1093/bioinformatics/bty1047.
Cardiovascular disease is the leading cause of death globally, accounting for approximately 17.7 million deaths per year. Hypertension is one of the risk factors linked with cardiovascular diseases and other complications. Naturally derived bioactive peptides with antihypertensive activity are promising alternatives to pharmaceutical drugs. To date, however, no study has comprehensively analyzed and assessed diverse features or implemented a range of machine-learning (ML) algorithms for constructing antihypertensive peptide (AHTP) prediction models.
In this study, we applied six ML algorithms, namely AdaBoost, extremely randomized tree (ERT), gradient boosting (GB), k-nearest neighbor, random forest (RF) and support vector machine (SVM), to 51 feature descriptors derived from eight feature encodings for the prediction of AHTPs. Because ERT-based models consistently outperformed the other algorithms regardless of the feature descriptor, we treated them as baseline predictors. Their predicted AHTP probabilities were then used as input features, separately, for four ML algorithms (ERT, GB, RF and SVM), and the corresponding meta-predictors were developed using a two-step feature selection protocol. Subsequently, integrating the four meta-predictors through an ensemble learning approach improved the balanced prediction performance and model robustness on the independent dataset. Compared with existing methods, mAHTPred showed superior performance, with an overall improvement of approximately 6-7% on both the benchmarking and independent datasets.
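The final integration step can be illustrated with a minimal sketch. The abstract does not state how the four meta-predictors are combined, so the version below assumes a simple unweighted average of their predicted AHTP probabilities followed by a 0.5 cutoff. The function names, the cutoff and the example probabilities are all hypothetical and are not taken from mAHTPred. The Matthews correlation coefficient (MCC) is included as one common measure of balanced prediction performance.

```python
from math import sqrt


def ensemble_probability(meta_probs):
    """Average the AHTP probabilities from the meta-predictors (assumed rule)."""
    return sum(meta_probs) / len(meta_probs)


def classify(meta_probs, threshold=0.5):
    """Return 1 (AHTP) if the averaged probability reaches the cutoff, else 0."""
    return 1 if ensemble_probability(meta_probs) >= threshold else 0


def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical outputs of the ERT, GB, RF and SVM meta-predictors (in that
# order) for four peptides; these numbers are illustrative, not real results.
meta_outputs = [
    [0.90, 0.80, 0.85, 0.70],
    [0.20, 0.35, 0.10, 0.30],
    [0.65, 0.45, 0.70, 0.40],
    [0.30, 0.60, 0.40, 0.45],  # one meta-predictor disagrees with the rest
]
labels = [1, 0, 1, 0]

predictions = [classify(probs) for probs in meta_outputs]
score = mcc(labels, predictions)
```

In the fourth peptide, one hypothetical meta-predictor votes positive on its own, but the averaged probability stays below the cutoff. This is the kind of smoothing an ensemble is intended to provide.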
The user-friendly online prediction tool, mAHTPred, is freely accessible at http://thegleelab.org/mAHTPred.
Supplementary data are available at Bioinformatics online.