College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar.
Information Technology Centre, Sindh Agriculture University, Sindh, Pakistan.
Methods. 2024 Oct;230:129-139. doi: 10.1016/j.ymeth.2024.08.001. Epub 2024 Aug 22.
Host defense peptides, also known as antimicrobial peptides (AMPs), are promising candidates for protecting the host against microbial pathogens such as bacteria, viruses, fungi, and yeast. Defensins are a type of AMP that act as potential therapeutic agents and play a vital role in various biological processes. Conventional experiments to identify defensin peptides (DPs) are time-consuming and expensive. Computational methods can address these shortcomings of wet-lab experiments by accurately predicting the functional types of DPs. In this paper, we propose a novel multi-class, ensemble-based prediction model called StackDPPred for identifying the properties of DPs. The peptide sequences are encoded using split amino acid composition (SAAC), segmented position-specific scoring matrix (SegPSSM), histogram of oriented gradients-based PSSM (HOGPSSM), and feature extraction based on graphical and statistical (FEGS) descriptors. Next, principal component analysis (PCA) is used to reduce the encoded features to the most informative subset. The optimized features are then fed into individual machine learning classifiers and stacking-based ensemble classifiers. An ablation study demonstrates the robustness and efficacy of the stacking approach with reduced features for predicting DPs and their families. On the validation test, the proposed StackDPPred method improves overall accuracy by 13.41% and 7.62% compared with the existing DP predictors iDPF-PseRAAC and iDEF-PseRAAC, respectively. Additionally, we applied the local interpretable model-agnostic explanations (LIME) algorithm to understand the contribution of the selected features to the overall prediction. We believe StackDPPred could serve as a valuable tool for accelerating the large-scale screening of DPs and the peptide-based drug discovery process.
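To make the first encoding step concrete, the following is a minimal sketch of split amino acid composition (SAAC) as it is commonly defined: the sequence is split into an N-terminal, a middle, and a C-terminal segment, and the 20-residue composition of each segment is concatenated into a 60-dimensional vector. The function names (`composition`, `saac`) and the segment lengths (`n_term=5`, `c_term=5`) are assumptions made for illustration; the abstract does not state the segment sizes or implementation used in StackDPPred.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order that defines the feature layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the 20-dimensional relative frequency of each standard residue.

    Non-standard residues count toward the segment length but contribute
    to no feature, so the vector may sum to less than 1 in that case.
    """
    if not segment:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(segment)
    return [counts[aa] / len(segment) for aa in AMINO_ACIDS]


def saac(sequence, n_term=5, c_term=5):
    """Encode a peptide as a 60-dimensional SAAC vector.

    n_term and c_term are hypothetical segment lengths chosen for this
    sketch; they are not taken from the paper.
    """
    seq = sequence.upper()
    if len(seq) < n_term + c_term:
        raise ValueError("sequence is shorter than n_term + c_term")
    split = len(seq) - c_term
    head, middle, tail = seq[:n_term], seq[n_term:split], seq[split:]
    return composition(head) + composition(middle) + composition(tail)


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real defensin.
    features = saac("ACDEFGHIKLMNPQRSTVWY")
    print(len(features))
```

In a pipeline like the one described, a vector of this kind would be concatenated with the PSSM-based and FEGS descriptors before dimensionality reduction with PCA and classification by the stacked ensemble.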