Bibi Rimsha, Qasmi Noshaba, Rashid Sajid
National Center for Bioinformatics, Quaid-i-Azam University, Pakistan.
PLoS One. 2025 Jul 11;20(7):e0327578. doi: 10.1371/journal.pone.0327578. eCollection 2025.
Crude cone snail venom is a rich source of bioactive compounds with significant therapeutic potential. In this study, we conducted a comprehensive analysis of 5,985 cone snail peptides across 82 Conus species to identify unique cysteine (Cys) patterns and associated frameworks. Classifying these Cys patterns by conserved framework combinations enabled the generation of species-level pattern barcodes. These barcodes were then used to assess the species correlations of individual sequences. By analyzing 151 known Conus peptide PDB files, we computed Cys disulfide linkages to assess overall stability profiles. Incorporating barcode data allowed us to filter the dataset and prepare it for machine learning (ML) processing. Random Forest (RF) modeling, a supervised learning technique, was used to predict the therapeutic potential of venom peptides. Feature extraction was based on known venom-derived, approved peptide-based drugs. The dataset was split 70:30 into training and test sets. A total of 6,430 peptides (5,985 from cone snails and 445 from other venomous species) were used to evaluate the model's predictive capability. The proposed model achieved an accuracy of 90.48% in peptide therapeutic classification. Model outputs then underwent further structural and binding pattern analysis against known targets, revealing significant similarities between the binding patterns of approved and novel peptides. The model's performance could be further enhanced by incorporating additional datasets and optimizing feature selection, potentially broadening its applicability to larger peptide datasets. Overall, this study underscores the potential of ML in advancing pharmacological research on diverse venom peptides.
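As a rough illustration of what a Cys pattern and a species-level barcode might look like in code, the sketch below collapses a peptide sequence to its cysteine spacing (for example `CC-C-C`) and groups the distinct patterns by species. It is a minimal sketch under stated assumptions, not the study's pipeline: the function names `cys_pattern` and `species_barcodes`, the dash notation, and the input records are all hypothetical, and the abstract does not specify how patterns or barcodes were actually encoded.

```python
import re
from collections import defaultdict


def cys_pattern(sequence: str) -> str:
    """Collapse a peptide sequence to its cysteine pattern, e.g. 'CC-C-C'.

    Assumption: adjacent cysteines are written together and any run of
    non-Cys residues between two cysteines becomes a single dash.
    """
    seq = sequence.strip().upper()
    if "C" not in seq:
        return ""
    # Keep only the span from the first to the last cysteine.
    core = seq[seq.find("C"): seq.rfind("C") + 1]
    # Replace each run of non-Cys residues with one dash.
    return re.sub(r"[^C]+", "-", core)


def species_barcodes(records):
    """Group the distinct Cys patterns observed per species.

    `records` is an iterable of (species, sequence) pairs (hypothetical
    input format). Returns {species: sorted list of patterns}.
    """
    patterns = defaultdict(set)
    for species, sequence in records:
        pattern = cys_pattern(sequence)
        if pattern:
            patterns[species].add(pattern)
    return {sp: sorted(pats) for sp, pats in patterns.items()}


print(cys_pattern("GCCSDPRCAWRC"))  # → CC-C-C
```

A per-species list of patterns produced this way could then serve as one input when filtering sequences ahead of model training, although the actual filtering criteria used in the study are not given in the abstract.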