Prediction of peptide hormones using an ensemble of machine learning and similarity-based methods.

Author information

Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.

Publication information

Proteomics. 2024 Oct;24(20):e2400004. doi: 10.1002/pmic.202400004. Epub 2024 May 27.

Abstract

Peptide hormones serve as genome-encoded signal transduction molecules that play essential roles in multicellular organisms, and their dysregulation can lead to various health problems. In this study, we propose a method for predicting hormonal peptides with high accuracy. The dataset used for training, testing, and evaluating our models consisted of 1174 hormonal and 1174 non-hormonal peptide sequences. Initially, we developed similarity-based methods using BLAST and MERCI. Although these similarity-based methods were highly likely to be correct when they made a prediction, they had limitations: some queries returned no hits, so only a limited set of sequences could be predicted. To overcome these limitations, we further developed machine learning and deep learning models. Our logistic regression-based model achieved a maximum AUROC of 0.93 with an accuracy of 86% on an independent validation dataset. To combine the strengths of the similarity-based and machine learning-based models, we developed an ensemble method that achieved an AUROC of 0.96 with an accuracy of 89.79% and a Matthews correlation coefficient (MCC) of 0.8 on the validation set. To help researchers predict and design hormone peptides, we developed a web server called HOPPred. A distinctive feature of the server is the identification of hormone-associated motifs within hormone peptides. The server can be accessed at: https://webs.iiitd.edu.in/raghava/hoppred/.
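
The abstract does not say how the similarity-based and machine learning outputs are combined, so the following is only a minimal sketch of one plausible scheme, not the authors' implementation. It assumes a classifier probability is shifted up or down when a BLAST hit or a MERCI motif is available, and it shows how accuracy and MCC can be computed from the resulting scores. The function names, the `boost` value of 0.5, and the 0.5 decision threshold are all hypothetical.

```python
from math import sqrt


def ensemble_score(ml_prob, blast_label=None, motif_found=False, boost=0.5):
    """Combine an ML probability with similarity evidence (hypothetical scheme).

    ml_prob      -- classifier probability that the peptide is a hormone
    blast_label  -- "hormone", "non-hormone", or None when BLAST gives no hit
    motif_found  -- True if a hormone-associated motif was found in the peptide
    boost        -- assumed weight added or subtracted per piece of evidence
    """
    score = ml_prob
    if blast_label == "hormone":
        score += boost
    elif blast_label == "non-hormone":
        score -= boost
    if motif_found:
        score += boost
    return score


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def evaluate(scores, labels, threshold=0.5):
    """Return (accuracy, MCC) for scores thresholded into 0/1 predictions."""
    tp = tn = fp = fn = 0
    for score, label in zip(scores, labels):
        pred = 1 if score >= threshold else 0
        if pred == 1 and label == 1:
            tp += 1
        elif pred == 0 and label == 0:
            tn += 1
        elif pred == 1 and label == 0:
            fp += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(labels)
    return accuracy, mcc(tp, tn, fp, fn)
```

With this design, a peptide with no BLAST hit and no motif falls back to the classifier probability alone, which is how a hybrid approach can cover sequences that the similarity-based methods cannot score.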

