Department of Pathophysiology, Southwest Medical University, Luzhou 646000, China.
Key Laboratory for NeuroInformation of Ministry of Education, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
Int J Biol Sci. 2018 May 22;14(8):957-964. doi: 10.7150/ijbs.24174. eCollection 2018.
Hormone-binding proteins (HBPs) are soluble carrier proteins that interact selectively and non-covalently with hormones. HBPs play an important role in growth, but their function remains unclear. Correctly recognizing HBPs is the first step toward studying their function and understanding the biological processes they take part in. However, traditional biochemical experiments are costly and time-consuming, which makes it difficult to identify HBPs among the rapidly growing number of known proteins. To overcome these disadvantages, we designed a computational method for identifying HBPs accurately in this study. First, we collected HBP data from UniProt to establish a high-quality benchmark dataset. From this dataset, the dipeptide composition was extracted from the HBP residue sequences. To find the optimal features that provide key clues for HBP identification, analysis of variance (ANOVA) was performed for feature ranking, and the optimal features were then selected with an incremental feature selection strategy. Subsequently, the selected features were input into a support vector machine (SVM) to construct the prediction model. Jackknife cross-validation showed that 88.6% of HBPs and 81.3% of non-HBPs were correctly recognized, suggesting that the proposed model performs well. This study provides a new strategy for identifying HBPs. Moreover, based on the proposed model, we established a web server called HBPred, which is freely accessible at http://lin-group.cn/server/HBPred.
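The feature extraction and ranking steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`dipeptide_composition`, `anova_f_score`, `rank_features`) are hypothetical, the 20-letter standard amino acid alphabet is assumed, and the F score is the textbook one-way ANOVA statistic (between-class variance divided by within-class variance), which may differ in detail from the formula used in the paper.

```python
from itertools import product

# Assumption: the 20 standard amino acids, giving 20 x 20 = 400 dipeptides.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400 dipeptide frequencies of a protein sequence."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def anova_f_score(groups):
    """One-way ANOVA F statistic for one feature, given one value list per class."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-class variance: spread of the class means around the grand mean.
    between = sum(len(g) * (m - grand_mean) ** 2
                  for g, m in zip(groups, means)) / (k - 1)
    # Within-class variance: spread of the samples around their own class mean.
    within = sum((x - m) ** 2
                 for g, m in zip(groups, means) for x in g) / (n_total - k)
    if within == 0:
        return 0.0 if between == 0 else float("inf")
    return between / within


def rank_features(positive_vectors, negative_vectors):
    """Return feature indices sorted from highest to lowest F score."""
    n_features = len(positive_vectors[0])
    scores = [
        anova_f_score([[v[j] for v in positive_vectors],
                       [v[j] for v in negative_vectors]])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

In an incremental feature selection loop, one would train and evaluate a classifier (an SVM in the study) on the top 1, top 2, ... top 400 features from this ranking and keep the subset with the best cross-validated accuracy; that loop is omitted here.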