Guo Yuxin, Hou Liping, Zhu Wen, Wang Peng
Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.
Yangtze Delta Region Institute, University of Electronic Science and Technology of China, Quzhou, China.
Front Genet. 2021 Nov 23;12:797641. doi: 10.3389/fgene.2021.797641. eCollection 2021.
Hormone binding proteins (HBPs) are soluble carrier proteins that interact selectively with different types of hormones and affect many of the body's life activities. HBPs play an important role in the growth of organisms, but their specific role is still unclear, so correctly identifying HBPs is the first step towards understanding and studying their biological function. Traditional biochemical experiments are costly and time-consuming, which makes it difficult to identify HBPs among an increasing number of proteins, and the characterization of HBPs has become a challenging task for researchers. An accurate and reliable prediction model for identifying HBPs is therefore desirable. In this paper, we construct the prediction model HBP_NB. First, HBP data were collected from the UniProt database to establish a high-quality dataset. Second, the k-mer (k = 3) feature representation method was used to extract features from the sequences in this dataset. Third, a feature selection algorithm was used to reduce the dimensionality of the extracted features and select an optimal feature set. Finally, the selected features were input into a Naive Bayes classifier to construct the prediction model, which was evaluated using 10-fold cross-validation. The model achieved 95.45% accuracy, 94.17% sensitivity and 96.73% specificity. These results indicate that our model is feasible and effective.
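As a rough illustration of two steps named in the abstract, the sketch below computes a k-mer (k = 3) frequency vector for a protein sequence and the three reported evaluation metrics from confusion-matrix counts. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `kmer_features` and `evaluate`, the 20-letter amino acid alphabet, the normalization by the number of valid windows, and the skipping of windows that contain non-standard residues are all choices made here for illustration. Feature selection and the Naive Bayes classifier are not shown.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_features(sequence, k=3):
    """Return the normalized k-mer frequency vector of a protein sequence.

    The vector has 20**k entries (8000 for k = 3), ordered by the
    lexicographic product of AMINO_ACIDS. Windows containing any other
    character are ignored (an assumption of this sketch).
    """
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    # Slide a window of length k over the sequence and count each k-mer.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Normalize by the number of windows made up of standard residues only.
    total = sum(counts[kmer] for kmer in vocabulary)
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


def evaluate(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # fraction of true HBPs recovered
    specificity = tn / (tn + fp)  # fraction of non-HBPs correctly rejected
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    features = kmer_features("ACDACD")  # toy sequence, not real data
    print(len(features))
    print(evaluate(tp=9, tn=8, fp=2, fn=1))  # toy counts, not the paper's
```

In a full pipeline, vectors like these would be passed through a feature selection step and then to a Naive Bayes classifier, with the metrics averaged over the 10 cross-validation folds.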