Prediction of Hormone-Binding Proteins Based on K-mer Feature Representation and Naive Bayes.

Authors

Guo Yuxin, Hou Liping, Zhu Wen, Wang Peng

Affiliations

Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China.

Yangtze Delta Region Institute, University of Electronic Science and Technology of China, Quzhou, China.

Publication information

Front Genet. 2021 Nov 23;12:797641. doi: 10.3389/fgene.2021.797641. eCollection 2021.

DOI:10.3389/fgene.2021.797641

PMID:34887905

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8650314/

Abstract

Hormone-binding proteins (HBPs) are soluble carrier proteins that interact selectively with different types of hormones and have various effects on the body's life activities. HBPs play an important role in the growth of organisms, but their specific role remains unclear. Correctly identifying HBPs is therefore the first step towards understanding and studying their biological function. However, traditional biochemical experiments are costly and time-consuming, which makes it difficult to identify HBPs among an increasing number of proteins, so characterizing HBPs has become a challenging task for researchers. An accurate and reliable prediction model for identifying HBPs is therefore desirable. In this paper, we construct the prediction model HBP_NB. First, HBP data were collected from the UniProt database to establish a high-quality dataset. Second, the k-mer (k = 3) feature representation method was used to extract features from this dataset. Third, a feature selection algorithm was used to reduce the dimensionality of the extracted features and select an optimal feature set. Finally, the selected features were input into a Naive Bayes classifier to construct the prediction model, which was evaluated using 10-fold cross-validation. The model achieved 95.45% accuracy, 94.17% sensitivity, and 96.73% specificity. These results indicate that our model is feasible and effective.
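The feature extraction and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that k-mer features are normalized frequencies over the 20 standard amino acids (giving 20^3 = 8000 dimensions for k = 3), and that labels are encoded as 1 for HBP and 0 for non-HBP. The function names are hypothetical, and the feature selection and Naive Bayes steps are omitted.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def kmer_vocabulary(k=3):
    """All possible k-mers over the alphabet, in a fixed order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]


def kmer_features(sequence, k=3):
    """Normalized k-mer frequencies of a protein sequence (20**k values).

    Assumption: overlapping windows with stride 1; windows that contain
    non-standard residues (for example X) are ignored.
    """
    vocabulary = kmer_vocabulary(k)
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[kmer] for kmer in vocabulary)
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity with 1 = HBP, 0 = non-HBP."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


if __name__ == "__main__":
    features = kmer_features("MKVLAAGMKV")  # toy sequence, not real data
    print(len(features))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a full pipeline, the feature vectors would go through feature selection and then into a Naive Bayes classifier, with the three metrics computed over 10-fold cross-validation as the abstract describes.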

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4a5f/8650314/437fa8bb0a80/fgene-12-797641-g001.jpg

