Zhang Hong-Qi, Liu Shang-Hua, Li Rui, Yu Jun-Wen, Ye Dong-Xin, Yuan Shi-Shi, Lin Hao, Huang Cheng-Bing, Tang Hua
School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China.
School of Computer Science and Technology, Aba Teachers University, Aba 623002, China.
ACS Omega. 2024 Feb 8;9(7):8439-8447. doi: 10.1021/acsomega.3c09587. eCollection 2024 Feb 20.
Metal ion-binding proteins participate in numerous metabolic activities in living organisms and are closely associated with various diseases. To predict whether a protein binds metal ions and, if so, which type of metal ion-binding protein it is, this study proposes a classifier named MIBPred. The classifier uses Word2Vec, a technique from natural language processing, to extract semantic features from protein sequences treated as a language, and combines them with position-specific scoring matrix (PSSM) features. An ensemble learning model performs the classification: XGBoost, LightGBM, and CatBoost models are trained independently, and their outputs are integrated through an SVM voting mechanism. With this combination, MIBPred achieved accuracies of 95.13% in predicting metal ion-binding proteins and 85.19% in predicting their types. These results support the effectiveness of Word2Vec in extracting semantic information from protein sequences and show the strong performance of MIBPred in classifying metal ion-binding protein types. The study provides a reliable tool and method for in-depth exploration of the structure and function of metal ion-binding proteins.
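The feature construction described in the abstract (Word2Vec-style sequence features combined with PSSM features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: overlapping 3-mers as the "words", mean pooling of k-mer vectors, column means as the PSSM summary, and the function names. The `table` argument stands in for vectors from a trained Word2Vec model, and the ensemble step (XGBoost, LightGBM, CatBoost, SVM) is not shown.

```python
from typing import Dict, List, Sequence


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers (the assumed 'words')."""
    if k <= 0:
        raise ValueError("k must be positive")
    sequence = sequence.strip().upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mean_embedding(tokens: Sequence[str],
                   table: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    total = [0.0] * dim
    hits = 0
    for tok in tokens:
        vec = table.get(tok)
        if vec is None:
            continue  # out-of-vocabulary k-mer is skipped
        for j in range(dim):
            total[j] += vec[j]
        hits += 1
    return [v / hits for v in total] if hits else total


def pssm_column_means(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a (length x columns) PSSM into one mean per column."""
    if not pssm:
        return []
    n_cols = len(pssm[0])
    return [sum(row[c] for row in pssm) / len(pssm) for c in range(n_cols)]


def build_features(sequence: str,
                   table: Dict[str, List[float]],
                   dim: int,
                   pssm: Sequence[Sequence[float]],
                   k: int = 3) -> List[float]:
    """Concatenate the pooled sequence embedding with the PSSM summary."""
    tokens = kmer_tokens(sequence, k)
    return mean_embedding(tokens, table, dim) + pssm_column_means(pssm)
```

A vector built this way has a fixed length regardless of sequence length, which is what tree-based learners such as those named in the abstract require as input. A real PSSM has 20 columns, one per amino acid.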