ANOVA-particle swarm optimization-based feature selection and gradient boosting machine classifier for improved protein-protein interaction prediction.

Affiliation

Department of Electronics and Communication Engineering, Birla Institute of Technology, Ranchi, India.

Publication information

Proteins. 2022 Feb;90(2):443-454. doi: 10.1002/prot.26236. Epub 2021 Sep 29.

Abstract

Feature fusion and selection strategies have been applied to improve the accuracy of protein-protein interaction (PPI) prediction. In this paper, an embedded feature selection framework, termed AVPSO, is developed by integrating a cost function based on analysis of variance (ANOVA) with particle swarm optimization (PSO). First, the features extracted from the protein sequences using pseudo-amino acid composition (PseAAC), conjoint triad composition, and local descriptor are fused. Then, AVPSO is employed to select the optimal feature subset, and a light gradient boosting machine (LGBM) classifier uses that subset to predict PPIs. In five-fold cross-validation, the proposed model (AVPSO-LGBM) achieved average accuracies of 97.12% and 95.09% on the intraspecies PPI datasets Saccharomyces cerevisiae and Helicobacter pylori, respectively. On the interspecies PPI datasets Human-Bacillus and Human-Yersinia, it achieved average accuracies of 95.20% and 93.44%, respectively. Results on independent test datasets and network datasets show that the prediction accuracy of AVPSO-LGBM is better than that of existing methods, demonstrating its generalization ability. The improved prediction performance makes the proposed model a reliable and effective PPI prediction model.
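
To make the selection step concrete, here is a minimal sketch; it is not the authors' implementation. The abstract does not state the AVPSO cost function, so the fitness used here (the share of the total ANOVA F-score captured by a subset, minus a size penalty weighted by a hypothetical `alpha`) is an assumption. The function names, the binary (sigmoid) PSO variant, and the PSO constants are likewise illustrative choices. The sketch scores features with a filter-style criterion, which is simpler than the embedded scheme the paper describes.

```python
import math
import random


def anova_f_scores(X, y):
    """One-way ANOVA F-statistic of each feature column against the class labels."""
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    k = len(classes)
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        grand_mean = sum(col) / n
        between = within = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            class_mean = sum(vals) / len(vals)
            between += len(vals) * (class_mean - grand_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in vals)
        # Small epsilon avoids division by zero when a column is constant within classes.
        scores.append((between / (k - 1)) / (within / (n - k) + 1e-12))
    return scores


def subset_fitness(mask, scores, alpha=1.0):
    """ASSUMED fitness: relevance share of the subset minus a size penalty."""
    chosen = sum(mask)
    if chosen == 0:
        return float("-inf")  # an empty subset is never acceptable
    total = sum(scores) or 1.0
    relevance = sum(s for s, m in zip(scores, mask) if m) / total
    return relevance - alpha * chosen / len(mask)


def binary_pso_select(scores, alpha=1.0, n_particles=20, n_iter=50, seed=0):
    """Binary PSO: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    d = len(scores)
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [subset_fitness(p, scores, alpha) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    w, c1, c2, vmax = 0.7, 1.5, 1.5, 4.0  # illustrative constants, not from the paper
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(d):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-vmax, min(vmax, v))
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            f = subset_fitness(pos[i], scores, alpha)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy data: column 0 separates the two classes, column 1 does not.
    X = [[1.0, 5.0], [1.2, 3.0], [3.0, 5.0], [3.2, 3.0]]
    y = [0, 0, 1, 1]
    mask, fit = binary_pso_select(anova_f_scores(X, y))
    print(mask, fit)
```

With `alpha = 1`, this assumed fitness favors features whose F-score exceeds the mean F-score across all features. In the pipeline the abstract describes, the columns kept by the mask would then be passed to the LGBM classifier.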
