XGBFEMF: An XGBoost-Based Framework for Essential Protein Prediction.

Publication information

IEEE Trans Nanobioscience. 2018 Jul;17(3):243-250. doi: 10.1109/TNB.2018.2842219. Epub 2018 May 31.

Abstract

Essential proteins are vital to maintaining cellular life and play an important role in biological research and drug design. As large amounts of biological data related to essential proteins have become available, an increasing number of computational methods have been proposed for identifying them. Unlike methods that adopt a single machine learning method or an ensemble machine learning method, this paper proposes a prediction framework named XGBFEMF for identifying essential proteins. The framework includes a SUB-EXPAND-SHRINK method, which constructs composite features from the original features and selects a better feature subset for essential protein prediction, and a model fusion method, which yields a more effective prediction model. We carry out experiments on yeast data to assess the performance of XGBFEMF using ROC analysis, accuracy analysis, and top analysis, and we run further experiments on E. coli data to validate that performance. The test results show that the XGBFEMF framework can effectively improve many important indicators. In addition, we analyze each step of the XGBFEMF framework; the results show that each step of the SUB-EXPAND-SHRINK method, as well as the multi-model fusion step, improves prediction performance.
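The abstract names the stages of the framework but does not define them, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that EXPAND builds composite features as pairwise products of the original features, that SHRINK keeps the features most correlated with the essentiality label, and that fusion averages the scores of several already-trained models. The SUB step is not described in the abstract and is left out. All function names here (`expand`, `shrink`, `fuse`, `abs_corr`) are hypothetical, and no XGBoost model is trained.

```python
from itertools import combinations
from statistics import mean


def expand(rows, names):
    """EXPAND (assumed form): append pairwise products of the original features."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = names + [f"{names[i]}*{names[j]}" for i, j in pairs]
    new_rows = [row + [row[i] * row[j] for i, j in pairs] for row in rows]
    return new_rows, new_names


def abs_corr(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def shrink(rows, names, labels, keep):
    """SHRINK (assumed form): keep the `keep` features most correlated with the label."""
    scores = [abs_corr([r[k] for r in rows], labels) for k in range(len(names))]
    top = sorted(range(len(names)), key=lambda k: scores[k], reverse=True)[:keep]
    top.sort()  # preserve the original column order among the kept features
    return [[r[k] for k in top] for r in rows], [names[k] for k in top]


def fuse(score_lists):
    """Model fusion (assumed form): average the per-protein scores of several models."""
    return [mean(column) for column in zip(*score_lists)]


# Toy data: three proteins, two hypothetical features, binary essentiality labels.
toy_rows = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]
toy_names = ["degree", "constant"]
toy_labels = [0, 0, 1]

expanded_rows, expanded_names = expand(toy_rows, toy_names)
selected_rows, selected_names = shrink(expanded_rows, expanded_names, toy_labels, keep=2)
fused_scores = fuse([[0.2, 0.8, 0.9], [0.4, 0.6, 0.7]])
```

In practice the correlation filter would likely be replaced by a model-based importance measure and the fused scores would come from trained classifiers; those choices are beyond what the abstract specifies.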
