College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China.
Qixin School, Zhejiang Sci-Tech University, Hangzhou 310018, China.
Comput Math Methods Med. 2021 May 7;2021:5529389. doi: 10.1155/2021/5529389. eCollection 2021.
Many combinations of protein features have been used to improve protein structural class prediction, but the information redundancy among them is often ignored. To select the important features with strong classification ability, we propose a recursive feature selection method based on random forest. We evaluated the proposed method in four experiments and compared it with existing competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction: using fewer than 5% of the features, prediction accuracy improves by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful methods for predicting protein structural class.
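The general idea of recursive feature selection with a random forest can be illustrated with a minimal sketch. It assumes scikit-learn's `RFE` wrapper and `RandomForestClassifier` as stand-ins and is not necessarily the procedure used in the paper. The data are synthetic (generated with `make_classification`), and the feature count, number of trees, elimination step, and number of retained features are arbitrary illustrative values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for a protein feature matrix (assumption: no real
# protein data is used here). Four classes mimic a four-class structural
# class problem; 20 of the 100 features are redundant by construction.
X, y = make_classification(
    n_samples=300,
    n_features=100,
    n_informative=5,
    n_redundant=20,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)

# The random forest supplies feature importances after each fit.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# RFE repeatedly fits the forest, ranks features by importance, and drops
# the lowest-ranked ones. With step=0.1, 10% of the original features
# (10 here) are removed per round until 5 remain.
selector = RFE(estimator=forest, n_features_to_select=5, step=0.1)
selector.fit(X, y)

# Keep only the selected columns.
X_selected = selector.transform(X)

print("features kept:", X_selected.shape[1], "of", X.shape[1])
print("selected column indices:", selector.get_support(indices=True))
```

To compare accuracy before and after selection without leaking information, the selector would normally be placed inside a cross-validation pipeline so that features are chosen on the training folds only.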