A Feature and Algorithm Selection Method for Improving the Prediction of Protein Structural Class.

Author information

Ni Qianwu, Chen Lei

Affiliation

College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China.

Publication information

Comb Chem High Throughput Screen. 2017;20(7):612-621. doi: 10.2174/1386207320666170314103147.

Abstract

AIM AND OBJECTIVE

Correct prediction of protein structural class is beneficial for investigating protein functions, regulation and interactions. In recent years, several computational methods have been proposed for this task. However, given the wide variety of available features, selecting a proper classification algorithm and extracting the essential features for classification remain great challenges.

MATERIAL AND METHODS

In this study, a feature and algorithm selection method was presented to improve the accuracy of protein structural class prediction. Proteins were represented by amino acid composition and physicochemical features, and thirty-eight machine learning algorithms implemented in Weka were employed. All features were first analyzed by a feature selection method, minimum redundancy maximum relevance (mRMR), producing a ranked feature list. Then, several feature sets were constructed by adding features from the list one by one. For each feature set, the thirty-eight algorithms were executed on a dataset in which proteins were represented by the features in that set. The classes predicted by these algorithms, together with the true class of each protein, were collected into a new dataset, which was analyzed by mRMR to yield a ranked algorithm list. Algorithms were then taken from this list one by one to build ensemble prediction models. Finally, the ensemble prediction model with the best performance was selected as the optimal ensemble prediction model.
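
The algorithm-selection stage described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each algorithm's predicted classes were obtained out-of-sample (for example by cross-validation), uses the MID form of mRMR (relevance minus mean redundancy, both measured by mutual information on discrete values), and combines the selected algorithms by simple majority voting with ties going to the higher-ranked algorithm. The voting rule is an assumption, since the abstract does not say how the ensemble combines predictions, and the helper names (`mutual_info`, `mrmr_rank`, `majority_vote`, `incremental_ensemble`) are hypothetical. The feature-selection stage follows the same rank-then-add-one-by-one pattern, with discretized feature columns in place of algorithm predictions.

```python
import math
from collections import Counter


def mutual_info(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    # p(a,b) / (p(a) p(b)) simplifies to n_ab * n / (n_a * n_b) with raw counts.
    return sum(
        (n_ab / n) * math.log(n_ab * n / (count_x[a] * count_y[b]))
        for (a, b), n_ab in count_xy.items()
    )


def mrmr_rank(columns, labels):
    """Greedy mRMR (MID scheme, assumed): relevance minus mean redundancy.

    `columns` maps a name (feature or algorithm) to a discrete sequence.
    """
    relevance = {name: mutual_info(col, labels) for name, col in columns.items()}
    remaining, ranked = list(columns), []

    def score(name):
        if not ranked:
            return relevance[name]
        redundancy = sum(mutual_info(columns[name], columns[s]) for s in ranked)
        return relevance[name] - redundancy / len(ranked)

    while remaining:
        best = max(remaining, key=score)  # first candidate wins on ties
        ranked.append(best)
        remaining.remove(best)
    return ranked


def majority_vote(votes):
    # Counter.most_common keeps first-encountered order for equal counts, so a
    # tie goes to the higher-ranked algorithm (votes are passed in rank order).
    return Counter(votes).most_common(1)[0][0]


def incremental_ensemble(ranked, predictions, labels):
    """Add ranked algorithms one at a time; keep the prefix with the best accuracy."""
    best_members, best_acc = [], -1.0
    for k in range(1, len(ranked) + 1):
        members = ranked[:k]
        voted = [majority_vote([predictions[m][i] for m in members])
                 for i in range(len(labels))]
        acc = sum(p == t for p, t in zip(voted, labels)) / len(labels)
        if acc > best_acc:  # strict '>' prefers the smaller ensemble on ties
            best_members, best_acc = members, acc
    return best_members, best_acc


# Toy usage with hypothetical class labels and three hypothetical "algorithms".
labels = ["A", "A", "B", "B", "C", "D"]
predictions = {
    "noisy": ["A", "B", "B", "B", "C", "C"],
    "perfect": list(labels),
    "constant": ["A"] * 6,
}
ranked = mrmr_rank(predictions, labels)
members, acc = incremental_ensemble(ranked, predictions, labels)
print(ranked[0], members, acc)  # → perfect ['perfect'] 1.0
```

In the setting of the abstract, `predictions` would instead hold the classes predicted by the Weka algorithms for each protein. Selecting the shortest best-scoring prefix of the ranked list keeps the ensemble small, which is one reasonable reading of "selected the ensemble prediction model with the best performance".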

RESULTS

Experimental results indicate that the constructed model considerably outperforms models using a single algorithm, as well as models that adopt only the feature selection procedure or only the algorithm selection procedure.

CONCLUSION

Both the feature selection procedure and the algorithm selection procedure are helpful for building an ensemble prediction model with better performance.
