[Random forest for classification of thermophilic and psychrophilic proteins based on amino acid composition distribution].

Authors

Zhang Guangya, Fang Baishan

Affiliation

Key Laboratory of Industrial Biotechnology, Huaqiao University, Quanzhou 362021, China.

Publication

Sheng Wu Gong Cheng Xue Bao. 2008 Feb;24(2):302-8.

Abstract

We used amino acid composition distribution (AACD) to discriminate between thermophilic and psychrophilic proteins. The models were evaluated by 10-fold cross-validation and by independent testing on a separate dataset. When the segment was set to 1, the overall accuracy reached 92.9% and 90.2%, respectively. The AACD method also improved prediction accuracy when a support vector machine (SVM) was used as the classifier. When six new features were introduced, the overall accuracy of the random forest improved to 93.2% and 92.2%, and the areas under the receiver operating characteristic curve were 0.9771 and 0.9696, respectively. These results were better than those of other ensemble classifiers and comparable with those of the SVM.
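
The abstract does not define how AACD features are computed. As a rough illustration only, the sketch below assumes that the sequence is split into a chosen number of roughly equal segments and that the fraction of each of the 20 standard amino acids is computed per segment, so that a single segment reduces to the plain amino acid composition. The names `composition` and `aacd_features` and the parameter `n_segments` are hypothetical and are not taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(segment):
    """Return the fraction of each standard amino acid in one segment."""
    counts = Counter(segment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        # Empty segment, or one holding only non-standard symbols.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def aacd_features(sequence, n_segments=1):
    """Concatenate per-segment compositions into one feature vector.

    Assumption: segments are contiguous and of (nearly) equal length.
    The paper may define its segments differently.
    """
    if n_segments < 1:
        raise ValueError("n_segments must be at least 1")
    sequence = sequence.upper()
    length = len(sequence)
    features = []
    for i in range(n_segments):
        start = i * length // n_segments
        end = (i + 1) * length // n_segments
        features.extend(composition(sequence[start:end]))
    return features


# A toy sequence, not real data from the study.
vector = aacd_features("ACDA", n_segments=1)
print(len(vector))
```

Under this assumption, each protein yields a vector of 20 × `n_segments` values that could then be passed to a classifier such as a random forest or an SVM.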
