Zhang Guang-Ya, Fang Bai-Shan
Institute of Industrial Biotechnology, Huaqiao University, Quanzhou 362021, China.
Sheng Wu Gong Cheng Xue Bao. 2006 Nov;22(6):1026-31.
In this paper, Boosting-based decision tree ensemble classifiers were applied to discriminate between thermophilic and mesophilic proteins. Three methods, namely a self-consistency test, 5-fold cross-validation, and independent testing on a separate dataset, were used to evaluate the performance and robustness of the models. LogitBoost, a novel classifier among Boosting algorithms, performed better than AdaBoost. The overall accuracy under the three evaluation methods was 100%, 88.4% and 89.5%, respectively. LogitBoost was shown to perform comparably to, or even better than, a neural network, a very powerful classifier widely used in the biological literature. The influence of protein size on discrimination was also addressed. It is anticipated that the power to predict many bio-macromolecular attributes will be further strengthened if Boosting and other existing algorithms can effectively complement each other.
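The three evaluation methods named in the abstract (self-consistency test, 5-fold cross-validation, independent testing) can be illustrated with a minimal sketch. It is not the authors' implementation. It uses a small pure-Python AdaBoost over decision stumps as a stand-in for the boosted decision trees in the paper (LogitBoost is not implemented here), and the two-feature synthetic data from the hypothetical helper `make_toy_data` stands in for real protein descriptors, which the abstract does not specify. All function names and parameters are assumptions made for illustration.

```python
import math
import random

def stump_predict(row, j, t, sign):
    # Decision stump: vote `sign` if feature j exceeds threshold t, else `-sign`.
    return sign if row[j] > t else -sign

def fit_stump(X, y, w):
    # Exhaustively pick the (feature, threshold, sign) with the lowest weighted error.
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def fit_adaboost(X, y, rounds=10):
    # Discrete AdaBoost with labels in {+1, -1}.
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, sign))
        # Up-weight misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    score = sum(a * stump_predict(row, j, t, s) for a, j, t, s in model)
    return 1 if score >= 0 else -1

def accuracy(model, X, y):
    return sum(predict(model, r) == yi for r, yi in zip(X, y)) / len(y)

def cross_val_accuracy(X, y, k=5, rounds=10, seed=0):
    # k-fold cross-validation: each sample is predicted by a model that never saw it.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_adaboost([X[i] for i in train], [y[i] for i in train], rounds)
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)

def make_toy_data(n, seed):
    # Hypothetical synthetic data: +1 = "thermophilic", -1 = "mesophilic".
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.choice((1, -1))
        X.append([rng.gauss(0.6 * label, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

X_train, y_train = make_toy_data(60, seed=1)
X_indep, y_indep = make_toy_data(40, seed=2)  # separate dataset for independent testing

model = fit_adaboost(X_train, y_train)
print("self-consistency:", accuracy(model, X_train, y_train))   # train and test on the same data
print("5-fold cross-validation:", cross_val_accuracy(X_train, y_train, k=5))
print("independent test:", accuracy(model, X_indep, y_indep))
```

The self-consistency figure is measured on the same samples used for training, so it is typically the most optimistic of the three. That is consistent with the 100% reported in the abstract sitting above the cross-validation and independent-test accuracies.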