Zhang Guangya, Li Hongchun, Fang Baishan
Institute of Industrial Biotechnology, Huaqiao University, Quanzhou 362021, China.
Sheng Wu Gong Cheng Xue Bao. 2008 Aug;24(8):1439-45.
The cofactor dependency type of a newly found oxidoreductase sequence is usually determined experimentally, which is both time-consuming and costly. With the rapid growth in the number of oxidoreductase sequences entering the databanks, it is highly desirable to explore whether newly found oxidoreductases can be classified into their respective cofactor dependency classes by an automated method. In this study, we propose a modified version of Chou's pseudo-amino acid composition method to extract features from sequences, with the k-nearest neighbor algorithm as the classifier. The results were very encouraging. With lambda = 48 and w = 0.1, the area under the ROC curve of the k-nearest neighbor classifier in 10-fold cross-validation was 0.9536, and the success rate was 92.0%, which was 3.5% higher than that of the original pseudo-amino acid composition. It also outperformed all 7 other feature extraction methods tested. Our results show that predicting the cofactors of oxidoreductases is feasible and that the modified pseudo-amino acid composition may be a useful method for extracting features from protein sequences.
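The abstract does not spell out the modification, so the following is only a minimal sketch of the standard (unmodified) pseudo-amino acid composition followed by a plain k-nearest neighbor vote, to illustrate the roles of lambda and w. Several things here are assumptions rather than the authors' method: a single property (the Kyte-Doolittle hydropathy scale) is swapped in for the physicochemical properties used in Chou's formulation, the distance is Euclidean, and the function names `pseaac` and `knn_predict` as well as the class labels in the usage note are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: Kyte-Doolittle hydropathy stands in for the property set
# used in the paper, which the abstract does not list.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _standardize(scale):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    mean = sum(scale.values()) / len(scale)
    sd = math.sqrt(sum((v - mean) ** 2 for v in scale.values()) / len(scale))
    return {aa: (v - mean) / sd for aa, v in scale.items()}


H = _standardize(KYTE_DOOLITTLE)


def pseaac(sequence, lam=48, w=0.1):
    """Return a (20 + lam)-dimensional pseudo-amino acid composition vector."""
    seq = [aa for aa in sequence.upper() if aa in H]  # drop non-standard residues
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    counts = Counter(seq)
    freqs = [counts[aa] / n for aa in AMINO_ACIDS]
    # theta_k: mean squared property difference between residues k apart
    thetas = [
        sum((H[seq[i]] - H[seq[i + k]]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    nearest = sorted(
        (math.dist(a, x), label) for a, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In use, each training sequence would be converted with `pseaac(seq, lam=48, w=0.1)` and paired with its cofactor class label (for example, hypothetical labels such as "NAD" or "FAD"), and a new sequence would be classified with `knn_predict`. Note that lambda = 48 requires every sequence to contain more than 48 standard residues.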