Dokuz Eylül University, Faculty of Arts and Sciences, Department of Statistics, 35160 Kaynaklar Campus, Izmir, Turkey.
Comput Biol Chem. 2009 Dec;33(6):461-4. doi: 10.1016/j.compbiolchem.2009.09.002. Epub 2009 Sep 28.
Nearly all enzymes are proteins. They are the biological catalysts that accelerate cellular reactions. According to the type of reaction they catalyze, enzymes are divided into six classes: oxidoreductases (EC-1), transferases (EC-2), hydrolases (EC-3), lyases (EC-4), isomerases (EC-5), and ligases (EC-6). Predicting the enzyme class is important for identifying which class a protein belongs to. Because the number of known enzyme sequences grows daily, data mining techniques offer a useful and time-saving alternative to experimental analysis for predicting the class of a newly found enzyme sequence. In this paper, two simple minimum distance-based classifiers are proposed. These methods and the well-known K-nearest neighbor (KNN) classification algorithm were applied to classify enzymes according to their amino acid composition. The performance measures and execution times of the algorithms are compared. In addition, the equivalence of the two proposed approaches under a special condition is proved, as a guide for researchers.
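To make the idea of classifying by amino acid composition concrete, the following is a minimal sketch in Python. The abstract does not specify the two proposed minimum distance classifiers, so the nearest-centroid rule shown here is only one common form of minimum distance classification, assumed for illustration, and is not the authors' method. The Euclidean distance, the function names, and the toy sequences and labels are all hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions) of a sequence."""
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance (an assumed choice; the abstract names no metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fit_centroids(vectors, labels):
    """Compute the mean composition vector of each class."""
    sums, counts = {}, Counter()
    for vec, lab in zip(vectors, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[lab] += 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}


def predict_min_distance(centroids, vec):
    """Assign the class whose centroid is closest to vec."""
    return min(centroids, key=lambda lab: euclidean(centroids[lab], vec))


def predict_knn(vectors, labels, vec, k=3):
    """Assign the majority class among the k nearest training vectors."""
    nearest = sorted(zip(vectors, labels), key=lambda p: euclidean(p[0], vec))[:k]
    votes = Counter(lab for _, lab in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data, not real enzyme sequences.
train_seqs = ["AAAAAC", "AAAACC", "WWWWWY", "WWWYYY"]
train_labels = ["EC-1", "EC-1", "EC-3", "EC-3"]
train_vecs = [composition(s) for s in train_seqs]

centroids = fit_centroids(train_vecs, train_labels)
query = composition("AAACCA")
centroid_prediction = predict_min_distance(centroids, query)
knn_prediction = predict_knn(train_vecs, train_labels, query, k=3)
```

The centroid rule compares a query against one vector per class, whereas KNN compares it against every training vector, which is one plausible reason the execution times of such methods can differ.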