Xuan P, Guo M Z, Wang J, Wang C Y, Liu X Y, Liu Y
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, PR China.
Genet Mol Res. 2011 Apr 12;10(2):588-603. doi: 10.4238/vol10-2gmr969.
To classify real and pseudo human precursor microRNA (pre-miRNA) hairpins with ab initio methods, numerous features are extracted from the primary sequence and secondary structure of pre-miRNAs. However, these include redundant and uninformative features. Selecting the most representative feature subset is essential, because it improves classification accuracy. We propose a novel feature selection method based on a genetic algorithm and tailored to the characteristics of human pre-miRNAs. It considers the information gain of each feature, the conservation of each feature relative to the stem parts of the pre-miRNA, and the redundancy among features. Feature conservation is introduced here for the first time. Experimental results were validated by cross-validation on datasets composed of real and pseudo human pre-miRNAs. Compared with microPred, our classifier, miPredGA, achieved more reliable sensitivity and specificity, and accuracy improved by nearly 12%. The feature selection algorithm is useful for constructing more efficient classifiers that distinguish real human pre-miRNAs from pseudo hairpins.
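The abstract names three criteria (information gain, stem conservation, redundancy) but does not give the fitness function, so the following is only a minimal sketch of how a genetic algorithm could combine them. Everything specific here is an assumption rather than the miPredGA method: the function names, the weights `alpha` and `beta`, the use of mutual information as the redundancy measure, the per-feature `conservation` scores supplied as plain numbers, and the requirement that features are already discretized.

```python
import math
import random

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for a discrete feature column."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def fitness(mask, columns, labels, conservation, alpha=0.5, beta=0.5):
    """Hypothetical score: mean relevance + alpha * mean conservation
    - beta * mean pairwise redundancy (mutual information, an assumption)."""
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen:
        return float("-inf")  # an empty subset is never acceptable
    relevance = sum(information_gain(columns[i], labels) for i in chosen) / len(chosen)
    conserved = sum(conservation[i] for i in chosen) / len(chosen)
    pairs = [(i, j) for k, i in enumerate(chosen) for j in chosen[k + 1:]]
    redundant = 0.0
    if pairs:
        # information_gain(a, b) equals the mutual information between a and b
        redundant = sum(information_gain(columns[i], columns[j]) for i, j in pairs) / len(pairs)
    return relevance + alpha * conserved - beta * redundant

def select_features(columns, labels, conservation, generations=30,
                    pop_size=20, mutation_rate=0.1, seed=0):
    """Simple elitist GA over bit masks; returns the best mask found."""
    rng = random.Random(seed)
    n = len(columns)  # assumes at least two candidate features
    score = lambda m: fitness(m, columns, labels, conservation)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[:pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=score)

# Toy data (made up for illustration): feature 0 predicts the label,
# feature 1 duplicates feature 0, feature 2 is uninformative.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_columns = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
toy_conservation = [1.0, 0.0, 0.0]  # hypothetical stem-conservation scores
```

Calling `select_features(toy_columns, toy_labels, toy_conservation)` returns a bit mask with one entry per feature. Under this scoring, keeping both feature 0 and its duplicate scores lower than keeping feature 0 alone, which is the kind of redundancy penalty the abstract describes.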