Zhang Guo-qing, Cao Zhi-wei, Luo Qing-ming, Cai Yu-dong, Li Yi-xue
Hubei Bioinformatics and Molecular Imaging Key Laboratory, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China.
Comput Biol Chem. 2006 Jun;30(3):233-40. doi: 10.1016/j.compbiolchem.2006.03.002. Epub 2006 May 23.
The operon is a specific functional organization of genes found in bacterial genomes. Most genes within an operon share common features. Here, the support vector machine (SVM) approach is used to predict operons at the genomic level. Four features were chosen as the components of the SVM input vectors: the intergenic distance, the number of common pathways, the number of conserved gene pairs and the mutual information of phylogenetic profiles. The analysis reveals that these shared properties are indeed characteristic of genes within operons and differ from those of non-operonic genes. Jackknife testing indicates that an SVM with a radial basis function (RBF) kernel, trained on these input features, achieves high accuracy. To validate the method, Escherichia coli K12 and Bacillus subtilis, whose operon structures are known, were used as benchmark genomes. The prediction results for both show that the SVM can efficiently detect operon genes in target genomes and offers a satisfactory balance between sensitivity and specificity.
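The abstract names the ingredients of the method without giving formulas. The following is a minimal, stdlib-only sketch of three of them under stated assumptions: the mutual information between two binary phylogenetic profiles (a plain plug-in estimate; the paper's exact estimator is not stated in the abstract), the RBF kernel, and jackknife (leave-one-out) testing scored by sensitivity and specificity. Training a real RBF-kernel SVM requires a quadratic-programming solver such as LIBSVM, so `kernel_mean_classifier` is a hypothetical stand-in (a simple kernel mean classifier, not an SVM). All function names, the `gamma` value and the toy feature values are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[List[Vector], List[int], Vector], int]


def profile_mutual_information(p: Sequence[int], q: Sequence[int]) -> float:
    """Mutual information (in bits) between two binary phylogenetic profiles.

    p[i] / q[i] is 1 if gene A / gene B has a homolog in reference genome i.
    Assumption: a plain plug-in estimate from co-occurrence counts.
    """
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(p)
    joint, marg_p, marg_q = Counter(zip(p, q)), Counter(p), Counter(q)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((marg_p[a] / n) * (marg_q[b] / n)))
    return mi


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2); gamma=0.5 is an arbitrary default."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def kernel_mean_classifier(train_x: List[Vector], train_y: List[int], x: Vector) -> int:
    """Hypothetical stand-in, NOT an SVM: predict 1 (operon pair) when x is, on
    average, more RBF-similar to the positive training examples than to the negatives."""
    pos = [rbf_kernel(x, t) for t, label in zip(train_x, train_y) if label == 1]
    neg = [rbf_kernel(x, t) for t, label in zip(train_x, train_y) if label == 0]
    if not pos or not neg:  # degenerate training fold: fall back to whichever class is present
        return 1 if pos else 0
    return 1 if sum(pos) / len(pos) > sum(neg) / len(neg) else 0


def jackknife(xs: List[Vector], ys: List[int], classify: Classifier) -> List[int]:
    """Leave-one-out (jackknife) test: predict each sample from a model fit on all the others."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        preds.append(classify(train_x, train_y, xs[i]))
    return preds


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Made-up toy vectors: [intergenic distance, common pathways,
    # conserved gene pairs, profile MI], each assumed rescaled to [0, 1].
    X = [[0.10, 0.8, 0.9, 0.7], [0.20, 0.9, 0.7, 0.8], [0.15, 0.7, 0.8, 0.9],
         [0.90, 0.1, 0.2, 0.1], [0.80, 0.0, 0.1, 0.2], [0.85, 0.2, 0.0, 0.1]]
    y = [1, 1, 1, 0, 0, 0]
    print(sensitivity_specificity(y, jackknife(X, y, kernel_mean_classifier)))
    # -> (1.0, 1.0)
```

With an actual SVM library, the stand-in classifier would be replaced by a trained RBF-kernel SVM, and the jackknife loop and metrics would stay the same. Rescaling the features matters because the RBF kernel is distance-based: raw intergenic distances in base pairs would otherwise dominate the other three features.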