Zhou Tong, Weng Jianhong, Sun Xiao, Lu Zuhong
State Key Laboratory of Bioelectronics, Southeast University, Nanjing 210096, China.
BMC Bioinformatics. 2006 Apr 26;7:223. doi: 10.1186/1471-2105-7-223.
Meiotic double-strand breaks occur at relatively high frequencies in some genomic regions (hotspots) and relatively low frequencies in others (coldspots). Hotspots and coldspots are receiving increasing attention in research into the mechanism of meiotic recombination. However, predicting hotspots and coldspots from DNA sequence information is still a challenging task.
We present a novel method for classifying hot and cold ORFs, located in hotspots and coldspots respectively, in Saccharomyces cerevisiae. The method uses a support vector machine (SVM) that relies on differences in codon composition, and it achieved a classification accuracy of 85.0%. Since codon composition is a fusion of codon usage bias and amino acid composition signals, we also investigated separately how well each of these two kinds of sequence attributes discriminates hot ORFs from cold ORFs. Our results indicate that neither codon usage bias nor amino acid composition, taken alone, performed as well as codon composition. Moreover, we applied our SVM-based method to the full yeast genome, predicting hot and cold ORFs by using cutoffs of recombination rate. We found that the method predicts cold ORFs less well than it predicts hot ORFs. We also observed a considerable correlation between meiotic recombination rate and the amino acid composition of certain residues, which probably reflects the structural and functional dissimilarity between the hot and cold groups.
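To make the notion of codon composition concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that codon composition means the relative frequency of each of the 64 codons counted in frame over an ORF, and that amino acid composition can be obtained by summing those frequencies over synonymous codons with the standard genetic code. The function names `codon_composition` and `amino_acid_composition`, and the choice to count stop codons, are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def codon_composition(orf):
    """Return a 64-element list of in-frame codon frequencies (hypothetical helper).

    Assumption: all 64 codons are counted, including stop codons. A trailing
    partial codon and codons with ambiguous bases are ignored.
    """
    seq = orf.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def amino_acid_composition(orf):
    """Sum codon frequencies over synonymous codons (hypothetical helper).

    The key '*' collects the stop codons.
    """
    composition = {}
    for codon, freq in zip(CODONS, codon_composition(orf)):
        aa = GENETIC_CODE[codon]
        composition[aa] = composition.get(aa, 0.0) + freq
    return composition


if __name__ == "__main__":
    toy_orf = "ATGGCTGCTTAA"  # made-up sequence: Met, Ala, Ala, stop
    features = codon_composition(toy_orf)
    print(len(features), features[CODONS.index("GCT")])
    print(amino_acid_composition(toy_orf)["A"])
```

Feature vectors of this kind, one per ORF with a hot or cold label, could then be passed to any SVM implementation, for example `sklearn.svm.SVC` in scikit-learn. The abstract does not state the kernel or parameters used, so those would have to be chosen separately.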
We have introduced a novel SVM-based method to discriminate hot ORFs from cold ones. Using codon composition as the sequence attribute, we achieved a high classification accuracy, which suggests that codon composition has strong potential as a sequence attribute for predicting hot and cold ORFs.