School of Mathematics, Physics and Biological Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China.
J Theor Biol. 2012 Jan 21;293:49-54. doi: 10.1016/j.jtbi.2011.10.004. Epub 2011 Oct 12.
Meiotic recombination does not occur randomly across the genome; it occurs at relatively high frequencies in some genomic regions (hotspots) and at relatively low frequencies in others (coldspots). Identifying hotspots and coldspots would shed light on the mechanism of recombination, but accurately predicting them remains an open problem. In this study, we present a model that predicts hotspots and coldspots in yeast using increment of diversity combined with quadratic discriminant analysis (IDQD), based on sequence k-mer frequencies. Five-fold cross-validation showed a total prediction accuracy of 80.3%. Compared with other machine-learning algorithms, the IDQD approach is as powerful as random forest (RF) and outperforms support vector machine (SVM) in identifying hotspots and coldspots. We also predicted increased recombination rates in the regions upstream of transcription start sites and downstream of transcription termination sites. Additionally, the genome-wide recombination map in yeast obtained with the IDQD model is in close agreement with the experimentally generated map, especially for peak locations, although some fine-scale differences exist. Our results highlight the sequence dependency of recombination.
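The abstract names k-mer frequencies and increment of diversity as the model's inputs without giving formulas. The following is a minimal sketch of how such features might be computed, not the authors' implementation. It assumes the commonly used definition of the diversity measure, D(X) = N log N - sum(n_i log n_i), with the increment of diversity ID(X, Y) = D(X + Y) - D(X) - D(Y), and uses the natural logarithm. The function names `kmer_counts`, `diversity`, and `increment_of_diversity` are hypothetical.

```python
import math
from itertools import product


def kmer_counts(seq, k, alphabet="ACGT"):
    """Count overlapping k-mers in seq; returns counts in a fixed k-mer order."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product(alphabet, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases such as N
            counts[kmer] += 1
    return list(counts.values())


def diversity(counts):
    """Assumed diversity measure: D = N*log(N) - sum(n_i*log(n_i))."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """Assumed definition: ID(X, Y) = D(X + Y) - D(X) - D(Y)."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


# Hypothetical usage: compare a query sequence with a pooled training profile.
query = kmer_counts("ACGTACGTTTGA", k=2)
profile = kmer_counts("ACGTACGTACGTACGA", k=2)
score = increment_of_diversity(query, profile)
```

Under this definition the score is zero when two count vectors have the same proportions and grows as their compositions diverge. In an IDQD-style pipeline, the scores of a sequence against a hotspot profile and a coldspot profile would presumably serve as features for the quadratic discriminant step, which is not shown here.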