School of Physical Science and Technology, Inner Mongolia University, Hohhot, China.
Genomics. 2011 Nov;98(5):359-66. doi: 10.1016/j.ygeno.2011.07.008. Epub 2011 Aug 2.
Knowledge of the detailed organization of nucleosomes across genomes and of the mechanisms of nucleosome positioning is critical for understanding gene regulation and expression. In the present work, the bias of 4-mer frequencies in nucleosome and linker sequences of the S. cerevisiae genome was analyzed statistically. A novel position-correlation scoring function algorithm, based on the bias of 4-mer frequencies in linker sequences, is presented to distinguish nucleosome from linker sequences. Five-fold cross-validation showed that the algorithm performs well, with a mean area under the receiver operating characteristic curve of 0.981. The algorithm was then used to predict nucleosome occupancy throughout the S. cerevisiae genome, and relatively high correlation coefficients with experimental maps of nucleosome positioning were obtained. In addition, the distinct nucleosome-depleted regions in the vicinity of regulatory sites were confirmed. The results suggest that intrinsic DNA sequence preferences in linker regions have a significant impact on nucleosome occupancy.
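The idea of scoring a sequence by its 4-mer frequency bias can be illustrated with a minimal Python sketch. The abstract does not give the form of the position-correlation scoring function, so this sketch swaps in a plain log-odds score of linker versus nucleosome 4-mer frequencies. The function names (`kmer_frequencies`, `log_odds_table`, `linker_score`, `auc`), the pseudocount, and the toy sequences in the usage note are all assumptions made for illustration, not the authors' method or data.

```python
from collections import Counter
from itertools import product
from math import log

K = 4
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_frequencies(seqs, k=K, pseudocount=1.0):
    """Relative frequency of every k-mer over a set of sequences.

    A pseudocount (an assumption of this sketch) keeps unseen k-mers
    from producing zero frequencies.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
                counts[kmer] += 1
    total = sum(counts.values()) + pseudocount * len(ALL_KMERS)
    return {m: (counts[m] + pseudocount) / total for m in ALL_KMERS}


def log_odds_table(linker_seqs, nucleosome_seqs):
    """Log ratio of linker to nucleosome frequency for each 4-mer."""
    f_linker = kmer_frequencies(linker_seqs)
    f_nuc = kmer_frequencies(nucleosome_seqs)
    return {m: log(f_linker[m] / f_nuc[m]) for m in ALL_KMERS}


def linker_score(seq, table, k=K):
    """Mean log-odds over all 4-mers in seq; higher means more linker-like."""
    seq = seq.upper()
    vals = [table[seq[i:i + k]]
            for i in range(len(seq) - k + 1)
            if seq[i:i + k] in table]
    return sum(vals) / len(vals) if vals else 0.0


def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the probability that a positive
    outscores a negative (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

As a usage illustration with made-up toy sequences, `table = log_odds_table(["AAAAAAAATTTTTTTT"], ["GCGCGCGCGCGCGCGC"])` followed by `linker_score("AAAAAAAA", table)` gives a positive score, while `linker_score("GCGCGCGC", table)` gives a negative one. In a cross-validation setting, the table would be built on the training folds only and `auc` evaluated on the scores of the held-out fold.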